Saturday, September 11, 2010

Flat file schemas, delimiter characters, wrap characters and escape characters explained

A question on the BizTalk Professionals group on LinkedIn caused me to write a short answer, but I thought I'd do a more comprehensive take on it here.

The question was: what is the difference between wrap characters and escape characters?

When parsing a flat file with a flat file schema, delimiter characters are used to split the incoming data into separate entities. Let's say we have the following data:

Alpha,Beta,Gamma,Delta,Epsilon

In this case, the comma (,) is used as a delimiter, which lets us split the string into the five separate words we want.
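To make the idea concrete, here is a minimal sketch in Python. It is not how BizTalk's flat file parsing is implemented; it only illustrates what splitting on a delimiter means, using the sample line from above.

```python
# Split a delimited line into its separate entities.
line = "Alpha,Beta,Gamma,Delta,Epsilon"
delimiter = ","

fields = line.split(delimiter)
print(fields)  # ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']
```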

However, if it were a list of decimal numbers and we used the comma as the decimal separator, as we do in much of Europe, using the comma as a delimiter would be tricky: the parser cannot tell whether a given comma separates two entities or is the decimal separator inside a number. In this case, we can use wrap characters.

"2,25","1,14","5,34"

In this example, the quote character (") is used as a wrap character, i.e. it wraps each separate entity. The wrapped entities are in turn separated by the delimiter character, which is a comma (,). A delimiter character that appears between a pair of wrap characters is treated as data, which lets us use the comma as part of our data.
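As a rough illustration outside BizTalk, Python's standard csv module can play the role of the parser: its quotechar setting behaves like a wrap character. The helper name parse_wrapped is made up for this sketch.

```python
import csv

def parse_wrapped(line, delimiter=",", wrap='"'):
    """Split one line, treating delimiters inside wrap characters as data."""
    reader = csv.reader([line], delimiter=delimiter, quotechar=wrap)
    return next(reader)

wrapped_line = '"2,25","1,14","5,34"'

# A plain split ignores the wrap characters and cuts every number in two.
naive = wrapped_line.split(",")
print(len(naive))  # 6

# Honouring the wrap character keeps each number in one piece.
wrapped_fields = parse_wrapped(wrapped_line)
print(wrapped_fields)  # ['2,25', '1,14', '5,34']
```

Note that the wrap characters themselves are removed from the result; only the data between them is kept.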

The same can be achieved using escape characters. An escape character is placed before an otherwise reserved character so that the parser treats that character as part of the data instead of interpreting it. The most common escape character is the backslash (\), due to its use as such in many programming languages.

2\,25,1\,14,5\,34

The above line gives the same result as the one with wrapped entities if backslash (\) is defined as the escape character. It escapes the comma (,) that follows it, so that comma is not treated as a delimiter, even though comma is defined as one, and is used as part of the data instead.
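The escape mechanism is simple enough to sketch by hand. The Python function below is a hypothetical helper written for this post, not BizTalk's actual implementation: it walks the line character by character and takes whatever follows an escape character literally.

```python
def split_escaped(line, delimiter=",", escape="\\"):
    """Split a line on the delimiter, honouring an escape character."""
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == escape:
            # The character after the escape is data, even if it is a delimiter.
            # An escape at the very end of the line is simply dropped.
            current.append(next(chars, ""))
        elif ch == delimiter:
            # An unescaped delimiter ends the current entity.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

escaped_fields = split_escaped(r"2\,25,1\,14,5\,34")
print(escaped_fields)  # ['2,25', '1,14', '5,34']
```

As with wrap characters, the escape characters are consumed by the parser and do not end up in the data.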
