Friday, 13 January 2012

Endianness

Endianness. It's a topic I come back to again and again, because people hear about it and ask worried questions (the issue has a certain mystery to it) when they are looking at heterogeneous environments for their SAP systems.
At first I really thought it was purely OS related, i.e. down to how the OS kernel reads the bytes of a "word", which is either:
Big Endian (the most significant byte of a "word" is stored first, at the lowest address)
Little Endian (the least significant byte of a "word" is stored first, at the lowest address)
(There is also middle endian, but let's not even go there; you need more in-depth reading for that.)
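To make the two orders concrete, here is a minimal sketch in Python (my choice purely for illustration) that lays out the same 4-byte integer both ways. The value 0x12345678 is arbitrary, picked only because its bytes are easy to tell apart. sys.byteorder reports the native byte order of whichever machine runs the script.

```python
import sys

value = 0x12345678  # an arbitrary 4-byte "word" with easily distinguished bytes

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print("big endian:   ", big.hex(" "))     # 12 34 56 78
print("little endian:", little.hex(" "))  # 78 56 34 12
print("this machine: ", sys.byteorder)    # 'little' on x86, for example
```

Same number, same four bytes, just stored in the opposite order.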

However, on a second read the direct link to the OS seems wrong to me. It seems endianness is actually related to how the hardware architecture is set up to store data. To quote Wikipedia (which I'm hoping is right): "Endianness is a difference in data representation at the hardware level and may or may not be transparent at higher levels, depending on factors such as the type of high level language used."

So when is this an issue? Say you moved some of your SAP application servers from a big endian platform to a little endian platform, but left the CI (central instance) on the original platform type. You could then run into interface issues at the OS level, especially, it seems, with files classed as "text files" that contain integers or floating point numbers. This type of data should ONLY be held in binary files (see SAP note 65050 "Data types and file formats in files (DATASET)").

Continuing off on a slight tangent: what exactly is a binary file? It's something we in the IT industry instinctively believe we know, but can we define it accurately? If you can, that will help you to understand endianness. I'm part of the way there; here are some notes:
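Here is a minimal sketch of what goes wrong, assuming a hypothetical record of one unsigned 32-bit integer and one 64-bit float (not an actual SAP file layout). Python's struct module lets a single machine play both roles: the ">" prefix packs the values in big endian order, as a big endian server would write them natively, and "<" unpacks them the way a little endian server would read them by default.

```python
import struct

# Hypothetical record: an unsigned 32-bit count and a 64-bit float,
# packed big endian (">"), as a big endian application server would write them.
payload = struct.pack(">Id", 1000, 2.5)

# A reader that knows the writer's byte order recovers the original values.
count, price = struct.unpack(">Id", payload)

# A reader that assumes little endian ("<") turns the same bytes
# into completely different numbers, and no error is raised.
bad_count, bad_price = struct.unpack("<Id", payload)

print(count, price)        # 1000 2.5
print(bad_count)           # 3892510720
print(bad_price == price)  # False
```

The bytes themselves carry no marker saying which order was used, so any program sharing raw binary numbers between platforms has to agree on the byte order up front.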

"Many binary file formats contain parts that can be interpreted as text"

"binary files that contain only textual data—without, for example, any formatting information—are called plain text files"

"Unlike Text files, there is no special character present in the binary mode files to mark End-of-file. The binary mode files keep track of the end-of-file from number of characters present in the dictionary entry of the file."

"Binary files typically contain bytes that are intended to be interpreted as something other than text characters."

"Some binary files contain headers, blocks of metadata used by a computer program to interpret the data in the file. For example, a GIF file can contain multiple images, and headers are used to identify and describe each block of image data. If a binary file does not contain any headers, it may be called a flat binary file. But the presence of headers are also common in plain text files, like email and html files."

Thanks to Wikipedia as always for being a brilliant resource for brushing up quickly on this stuff. http://en.wikipedia.org/wiki/Binary_file.
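Since there is no single formal definition, tools tend to fall back on heuristics. Below is a rough sketch of one common approach, offered as an assumption rather than any standard: treat data as binary if a leading sample contains a NUL byte or does not decode as UTF-8. The function name looks_binary and the 8000-byte sample size are my own, arbitrary choices. Note that by this test a Latin-1 text file would be flagged as binary, which shows how much "text" depends on the code page you assume.

```python
import codecs
import struct


def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristic only: binary if the sample holds a NUL byte or is not UTF-8."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # Only treat the sample as final when nothing was cut off, so a
        # multi-byte character split at the sample boundary is not an error.
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        return True
    return False


print(looks_binary(b"plain text\n"))                    # False
print(looks_binary(b"GIF89a\x01\x00\x01\x00"))          # True (GIF-style header)
print(looks_binary(struct.pack(">I", 1000)))            # True (raw integer)
print(looks_binary("café au lait".encode("latin-1")))  # True (not UTF-8)
```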

Having read up a bit, I feel I have almost got a grasp on exactly what constitutes a "binary file", and that is a step towards understanding endianness and its implications for heterogeneous environments...
