data/XMLData/rawdata/binary


#1

Hello FIX Community,

In the specifications we see fields whose underlying FIX data type is “data”, which the FIX data types in turn document as having base type “string”.

However, data is also referred to elsewhere as “raw” data.

Is the intent that the content of such fields is treated as “binary”, or that any “binary” data to be transmitted should first be encoded in a scheme that can survive string conversion (e.g. MIME, hex, Base64)?

If “data” were to survive conversions between systems as raw binary, handlers would have to explicitly skip the field content during processing. We feel this is unlikely to be the intention. More likely, “data” is meant to be represented in programs as String or Character, with a requirement that any binary content be suitably encoded so that it survives character conversions.
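To make the concern concrete, here is a minimal Python sketch of what happens when arbitrary bytes pass through a character conversion, compared with the same bytes wrapped in Base64 first. The payload and the choice of an ASCII round trip are assumptions for illustration only; real systems may convert between other character sets.

```python
import base64

# Arbitrary binary payload (illustrative), including bytes outside 7-bit ASCII.
raw = bytes([0x00, 0x01, 0xFF, 0x80, 0x7C])

# A handler that treats the value as text: decode to characters, then re-encode.
# Bytes above 0x7F cannot be represented and are replaced, so content is lost.
lossy = raw.decode("ascii", errors="replace").encode("ascii", errors="replace")

# Base64 output is pure ASCII text, so it survives the same round trip intact.
encoded = base64.b64encode(raw)
survived = encoded.decode("ascii").encode("ascii")
restored = base64.b64decode(survived)

print(lossy == raw)     # the raw bytes did not survive
print(restored == raw)  # the Base64-wrapped bytes did
```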

If someone has an opinion on the correct interpretation, I would be grateful to hear it (TIA).

In particular, XMLData would be seen as character data (it defines its own encoding), and any binary data within it would have to be encoded to survive conversions.

-CC.


#2

On review of the usage of these types, it seems any implementation should treat both the ‘data’ and ‘XMLData’ types as binary during transmission. Encoded fields have their encoding specified, and XMLData fields are responsible for self-identifying the encoding of the XML within the XML data stream. A processor needs to ensure that such elements are handled losslessly. The end recipient of encoded data is responsible for handling it correctly using the provided encoding value. The XML parser used to handle XMLData is likewise responsible for the correct interpretation of the data.
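As an illustration of lossless handling in tagvalue encoding: a data field is preceded by its length field (for example RawDataLength(95) before RawData(96), or XmlDataLen(212) before XmlData(213)), so a parser can read exactly that many bytes instead of scanning for the SOH delimiter. The sketch below is only that, a sketch; the function name `parse_fields` and the two-entry lookup table are hypothetical, and it does no checksum or body-length validation.

```python
SOH = b"\x01"

# Length tag -> data tag. A small illustrative subset, not the full list.
LENGTH_TO_DATA = {95: 96, 212: 213}

def parse_fields(msg: bytes):
    """Split a tagvalue byte string into (tag, value) pairs, keeping values as bytes."""
    fields = []
    pos = 0
    pending = None  # (data_tag, length) announced by the preceding length field
    while pos < len(msg):
        eq = msg.index(b"=", pos)
        tag = int(msg[pos:eq])
        start = eq + 1
        if pending is not None and pending[0] == tag:
            # Data field: take exactly the announced number of bytes,
            # even if they contain SOH or '='.
            end = start + pending[1]
            if msg[end:end + 1] != SOH:
                raise ValueError(f"tag {tag}: length does not end at a delimiter")
        else:
            # Ordinary field: the value runs up to the next SOH.
            end = msg.index(SOH, start)
        value = msg[start:end]
        pending = None
        if tag in LENGTH_TO_DATA:
            pending = (LENGTH_TO_DATA[tag], int(value))
        fields.append((tag, value))
        pos = end + 1
    return fields

# The RawData value below deliberately contains SOH and '='.
example = b"35=B\x0195=5\x0196=a\x01b=c\x0158=hi\x01"
parsed = parse_fields(example)
```

Because values stay as `bytes` throughout, nothing in this path forces a character conversion on the data field.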


#3

I agree that the current specification for “data” conflates two datatypes: opaque binary data and character data in a non-ASCII character set. XML is an example of the latter case, and saying that String is its base type makes some sense. However, String is not the base of binary data.

ISO 11404 General Purpose Datatypes makes a distinction between octetstring (variable-length encodings using 8-bit codes) and characterstring (a family of datatypes which represent strings of symbols from standard character-sets).

XML is a special case of characterstring, usually with UTF-8 encoding.

The definitions of datatypes in FIX specifications are about 20 years old and are tightly bound to tagvalue encoding syntax. I have recommended that they be reviewed for future specification revisions, especially in light of binary encodings such as Simple Binary Encoding (SBE).


#4

Thanks for the update, Donald.


#5

On another note: in the past it has been strongly recommended to us to adopt Base64 encoding for XML and other payloads carried in fields of the DATA datatype. The MessageEncoding field (http://fiximate.fixtrading.org/en/FIX.5.0SP2/tag347.html) was added to the StandardHeader in order to specify the encoding used in the Encoded fields. A bit more formality and guidance is warranted in this area. We should take this up as part of the ISO FIX initiative, which will incorporate Don’s work and a formalization based upon ISO 11404 General Purpose Datatypes (https://www.iso.org/standard/39479.html and https://en.wikipedia.org/wiki/ISO/IEC_11404 and https://standards.iso.org/ittf/PubliclyAvailableStandards/c039479_ISO_IEC_11404_2007(E).zip).
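For anyone wanting to see what the Base64 recommendation looks like on the wire, here is a rough Python sketch that wraps a payload in Base64 and emits a length/data pair. The helper name `base64_data_pair` is hypothetical, and the use of RawDataLength(95)/RawData(96) is just an example; whether a given field carries Base64 is something counterparties would have to agree on, since the tagvalue syntax does not signal it.

```python
import base64

SOH = b"\x01"

def base64_data_pair(length_tag: int, data_tag: int, payload: bytes) -> bytes:
    """Return b'<length_tag>=<n>SOH<data_tag>=<base64>SOH' for the given payload."""
    encoded = base64.b64encode(payload)
    # The length field counts the bytes actually sent, i.e. the Base64 text.
    return (b"%d=%d" % (length_tag, len(encoded)) + SOH
            + b"%d=" % data_tag + encoded + SOH)

# An XML fragment with a non-ASCII character, serialized as UTF-8 (9 bytes).
xml_bytes = "<a>é</a>".encode("utf-8")
pair = base64_data_pair(95, 96, xml_bytes)
```

The receiver reverses the steps: Base64-decode the value to recover the original octets, then hand them to the XML parser, which determines the character encoding from the XML itself.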

A lesson to standards creators: avoid the word “DATA”; it has no meaning.