[PC20020211_1] Modification of encoding of binary fields

Imported from previous forum

[ original email was from Kevin Houstoun - kevin.houstoun@ssmb.com ]
The following proposal was considered by the FIX technical committee as a potential change for FIX.4.4. It was deferred for consideration after the release of FIX.4.4. Comments are invited on it as a proposed change for a later release.

=================================================

The way that raw data fields are handled and validated in FIX today is somewhat cumbersome. The engine must look for specific, special tags that tell the engine how to parse the next field. This means that, to receive a message with binary data fields, the engine must have a mapping of pairs for the tag numbers of binary fields and their corresponding length fields. This has two adverse effects:

  1. In upgrading an engine to support newer FIX versions, it is easy to forget to add the new binary fields introduced in that version. If someone sends you a message with such a field, you would probably be fine as long as the field doesn’t contain SOH. The engine would probably pass user acceptance testing, go into production, and explode the first time someone sends an SOH in that field.
  2. Adding user-defined binary fields is extremely dangerous, as doing so requires all of one’s counterparties to modify their engines to recognize the data and length tags. This is significantly different from adding, say, a string or int user-defined field, which is relatively safe.

I suggest that we make binary fields self-describing by changing the formatting rules: eliminate all length fields that describe binary fields, and append “+length” to the tag, before the “=”. For example:

    9100+11=BinaryField[SOH]
    9101+23=Embedded[SOH]Delimiter[SOH]Test[SOH]
    9102=Normal Field[SOH]

Any engine receiving this message knows immediately that 9100 and 9101 are binary fields, can parse them accordingly, and can handle the embedded [SOH]s in 9101 correctly, even if the engine does not know in advance that these are binary fields.

Note that under this proposal binary fields MUST be length-encoded in this manner, and non-binary fields MUST NOT be length-encoded. Length-encoding a field defined in the spec as something other than binary can, at the receiver’s discretion, result in a session-level reject.

Thoughts? I understand that to send a binary field, one must know that the field is, indeed, binary. This just removes the requirement on the receiving end. Is it worth the change to the message formatting to do this in the next spec version?
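
To make the receiving side of the proposal concrete, here is a minimal Python sketch of a parser for the “tag+length=value” format. It is only an illustration under stated assumptions, not a reference implementation: the function name `parse_fields` is made up, each [SOH] is taken to be the single byte 0x01, and the length is assumed to count the data bytes only, excluding the trailing [SOH] (which is what the 9101+23 example implies).

```python
SOH = b"\x01"

def parse_fields(message: bytes) -> list[tuple[int, bytes, bool]]:
    """Split a message into (tag, value, is_binary) tuples.

    Hypothetical helper: fields look like 'tag=value<SOH>' or,
    for binary data, 'tag+length=value<SOH>'.
    """
    fields = []
    pos = 0
    while pos < len(message):
        eq = message.index(b"=", pos)            # end of tag (and optional length)
        tag_part, plus, length_part = message[pos:eq].partition(b"+")
        tag = int(tag_part)
        start = eq + 1
        if plus:
            # Length-encoded field: count bytes, ignoring any SOH inside the data.
            end = start + int(length_part)
            if message[end:end + 1] != SOH:
                raise ValueError(f"tag {tag}: no SOH after the declared length")
        else:
            # Ordinary field: the value runs up to the next SOH.
            end = message.index(SOH, start)
        fields.append((tag, message[start:end], bool(plus)))
        pos = end + 1
    return fields

msg = b"9101+23=Embedded\x01Delimiter\x01Test\x01" b"9102=Normal Field\x01"
for tag, value, is_binary in parse_fields(msg):
    print(tag, value, is_binary)
```

Note that the parser needs no table of binary tag numbers: the “+” in the tag portion is the only thing that switches it from searching for SOH to counting bytes.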

[ original email was from Mikhail Jirnov - mikhail.jirnov@ubsw.com ]
Good proposal. Encoding binary fields with length/data tag pairs has always looked pretty awkward to me. It is also quite unfriendly to non-ASCII encodings. Alternatively, the problem could be fixed with a sort of "escaping" of the SOH should it occur in a field value. For example, a sequence of two SOHs could be used to encode an SOH value. This technique is used pretty often when some character has a special meaning. In the case of FIX it could be worse than the proposed solution, but it is probably still worth consideration.
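
For comparison, the SOH-doubling idea could look roughly like the Python sketch below. The names `encode_field` and `decode_fields` are invented for illustration, and the decoder assumes that a tag never starts with an SOH byte, so a pair of SOHs read left to right always means one escaped data byte and a single SOH always ends the field.

```python
SOH = b"\x01"

def encode_field(tag: int, value: bytes) -> bytes:
    # Double every SOH inside the value, then terminate with a single SOH.
    return str(tag).encode("ascii") + b"=" + value.replace(SOH, SOH + SOH) + SOH

def decode_fields(message: bytes) -> list[tuple[int, bytes]]:
    fields = []
    pos = 0
    while pos < len(message):
        eq = message.index(b"=", pos)
        tag = int(message[pos:eq])
        value = bytearray()
        i = eq + 1
        while True:
            if i >= len(message):
                raise ValueError(f"tag {tag}: field is not terminated")
            byte = message[i:i + 1]
            if byte == SOH:
                if message[i + 1:i + 2] == SOH:
                    value += SOH      # doubled SOH: one literal data byte
                    i += 2
                    continue
                i += 1                # single SOH: end of this field
                break
            value += byte
            i += 1
        fields.append((tag, bytes(value)))
        pos = i
    return fields

msg = encode_field(96, b"A\x01B\x01") + encode_field(58, b"text")
print(decode_fields(msg))
```

Unlike length encoding, this forces the receiver to inspect every byte of the value, which is one way the approach could turn out worse in practice.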


[ original email was from John Prewett - jprewett@lavatrading.com ]
I think this is a good proposal:

  1. The protocol would be more robust.
  2. The data dictionary would be simpler.
  3. The FIX engine implementation would be simpler.
  4. Slightly fewer bytes would be transmitted.

Only disadvantage:
Not backwards-compatible with existing FIX engines.

It would be very friendly to permit both the proposed and existing formats of encoding data fields. As a result, older FIX engines wouldn’t be rendered obsolete.

Does anyone have any suggestions/comments as to whether it is possible/feasible/desirable to support backwards compatibility?

Then again, many of us make a living out of building FIX engines. Maybe I should just be quiet and strongly support a non-backwards-compatible change :wink:

[ original email was from Ajay Kamdar - akamdar@javtech.com ]
I think backward compatibility with FIX engines currently in production is highly desirable.

Since almost no one can really afford to discontinue trading with a counter party just because that counter party is slow to upgrade their FIX engine, I believe as an engine vendor our customers will expect us to maintain backward compatibility with the current raw data encoding format regardless of whether the standard eventually mandates it or not.


Ajay Kamdar
Javelin Technologies
akamdar@javtech.co


[ original email was from John Prewett - jprewett@lavatrading.com ]
Let me clarify that I was discussing the potential desirability of backwards compatibility for any new FIX protocol versions that adopt this proposal. This proposal cannot be retro-fitted to an existing protocol (FIX.4.0 thru FIX.4.4) without dire consequences.

Backwards compatibility wouldn’t be an issue for users of FIX.4.0 thru FIX.4.4 as these protocols remain completely unchanged. FIX engines that support any of these existing protocols would continue to do so.


[ original email was from Ryan Pierce - rpierce@taltrade.com ]
> Let me clarify that I was discussing the potential desirability of backwards compatibility for any new FIX protocol versions that adopt this proposal. This proposal cannot be retro-fitted to an existing protocol (FIX.4.0 thru FIX.4.4) without dire consequences.
>
> Backwards compatibility wouldn’t be an issue for users of FIX.4.0 thru FIX.4.4 as these protocols remain completely unchanged. FIX engines that support any of these existing protocols would continue to do so.

You’re right. FIX version 4.4 and lower would use the existing format. The proposal would be for, say, FIX 4.5 and higher to use the new format. I suggest that it be required for FIX 4.5; the use of the old-style length fields would be illegal in 4.5, just as using length encoding in 4.4 and below would be illegal.

When a firm upgrades to a new FIX version, software changes are almost always needed. For example, FIX 4.2 changed the session so that Resend Requests no longer used the artificial 999999 upper end cap, thus allowing FIX to be used for more than 1 million messages / day. A firm simply cannot use a pre-4.2 engine with 4.2 without engine changes. This type of change falls under the same category.

[ original email was from Anton Kryukov - anton@harborllc.com ]
This is a worthy proposition, but I would encode the length differently:

[tag]=[length][SOH][data][SOH]

This would simplify parsing, as the parser would not have to do anything special to find the binary field tag. Once the tag is found and determined to designate a binary field, special processing can be performed. This could also be made compatible with the older versions, in the sense that newer versions would still be able to process the old encoding, and the older engines would simply reject messages with the new fields as malformed.

Anton Kryukov


> This is a worthy proposition, but I would encode the length differently:
>
> [tag]=[length][SOH][data][SOH]

I believe the main advantage of the proposed solution is that it avoids requiring the receiving side (if not the sending side) to be aware of specific binary tags. Given that, the format above looks ambiguous to me. For example, how would the receiving FIX engine process the following if it is not aware that tag 96 is binary: 96=4<SOH>35=0<SOH> (where [data] happens to look like a valid <tag>=<value> sequence)? Or am I missing something?
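
The ambiguity can be shown with a small Python sketch. Both functions are hypothetical: `naive_split` stands in for an engine that does not know tag 96 is binary, and `split_knowing_binary_tags` for one that has been told so through a set of tag numbers.

```python
SOH = b"\x01"

def naive_split(message: bytes) -> list[tuple[int, bytes]]:
    # An engine with no knowledge of binary tags just splits on SOH.
    fields = []
    for chunk in message.split(SOH):
        if chunk:
            tag, _, value = chunk.partition(b"=")
            fields.append((int(tag), value))
    return fields

def split_knowing_binary_tags(message: bytes, binary_tags: set[int]) -> list[tuple[int, bytes]]:
    # An engine that knows which tags use [tag]=[length][SOH][data][SOH].
    fields = []
    pos = 0
    while pos < len(message):
        eq = message.index(b"=", pos)
        tag = int(message[pos:eq])
        end = message.index(SOH, eq + 1)
        value = message[eq + 1:end]
        pos = end + 1
        if tag in binary_tags:
            length = int(value)
            value = message[pos:pos + length]
            pos += length + 1          # skip the data and its trailing SOH
        fields.append((tag, value))
    return fields

msg = b"96=4\x0135=0\x01"
print(naive_split(msg))                      # two fields: 96 and 35
print(split_knowing_binary_tags(msg, {96}))  # one field: 96 with 4 data bytes
```

The same bytes yield two different, equally well-formed readings, so the format only works if the receiver already knows which tags are binary.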


[ original email was from Geoff Kratz - geoff.kratz@albertasolutions.com ]
I would suggest that, rather than using <SOH> within the field, a different character be used instead (say <US>, 0x1f, the ‘unit separator’). This way the existing parsers don’t break, and as mentioned below, engines that don’t know this is explicitly a binary field won’t have a problem.


Geoff Kratz
Director
Alberta Market Solutions Ltd.
F: +1-403-685-4025
P: +1-403-246-5104
E: geoff.kratz@albertasolutions.com

> > This is a worthy proposition, but I would encode the length differently:
> >
> > [tag]=[length][SOH][data][SOH]
>
> I believe the main advantage of the proposed solution is to avoid requirement for awareness of specific binary tags on the receiving side (if not on the sending). Given that, the format above looks ambiguous to me. For example how the receiving FIX engine would process the following providing it’s not aware that tag 96 is binary: 96=4<SOH>35=0<SOH> ([data] happen to look like a valid <tag>=<value> sequence)? Or am I missing something?

[ original email was from Ryan Pierce - rpierce@taltrade.com ]
The real issue here is one of in-band signalling vs. out-of-band signalling. In-band signalling is risky because the data can mimic the signal and the two can be confused.

As was mentioned, Tag=Length[SOH]Value[SOH] isn’t useable because Value could begin with a number and = thus being confused for a new Tag=Value pair.

Similarly, Tag=Length[US]Value[SOH] isn’t useable. At present, FIX makes NO restrictions on character set except that normal fields can’t contain an [SOH]. [US] is a valid character in a FIX field. So if someone sends a normal Text field containing it, e.g.

…58=1000[US]Foo[SOH]10=123[SOH]8=FIX.4.5[SOH]9=100[SOH]35=…

then a FIX parser will think that a 1000 byte binary field follows, and consider the checksum and next message(s) part of that binary field.
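
A shortened version of this failure can be reproduced with the Python sketch below. The parser `parse_us_scheme` is a made-up illustration of a receiver that treats “digits followed by [US]” at the start of any value as a length prefix, and the message is a deliberately small stand-in for the one above: an ordinary Text field whose value happens to be “10[US]Foo”, followed by a checksum field.

```python
SOH = b"\x01"
US = b"\x1f"

def parse_us_scheme(message: bytes) -> list[tuple[int, bytes]]:
    # Hypothetical receiver for Tag=Length[US]Value[SOH]: the length is
    # signalled in-band, inside the value itself.
    fields = []
    pos = 0
    while pos < len(message):
        eq = message.index(b"=", pos)
        tag = int(message[pos:eq])
        soh = message.index(SOH, eq + 1)
        us = message.find(US, eq + 1, soh)
        if us != -1 and message[eq + 1:us].isdigit():
            # Looks like a length prefix: count bytes instead of scanning for SOH.
            start = us + 1
            end = start + int(message[eq + 1:us])
            if message[end:end + 1] != SOH:
                raise ValueError(f"tag {tag}: no SOH after the declared length")
        else:
            start, end = eq + 1, soh
        fields.append((tag, message[start:end]))
        pos = end + 1
    return fields

# The sender meant two fields: Text (58) = "10<US>Foo" and CheckSum (10) = "123".
msg = b"58=10\x1fFoo\x01" b"10=123\x01"
print(parse_us_scheme(msg))
```

The receiver returns a single field 58 whose value has swallowed the checksum, without raising any error, because the text data mimicked the signal.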

Out-of-band signalling, i.e. keeping the length out of the data field itself, eliminates this problem. Formats such as:

Tag+Length=Value[SOH] or
Tag:Length=Value[SOH]

are completely unambiguous in the receiving engine. While I proposed the first initially, I’m thinking the second might be better because some library functions might consider the + as a sign and part of the number.
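
As a small illustration of the concern about “+”, the Python sketch below uses the built-in `int()` as a stand-in for the kind of library function meant here; the helper name `parses_as_number` is made up. A scanner that reads the tag and then hands the rest of the tag portion to a numeric conversion would accept “+12” as a signed number, whereas “:12” is rejected and forces the separator to be handled explicitly.

```python
def parses_as_number(text: str) -> bool:
    # True if Python's int() accepts the text as a whole number.
    try:
        int(text)
    except ValueError:
        return False
    return True

head_plus = "9100+12"
head_colon = "9100:12"

# What is left after the four tag digits have been consumed.
print(parses_as_number(head_plus[4:]))   # "+12": the sign is accepted
print(parses_as_number(head_colon[4:]))  # ":12": rejected
```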

Further, backwards compatibility with existing parsers is not an issue. We’re proposing this for FIX 4.5 (or whatever the next FIX version is called) and later. FIX engines generally have to be changed to support new FIX versions anyway, so this would be part of all 4.5 engines, and could be made invisible to the business logic.

[ original email was from Geoff Kratz - geoff.kratz@albertasolutions.com ]
Actually, in any case we are talking about changes to parsers, so if we’re going to change the protocol, maybe it’s time to make it more self-describing, and possibly more efficient. This goes beyond binary data fields, and could include more efficient encoding of numeric data (using agreed-upon formats to represent numbers in binary form rather than as strings, avoiding a cumbersome string-to-number conversion, etc.), and providing information so that parsers could know what type a field is without having to look up the tag somewhere. This could allow the parsers to be more independent of the “meaning” of the tags, and might simplify some things. It could allow parsers to handle tags they don’t know about in a more elegant fashion. Going this far would be a pretty radical change though, so I’m not sure if that would work well for the community.

> The real issue here is one of in-band signalling vs. out-of-band signalling. In-band signalling is risky because the data can mimic the signal and the two can be confused.
>
> As was mentioned, Tag=Length[SOH]Value[SOH] isn’t useable because Value could begin with a number and = thus being confused for a new Tag=Value pair.
>
> Similarly, Tag=Length[US]Value[SOH] isn’t useable. At present, FIX makes NO restrictions on character set except that normal fields can’t contain an [SOH]. [US] is a valid character in a FIX field. So if someone sends a normal Text field containing it, i.e.
>
> …58=1000[US]Foo[SOH]10=123[SOH]8=FIX.4.5[SOH]9=100[SOH]35=…
>
> then a FIX parser will think that a 1000 byte binary field follows, and consider the checksum and next message(s) part of that binary field.
>
> Out-of-band signalling, i.e. keeping the length out of the data field itself, eliminates this problem, such as:
>
> Tag+Length=Value[SOH] or
> Tag:Length=Value[SOH]
>
> are completely unambiguous in the receiving engine. While I proposed the first initially, I’m thinking the second might be better because some library functions might consider the + as a sign and part of the number.
>
> Further, backwards compatibility with existing parsers is not an issue. We’re proposing this for FIX 4.5 (or whatever the next FIX version is called) and later. FIX engines generally have to be changed to support new FIX versions anyway, so this would be part of all 4.5 engines, and could be made invisible to the business logic.
>
>

But doesn’t encoding numeric data violate one of the foundation guidelines of XML of having human-readable data?

> Actually, in any case we are talking about changes to parsers, so if we’re going to change the protocol, maybe its time to make it more self-describing, and possibly more efficient. This goes beyond binary data fields, and could include more efficient encoding of numeric data (using agreed upon formats to represent numbers in binary form rather than strings and avoid a cumbersome string to number conversion, etc), and providing information so that parsers could know what type a field is without having to look up the tag somewhere. This could allow the parsers to be more independent of the “meaning” of the tags, and might simplify some things. It could allow parsers to handle tags they don’t know about in a more elegant fashion. Going this far would be a pretty radical change though, so I’m not sure if that would work well for the community.
>

[ original email was from Geoff Kratz - geoff.kratz@albertasolutions.com ]
This change would mean that encoding of some values would differ from FIXML, which would also be a significant deviation from 4.4 today. I don’t see this applying to FIXML, only to tag=value. XML parsers are generally not happy when non-character data appears in the XML data stream.

Right now I’m of two minds on this myself. On one hand, a change that makes parsing easier, separates the meaning of the content from the form, and potentially reduces message size, would be nice. On the other hand, we’re talking about a new format, and if the real goal is to ultimately move to FIXML (which I believe it should be), then this may not be worth the effort.


Geoff Kratz
Director
Alberta Market Solutions Ltd.

> But doesn’t encoding numeric data violate one of the foundation guidelines of XML of having human-readable data?
>

[ original email was from Anton Kryukov - anton@harborllc.com ]
> > This is a worthy proposition, but I would encode the length differently:
> >
> > [tag]=[length][SOH][data][SOH]
>
> I believe the main advantage of the proposed solution is to avoid requirement for awareness of specific binary tags on the receiving side (if not on the sending). Given that, the format above looks ambiguous to me. For example how the receiving FIX engine would process the following providing it’s not aware that tag 96 is binary: 96=4<SOH>35=0<SOH> ([data] happen to look like a valid <tag>=<value> sequence)? Or am I missing something?
>

One has to know that the field is binary anyway, if only to ignore the possible SOH’s embedded in the data. This is exactly why data fields come with a length specified in front of them. The only advantage of this scheme over “tag:length=data<SOH>” is that the parsing of the “tag=length” part is the same as everything else. After that, the parser has to switch from searching for <SOH> to byte counting anyway.

> One has to know that the field is binary anyway, if only to ignore the possible SOH’s embedded in the data. This is exactly why data fields come with a length specified in front of them. The only advantage of this scheme over “tag:length=data<SOH>” is that the parsing of the “tag=length” part is the same as everything else. After that, the parser has to switch from searching for <SOH> to byte counting anyway.
>
The format you suggest requires that a FIX engine identify binary fields by tag alone, whereas the main motivation of the original proposal was precisely to avoid that requirement and provide a self-describing way to encode binary fields. I also agree with the other reply posted by Ryan Pierce that the new format should be unambiguously identifiable as invalid by old FIX engines, which is not the case if it looks like a classic ‘<tag>=<value>’ sequence. The only way I can think of that preserves the existing message structure, (<tag>=<value><SOH>)*, would be to represent an SOH byte as a sequence of two SOHs (currently an invalid sequence except within binary fields). This representation could be used not only for binary fields but also for text fields, so special tags for local non-ASCII encodings might no longer be needed (e.g. JIS could be sent in Text). On the other hand, it raises some concerns, such as performance and consistency: old FIX engines may appear to process the new format happily until an SOH suddenly occurs in some value.
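
A minimal sketch of the SOH-doubling idea, under the assumption that a lone SOH always terminates a field and a pair of SOHs always stands for one literal SOH byte inside a value. The function names are hypothetical and nothing here is part of any FIX specification.

```python
SOH = b"\x01"

def encode_field(tag: int, value: bytes) -> bytes:
    """Emit <tag>=<value><SOH>, doubling every SOH inside the value."""
    return str(tag).encode("ascii") + b"=" + value.replace(SOH, SOH + SOH) + SOH

def decode_fields(msg: bytes):
    """Decode a run of fields; needs no knowledge of which tags are binary."""
    fields, pos = [], 0
    while pos < len(msg):
        eq = msg.index(b"=", pos)
        tag = int(msg[pos:eq])
        pos = eq + 1
        value = bytearray()
        while True:
            if pos >= len(msg):
                raise ValueError("unterminated field")
            byte = msg[pos:pos + 1]
            if byte == SOH:
                if msg[pos + 1:pos + 2] == SOH:   # doubled SOH: literal byte
                    value += SOH
                    pos += 2
                    continue
                pos += 1                          # lone SOH: end of field
                break
            value += byte
            pos += 1
        fields.append((tag, bytes(value)))
    return fields

wire = encode_field(9101, b"Embedded\x01Delimiter\x01") + encode_field(9102, b"Normal Field")
decoded = decode_fields(wire)
```

Because the decoder has to inspect every byte of every value rather than jump ahead by a known length, this sketch also illustrates the performance concern mentioned above.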

[ original email was from Anton Kryukov - anton@harborllc.com ]
> > > > This is a worthy proposition, but I would encode the length differently:
> > > >
> > > > [tag]=[length][SOH][data][SOH]
> > >
> > > I believe the main advantage of the proposed solution is to avoid requirement for awareness of specific binary tags on the receiving side (if not on the sending). Given that, the format above looks ambiguous to me. For example how the receiving FIX engine would process the following providing it’s not aware that tag 96 is binary: 96=4<SOH>35=0<SOH> ([data] happen to look like a valid <tag>=<value> sequence)? Or am I missing something?
> > >
> >
> > One has to know that the field is binary anyway, if only to ignore the possible SOH’s embedded in the data. This is exactly why data fields come with a length specified in front of them. The only advantage of this scheme over “tag:length=data<SOH>” is that the parsing of the “tag=length” part is the same as everything else. After that, the parser has to switch from searching for <SOH> to byte counting anyway.
> >
> The format you suggest requires that a FIX engine identify binary fields by tag alone, whereas the main motivation of the original proposal was precisely to avoid that requirement and provide a self-describing way to encode binary fields. I also agree with the other reply posted by Ryan Pierce that the new format should be unambiguously identifiable as invalid by old FIX engines, which is not the case if it looks like a classic ‘<tag>=<value>’ sequence. The only way I can think of that preserves the existing message structure, (<tag>=<value><SOH>)*, would be to represent an SOH byte as a sequence of two SOHs (currently an invalid sequence except within binary fields). This representation could be used not only for binary fields but also for text fields, so special tags for local non-ASCII encodings might no longer be needed (e.g. JIS could be sent in Text). On the other hand, it raises some concerns, such as performance and consistency: old FIX engines may appear to process the new format happily until an SOH suddenly occurs in some value.
>

I concede. I missed the intention of the original proposal to divorce the parsing process from the data dictionary. In that case, something could probably also be done with field groups. It would be convenient for further processing if the parser put the grouped fields in a hierarchical structure. This could be achieved by giving the group size field a special syntax and putting an extra <SOH> at the end of the group's listing, something like:

<group_size_tag>:<N>:<first_field_tag>=<val_1_1><SOH>…<first_field_tag>=<val_1_N><SOH>…<Mth_field_tag>=<val_M_N><SOH><SOH>

In general, the

<tag>:<val>:
<tag>=<val><SOH>
…
<SOH>

or similar syntax could support an arbitrary nesting structure.
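
As a rough sketch of how a parser might build the hierarchy without a data dictionary, the following assumes that a group header has the form <tag>:<N>: and that a bare <SOH> closes the innermost open group. The function name and the tag numbers 9200 to 9202 are hypothetical, and the sketch does not split the children into N instances or check N against them.

```python
SOH = 0x01

def parse_fields(msg: bytes, pos: int = 0, nested: bool = False):
    """Return (fields, next_pos).

    A plain field is (tag, value); a group is (tag, declared_count, children).
    """
    fields = []
    while pos < len(msg):
        if msg[pos] == SOH:                 # bare SOH closes the enclosing group
            if not nested:
                raise ValueError("group terminator outside any group")
            return fields, pos + 1
        start = pos
        while pos < len(msg) and msg[pos] not in (ord("="), ord(":")):
            pos += 1
        if pos >= len(msg):
            raise ValueError("field has no '=' or ':' separator")
        tag = int(msg[start:pos])
        if msg[pos] == ord("="):            # ordinary <tag>=<val><SOH>
            end = msg.index(b"\x01", pos)
            fields.append((tag, msg[pos + 1:end]))
            pos = end + 1
        else:                               # group header <tag>:<N>:
            end = msg.index(b":", pos + 1)
            count = int(msg[pos + 1:end])
            children, pos = parse_fields(msg, end + 1, nested=True)
            fields.append((tag, count, children))
    if nested:
        raise ValueError("group not terminated by a bare SOH")
    return fields, pos

msg = b"55=IBM\x01" b"9200:2:9201=0\x019202=10.5\x019201=1\x019202=10.6\x01\x01" b"10=000\x01"
tree, _ = parse_fields(msg)
```

Recursion handles arbitrary nesting because each group header simply starts another call that runs until its own closing <SOH>.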

[ original email was from John Prewett - jprewett@lavatrading.com ]
> I concede. I missed the intention of the original proposal to divorce the parsing process from the data dictionary. In that case, something could probably also be done with field groups. It would be convenient for further processing if the parser put the grouped fields in a hierarchical structure. This could be achieved by giving the group size field a special syntax and putting an extra <SOH> at the end of the group's listing, something like:
>
> <group_size_tag>:<N>:<first_field_tag>=<val_1_1><SOH>…<first_field_tag>=<val_1_N><SOH>…<Mth_field_tag>=<val_M_N><SOH><SOH>
>
> In general, the
>
> <tag>:<val>:
> <tag>=<val><SOH>
> …
> <SOH>
>
> or similar syntax could support an arbitrary nesting structure.
>

Personally, I prefer keeping the discussion focused on the original proposal and not widening its scope. That way, it has a better chance of being included in the next standard (I had previously been hoping this proposal would be included in FIX.4.4). By all means, open up another discussion thread on your new proposal.