FAST Extension proposal - SET

Imported from previous forum

All,

this is the second of four proposals announced in a previous post.
Please review and comment.
/Rolf

EXTENSION PROPOSAL - SET (multi-value string)

Background

FIX as well as other protocols make use of fields with a fixed
set of alternative values. These fields may either be single-value
(an enumerated value) or multi-value (a set of values).

Introduction

This proposal discusses an implementation for multi-value fields.

Example

Let’s use TradeCondition (tag 277) as an example:
(other examples are tags 276, 1031, and 1035)

[ 1] 0 = Cancel
[ 2] 1 = Implied Trade
[ 3] 2 = Marketplace entered trade
[ 4] 3 = Mult Asset Class Multileg Trade
[ 5] 4 = Multileg-to-Multileg Trade

[ 6] A = Cash (only) Market
[ 7] B = Average Price Trade
[ 8] D = Next Day (only) Market
[ 9] E = Opening/Reopening Trade Detail
[10] G = Rule 127 Trade (NYSE)

[11] F = Intraday Trade Detail
[12] H = Rule 155 Trade (AMEX)
[13] K = Opened
[14] I = Sold Last (late reporting)
[15] J = Next Day Trade (next day clearing)
[16] L = Seller
[17] M = Sold
[18] N = Stopped Stock
[19] P = Imbalance More Buyers
[20] Q = Imbalance More Sellers

[21] thru [80] omitted …

[81] AO = Crossed (duplicate)
[82] AP = Fast Market
[83] AQ = Automatic Execution
[84] AR = Form T
[85] AS = Basket Index
[86] AT = Burst Basket
[87] AV = Outside Spread

Any combination of these 87 values could be represented using
a bit vector with 87 positions. With SBIT-encoding we get
7 bits per byte, so 87 bits will require 13 bytes.
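
The 13-byte figure is a ceiling division by 7. A minimal Python sketch of the arithmetic (the function name is illustrative and not part of FAST; treating an empty vector as one byte is an assumption):

```python
def sbit_bytes(num_bits: int) -> int:
    """Bytes needed for num_bits payload bits when each byte carries 7 of them."""
    if num_bits <= 0:
        return 1  # assumption: an empty vector still occupies one byte
    return (num_bits + 6) // 7  # ceiling division by 7

print(sbit_bytes(87))  # 13
```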

It is unlikely that a specific conversation between two FIX
parties would utilize more than a small part of the values
included in the list above. Furthermore, it’s likely that
the values used will occur with different frequency. By only
including the values to be used and ordering them so that
the most frequent values get positions early in the value
space, the common case will have a short representation.

For example, let’s say we want to include the following values:

[ 6] A = Cash (only) Market
[ 7] B = Average Price Trade
[ 8] D = Next Day (only) Market
[ 9] E = Opening/Reopening Trade Detail
[10] G = Rule 127 Trade (NYSE)
[11] F = Intraday Trade Detail
[12] H = Rule 155 Trade (AMEX)
[13] K = Opened
[14] I = Sold Last (late reporting)

Each of the values may be assigned a bit value and these
bit values can be mixed to give arbitrary combinations.

[ 6] A = 1 [0000001] Cash (only) Market
[ 7] B = 2 [0000010] Average Price Trade
[ 8] D = 3 [0000100] Next Day (only) Market
[ 9] E = 4 [0001000] Opening/Reopening Trade Detail
[10] G = 5 [0010000] Rule 127 Trade (NYSE)
[11] F = 6 [0100000] Intraday Trade Detail
[12] H = 7 [1000000] Rule 155 Trade (AMEX)

[13] K = 8 [0000001 0000000] Opened
[14] I = 9 [0000010 0000000] Sold Last (late reporting)

“A D” => [0000101]
“A H K” => [0000001 1000001]

(FAST encoding automatically adjusts the number of bytes
used in the wire representation and the decoding will
be able to detect how many bytes are used.)

The number of bytes needed in the wire representation is
dependent on the actual combination of values.
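
A rough Python sketch of how a sender might build the bit vector from the sequential assignment above. It assumes the set is sent like a FAST stop-bit encoded unsigned integer (most significant 7-bit group first, high bit set on the last byte); the proposal does not spell out the byte order, and all names here are illustrative only.

```python
# Bit positions from the sequential assignment above (1-based).
POSITIONS = {"A": 1, "B": 2, "D": 3, "E": 4, "G": 5,
             "F": 6, "H": 7, "K": 8, "I": 9}

def set_to_int(values):
    """Combine the bit of every value in the set into one integer."""
    bits = 0
    for v in values:
        bits |= 1 << (POSITIONS[v] - 1)
    return bits

def stop_bit_encode(n):
    """Split n into 7-bit groups, most significant first, and set the
    high bit of the final byte as the stop bit (assumed wire order)."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()
    groups[-1] |= 0x80
    return bytes(groups)

def show(values):
    """Render the 7-bit groups (stop bit removed) as in the examples above."""
    data = stop_bit_encode(set_to_int(values))
    return " ".join(format(b & 0x7F, "07b") for b in data)

print(show(["A", "H", "K"]))  # 0000001 1000001
```

Any set that only touches positions 1 to 7 comes out as a single byte; the second byte appears only when K or I is present.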

Optimizing The Bit Assignments

Let’s assume G and H are uncommon.

Sequential assignment of the values places both G and H in
the first byte, whereas the more frequent K and I are
placed in the second byte. Every time K or I needs to be
represented, the field will be two bytes long.

(this is similar to the pmap case of frequency of presence
and pmap position for a specific field)

By swapping G and H with K and I, we can improve the
efficiency of the common case:

[ 6] A = 1 [0000001] Cash (only) Market
[ 7] B = 2 [0000010] Average Price Trade
[ 8] D = 3 [0000100] Next Day (only) Market
[ 9] E = 4 [0001000] Opening/Reopening Trade Detail
[11] F = 5 [0010000] Intraday Trade Detail
[13] K = 6 [0100000] Opened
[14] I = 7 [1000000] Sold Last (late reporting)

[10] G = 8 [0000001 0000000] Rule 127 Trade (NYSE)
[12] H = 9 [0000010 0000000] Rule 155 Trade (AMEX)
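
The reordering can be sketched in Python as sorting by frequency. The frequency numbers below are made up purely for illustration (only "G and H are uncommon" comes from the text), and the helper names are hypothetical.

```python
def assign_positions(freq):
    """Give the most frequent values the lowest bit positions (1-based)."""
    ordered = sorted(freq, key=lambda v: -freq[v])
    return {v: i + 1 for i, v in enumerate(ordered)}

def field_bytes(values, positions):
    """Bytes needed: determined by the highest bit position in the set."""
    if not values:
        return 1  # assumption: an empty set still takes one byte
    highest = max(positions[v] for v in values)
    return (highest + 6) // 7

# Hypothetical relative frequencies; G and H are assumed rare.
FREQ = {"A": 90, "B": 80, "D": 70, "E": 60, "G": 2,
        "F": 50, "H": 1, "K": 40, "I": 30}

sequential = {v: i + 1 for i, v in enumerate("ABDEGFHKI")}
optimized = assign_positions(FREQ)

print(field_bytes(["A", "K"], sequential))  # 2
print(field_bytes(["A", "K"], optimized))   # 1
```

With these assumed frequencies the sort reproduces the table above: F, K and I move into the first byte, while G and H land at positions 8 and 9.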

Wire representation

No new representation needed.
SBIT-encoded fields will be used.

Template Syntax

...

Many times there are large groups of values which are mutually exclusive in a MultiValueString. The example field used, TradeCondition, is likely to have many mutually exclusive subsets of values.

The set extension should allow for a definition of these mutually exclusive values. Let’s use the 20 A* values for TradeCondition and assume that they are mutually exclusive. Only one of the A* values can be set at a time. Using a single bit for each value would require 3 bytes in the Set encoding. Using an option declaration, this can be reduced to 5 bits.

<set name="TradeCondition">
  <value>A</value>
  <value>B</value>
  <option presence="optional">
    <value>AA</value>
    <value>AB</value>
    …
  </option>
</set>

The presence attribute indicates whether null should be accepted as a value or whether the option must always be present. In this case:

00000 represents no value
00001 represents AA
00010 represents AB

The example above would take 1 byte in the worst case.
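
A small Python sketch of the bit count for an option group, assuming (as in the codes listed above) that code 0 is reserved for "no value" when the group is optional and that the values are then numbered from 1 in declaration order. The function names are hypothetical.

```python
def option_bits(num_values, optional=True):
    """Bits needed to number the values of one mutually exclusive group.
    An optional group reserves code 0 for "no value"."""
    codes = num_values + (1 if optional else 0)
    return max(1, (codes - 1).bit_length())

def option_code(group, chosen=None):
    """Code for the chosen value: 0 for no value, otherwise 1-based index."""
    return 0 if chosen is None else group.index(chosen) + 1

print(option_bits(20))  # 5

group = ["AA", "AB"]  # first two of the assumed 20 A* values
print(format(option_code(group, "AB"), "05b"))  # 00010

# Two single-bit values (A, B) plus one 5-bit option group.
total_bits = 2 + option_bits(20)
print((total_bits + 6) // 7)  # 1
```

The last line matches the 1-byte worst case: 2 bits for A and B plus 5 bits for the option group fit in one 7-bit payload.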

Why wouldn’t the enum attribute within the set serve the same purpose? The enum attribute is for single values and requires the same number of bits. What is the advantage of introducing another attribute?

The Set represents MultiValueString fields which are enumerations. The option group just specifies which values are mutually exclusive. Using enum here instead of option is just a matter of semantics. I picked option because it won’t get confused with the enum field and option seems to indicate you only get one choice.

[ original email was from Darshan Khedekar - darshan.khedekar.ext@deutsche-boerse.com ]
Hi,

As I understand it, the proposal is to assign enums to strings and a single unique bit representation to each enum.

My question is: is an ENUM representation required?
One could define the SET as, e.g.

One difference is that ENUM would need two bytes for “AB” (0x01 and 0x02), whereas SET would need only one byte, 0x03 (after removing the stop bit).

If both schemes assign numbers to the string / char values, there is a definite advantage to using SET. And as FAST is all about fitting large data into a smaller size, why not merge the two and offer only a SET?

Regards,
Darshan

[ original email was from Darshan Khedekar - darshan.khedekar.ext@deutsche-boerse.com ]
I feel that for an interface designer it is also important to take into account the increase in sender and receiver processing for a SET.
The process on the sender side is ASCII -> {My SET mapping} -> multi-bit conversion,
and the reverse on the receiver side.
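
To make the receiver-side work concrete, here is a rough Python sketch of the reverse step: stop-bit bytes back to an integer, then back to the ASCII values through the agreed mapping. It assumes the sequential bit assignment from the proposal and a most-significant-group-first byte order; the names are illustrative, and it says nothing about actual processing times.

```python
# Bit positions from the proposal's sequential assignment (1-based).
POSITIONS = {"A": 1, "B": 2, "D": 3, "E": 4, "G": 5,
             "F": 6, "H": 7, "K": 8, "I": 9}

def stop_bit_decode(data):
    """Accumulate 7 payload bits per byte until the stop bit is seen."""
    n = 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # high bit marks the last byte of the field
            break
    return n

def int_to_set(n):
    """Map every set bit back to its ASCII value."""
    return {v for v, pos in POSITIONS.items() if n & (1 << (pos - 1))}

wire = bytes([0x01, 0xC1])  # "A H K" under the assumed byte order
print(sorted(int_to_set(stop_bit_decode(wire))))  # ['A', 'H', 'K']
```

The extra work per field is one table lookup per value plus a few shifts and masks, which is the cost being asked about.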

It would be interesting to see the increase in processing times due to the additional work at sender and receiver.

For a designer it is at times acceptable to sacrifice some compression rather than increase the overall message processing times. In the exchange environment the sender can at times respond quickly to increasing processing requirements, but it might be a huge task for the receivers to add more capacity.
