From 2e7f39f6b69d98fccba714266f3fa92bbce934cd Mon Sep 17 00:00:00 2001
From: Juan Cruz Viotti
Date: Wed, 20 Jan 2021 17:05:19 -0400
Subject: Clarify Compact Protocol var int encoding definition

Patch: Juan Cruz Viotti

This closes #2312

I'm having problems following the var int explanation from the Compact
Protocol spec. Here is an attempt to clarify it with more precise
encoding steps and with an example.

I'm also mentioning, for completeness, that the formal name of such
variable-length integer encoding is Unsigned LEB128 (Unsigned Little
Endian Base-128).

Signed-off-by: Juan Cruz Viotti
---
 doc/specs/thrift-compact-protocol.md | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/doc/specs/thrift-compact-protocol.md b/doc/specs/thrift-compact-protocol.md
index 001bb1229..89301eb5d 100644
--- a/doc/specs/thrift-compact-protocol.md
+++ b/doc/specs/thrift-compact-protocol.md
@@ -61,9 +61,21 @@ def longToZigZag(n: Long): Long = (n << 1) ^ (n >> 63)
 def zigzagToLong(n: Long): Long = (n >>> 1) ^ - (n & 1)
 ```
 
-The zigzag int is then encoded as a *var int*. Var ints take 1 to 5 bytes (int32) or 1 to 10 bytes (int64). The most
-significant bit of each byte indicates if more bytes follow. The concatenation of the least significant 7 bits from each
-byte form the number, where the first byte has the most significant bits (so they are in big endian or network order).
+The zigzag int is then encoded as a *var int*, also known as *Unsigned LEB128*. Var ints take 1 to 5 bytes (int32) or
+1 to 10 bytes (int64). The process consists in taking a Big Endian unsigned integer, left-padding the bit-string to
+make it a multiple of 7 bits, splitting it into 7-bit groups, prefixing the most-significant 7-bit group with the 0
+bit, prefixing the remaining 7-bit groups with the 1 bit and encoding the resulting bit-string in Little Endian.
+
+For example, the integer 50399 is encoded as follows:
+
+```
+50399 = 1100 0100 1101 1111         (Big Endian representation)
+      = 00000 1100 0100 1101 1111   (Left-padding)
+      = 0000011 0001001 1011111     (7-bit groups)
+      = 00000011 10001001 11011111  (Most-significant bit prefixes)
+      = 11011111 10001001 00000011  (Little Endian representation)
+      = 0xDF 0x89 0x03
+```
 
 Var ints are sometimes used directly inside the compact protocol to represent positive numbers.
 
-- 
cgit v1.2.1
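
The encoding steps added by this patch can be sketched in code. The following is a minimal illustration of Unsigned LEB128, not code taken from Thrift; the helper names `encode_varint` and `decode_varint` are hypothetical. It emits the least-significant 7-bit group first, which is equivalent to the "split into groups, prefix, then reverse" description in the patch.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as Unsigned LEB128 (hypothetical helper)."""
    if n < 0:
        raise ValueError("var ints encode non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F  # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # prefix with 1: more bytes follow
        else:
            out.append(group)  # prefix with 0: most-significant group, last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single Unsigned LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift  # place this 7-bit group
        if not byte & 0x80:  # continuation bit clear: done
            return result
        shift += 7
    raise ValueError("truncated var int")


# The worked example from the patch: 50399 -> 0xDF 0x89 0x03
assert encode_varint(50399) == bytes([0xDF, 0x89, 0x03])
assert decode_varint(bytes([0xDF, 0x89, 0x03])) == 50399
```

In this sketch the largest unsigned 32-bit value occupies 5 bytes and the largest unsigned 64-bit value occupies 10, matching the size bounds stated in the spec text.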