Thrift Binary protocol encoding
===============================

<!--
--------------------------------------------------------------------

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

--------------------------------------------------------------------
-->

This document describes the wire encoding for RPC using the older Thrift *binary protocol*.

The information here is _mostly_ based on the Java implementation in the Apache Thrift library (versions 0.9.1 and
0.9.3). Other implementations, however, should behave the same.

For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).

# Contents

* Binary protocol
  * Base types
  * Message
  * Struct
  * List and Set
  * Map
* BNF notation used in this document

# Binary protocol

## Base types

### Integer encoding

In the _binary protocol_ integers are encoded with the most significant byte first (big endian byte order, aka network
order). An `int8` needs 1 byte, an `int16` 2, an `int32` 4 and an `int64` needs 8 bytes.

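As an illustration of the integer layout described above, here is a minimal Python sketch using the standard `struct` module. The helper names (`encode_i8` and so on) are hypothetical and are not part of any Thrift library.

```python
import struct

# Hypothetical helpers. The ">" prefix selects big endian (network) byte order.

def encode_i8(value: int) -> bytes:
    return struct.pack(">b", value)   # 1 byte, signed

def encode_i16(value: int) -> bytes:
    return struct.pack(">h", value)   # 2 bytes, signed

def encode_i32(value: int) -> bytes:
    return struct.pack(">i", value)   # 4 bytes, signed

def encode_i64(value: int) -> bytes:
    return struct.pack(">q", value)   # 8 bytes, signed

print(encode_i32(1).hex())    # 00000001
print(encode_i16(-2).hex())   # fffe
print(encode_i64(258).hex())  # 0000000000000102
```
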
The C++ implementation has the option to use the binary protocol with little endian order. Little endian gives a small
but noticeable performance boost because most contemporary CPUs store integers in little endian order, so no byte
swapping is needed.

### Enum encoding

The generated code encodes `Enum`s by taking the enum's integer value (as assigned in the Thrift IDL) and then encoding
that as an int32.

### Binary encoding

Binary is sent as follows:

```
Binary protocol, binary data, 4+ bytes:
+--------+--------+--------+--------+--------+...+--------+
| byte length                       | bytes               |
+--------+--------+--------+--------+--------+...+--------+
```

Where:

* `byte length` is the length of the byte array, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).
* `bytes` are the bytes of the byte array.

### String encoding

*String*s are first encoded to UTF-8, and then sent as binary.

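The binary and string encodings can be sketched in a few lines of Python. This is only an illustration under the rules stated above; `encode_binary`, `encode_string` and `decode_binary` are hypothetical names, not functions of a Thrift library.

```python
import struct

def encode_binary(data: bytes) -> bytes:
    """Prefix the raw bytes with their length as a big endian int32."""
    return struct.pack(">i", len(data)) + data

def encode_string(text: str) -> bytes:
    """Strings are UTF-8 encoded first, then sent as binary."""
    return encode_binary(text.encode("utf-8"))

def decode_binary(buf: bytes, offset: int = 0):
    """Return (bytes, offset just past the value)."""
    (length,) = struct.unpack_from(">i", buf, offset)
    if length < 0:
        raise ValueError("byte length must be >= 0")
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the announced byte length")
    return buf[start:end], end

# "é" takes two bytes in UTF-8, so the byte length is 6, not 5.
print(encode_string("héllo").hex())  # 0000000668c3a96c6c6f
```

Note that the length prefix counts bytes, not characters, which matters for any non-ASCII string.
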
### Double encoding

Values of type `double` are first converted to an int64 according to the IEEE 754 floating-point "double format" bit
layout. Most run-times provide a library to make this conversion. The binary protocol then encodes the int64 in 8 bytes
in big endian order.

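A minimal Python sketch of the two steps (reinterpret the IEEE 754 bits as an int64, then write that int64 in big endian order) might look as follows. The function name `encode_double` is an assumption made for this example.

```python
import struct

def encode_double(value: float) -> bytes:
    # Step 1: reinterpret the IEEE 754 "double format" bits as a signed int64.
    (bits,) = struct.unpack(">q", struct.pack(">d", value))
    # Step 2: write the int64 in 8 bytes, big endian.
    return struct.pack(">q", bits)

print(encode_double(1.0).hex())   # 3ff0000000000000
print(encode_double(-2.0).hex())  # c000000000000000
```

In Python the two steps collapse into a single `struct.pack(">d", value)`; they are kept separate here only to mirror the description.
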
### Boolean encoding

Values of `bool` type are first converted to an int8. True is converted to `1`, false to `0`.

## Message

A `Message` can be encoded in two different ways:

```
Binary protocol Message, strict encoding, 12+ bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
|1vvvvvvv|vvvvvvvv|unused  |00000mmm| name length                       | name                | seq id                            |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
```

Where:

* `vvvvvvvvvvvvvvv` is the version, an unsigned 15 bit number fixed to `1` (in binary: `000 0000 0000 0001`).
  The leading bit is `1`.
* `unused` is an ignored byte.
* `mmm` is the message type, an unsigned 3 bit integer. The 5 leading bits must be `0` as some clients (checked for
  Java in 0.9.1) take the whole byte.
* `name length` is the byte length of the name field, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).
* `name` is the method name, a UTF-8 encoded string.
* `seq id` is the sequence id, a signed 32 bit integer encoded in network (big endian) order.

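To make the strict layout concrete, the following Python sketch builds a strict message header. It is an illustration only; `VERSION_1` and `encode_strict_message_header` are names invented for this example.

```python
import struct

# Leading 1 bit plus the 15 bit version number 1, placed in the upper two bytes.
VERSION_1 = 0x80010000

def encode_strict_message_header(name: str, message_type: int, seq_id: int) -> bytes:
    if not 0 <= message_type <= 7:
        raise ValueError("message type must fit in 3 bits")
    name_bytes = name.encode("utf-8")
    # The unused third byte is left as 0; the message type fills the fourth byte.
    first_word = VERSION_1 | message_type
    return (
        struct.pack(">I", first_word)
        + struct.pack(">i", len(name_bytes))
        + name_bytes
        + struct.pack(">i", seq_id)
    )

# Method "ping", message type 1, sequence id 7.
print(encode_strict_message_header("ping", 1, 7).hex())
# 800100010000000470696e6700000007
```

With an empty method name the header is exactly 12 bytes, which matches the "12+ bytes" in the diagram.
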
The second, older encoding (aka non-strict) is:

```
Binary protocol Message, old encoding, 9+ bytes:
+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
| name length                       | name                |00000mmm| seq id                            |
+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
```

Where `name length`, `name`, `mmm`, `seq id` are as above.

Because `name length` must be non-negative (so its first bit is always `0`) while the strict encoding always starts
with a `1` bit, the first bit tells the receiver whether the strict format or the old format is used. A server and
client using different variants of the binary protocol can therefore talk to each other transparently. However, when
strict mode is enforced, the old format is rejected.

Message types are encoded with the following values:

* _Call_: 1
* _Reply_: 2
* _Exception_: 3
* _Oneway_: 4

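The way a receiver can tell the two message encodings apart is sketched below in Python. This is a simplified illustration, not the logic of any particular Thrift implementation; `decode_message_header` and `MESSAGE_TYPES` are hypothetical names, and a real reader would also enforce strict mode and size limits.

```python
import struct

MESSAGE_TYPES = {1: "Call", 2: "Reply", 3: "Exception", 4: "Oneway"}

def decode_message_header(buf: bytes):
    """Return (name, message type name, seq id, offset just past the header)."""
    (first,) = struct.unpack_from(">i", buf, 0)
    if first < 0:
        # Strict encoding: the leading bit is 1, so the signed int32 is negative.
        version = (first >> 16) & 0x7FFF
        if version != 1:
            raise ValueError(f"unsupported version {version}")
        message_type = first & 0xFF  # the whole fourth byte
        (name_length,) = struct.unpack_from(">i", buf, 4)
        if name_length < 0:
            raise ValueError("name length must be >= 0")
        name_end = 8 + name_length
        name = buf[8:name_end].decode("utf-8")
        (seq_id,) = struct.unpack_from(">i", buf, name_end)
        offset = name_end + 4
    else:
        # Old encoding: the first int32 is the name length itself.
        name_end = 4 + first
        name = buf[4:name_end].decode("utf-8")
        message_type, seq_id = struct.unpack_from(">bi", buf, name_end)
        offset = name_end + 5
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type {message_type}")
    return name, MESSAGE_TYPES[message_type], seq_id, offset

strict = bytes.fromhex("800100010000000470696e6700000007")
old = bytes.fromhex("0000000470696e670200000007")
print(decode_message_header(strict))  # ('ping', 'Call', 7, 16)
print(decode_message_header(old))     # ('ping', 'Reply', 7, 13)
```
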
## Struct

A *Struct* is a sequence of zero or more fields, followed by a stop field. Each field starts with a field header and
is followed by the encoded field value. The encoding can be summarized by the following BNF:

```
struct        ::= ( field-header field-value )* stop-field
field-header  ::= field-type field-id
```

Because each field header contains the field-id (as defined by the Thrift IDL file), the fields can be encoded in any
order. Thrift's type system is not extensible; you can only encode the primitive types and structs. Therefore it is also
possible to handle unknown fields while decoding; these are simply ignored. While decoding, the field type can be used
to determine how to decode the field value.

Note that the field name is not encoded, so field renames in the IDL do not affect forward and backward compatibility.

The default Java implementation (Apache Thrift 0.9.1) has undefined behavior when it tries to decode a field that has
a different field-type than expected. Theoretically, this could be detected at the cost of some additional checking.
Other implementations may perform this check and then either ignore the field, or return a protocol exception.

A *Union* is encoded exactly the same as a struct with the additional restriction that at most 1 field may be encoded.

An *Exception* is encoded exactly the same as a struct.

### Struct encoding

In the binary protocol field headers and the stop field are encoded as follows:

```
Binary protocol field header and field value:
+--------+--------+--------+--------+...+--------+
|tttttttt| field id        | field value         |
+--------+--------+--------+--------+...+--------+

Binary protocol stop field:
+--------+
|00000000|
+--------+
```

Where:

* `tttttttt` is the field-type, a signed 8 bit integer.
* `field id` is the field-id, a signed 16 bit integer in big endian order.
* `field value` is the encoded field value.

The following field-types are used:

* `BOOL`, encoded as `2`
* `I8`, encoded as `3`
* `DOUBLE`, encoded as `4`
* `I16`, encoded as `6`
* `I32`, encoded as `8`
* `I64`, encoded as `10`
* `BINARY`, used for binary and string fields, encoded as `11`
* `STRUCT`, used for structs and union fields, encoded as `12`
* `MAP`, encoded as `13`
* `SET`, encoded as `14`
* `LIST`, encoded as `15`

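As a worked example of the field header, field value and stop field, the Python sketch below encodes and decodes a hypothetical struct with an `I32` field (id 1) and a string field (id 2). All names are invented for this example, and the decoder deliberately handles only these two field-types.

```python
import struct

T_STOP, T_I32, T_BINARY = 0, 8, 11  # field-type values from the list above

def field_header(field_type: int, field_id: int) -> bytes:
    # 1 byte field-type followed by a big endian int16 field-id.
    return struct.pack(">bh", field_type, field_id)

def encode_example_struct(count: int, label: str) -> bytes:
    label_bytes = label.encode("utf-8")
    return (
        field_header(T_I32, 1) + struct.pack(">i", count)
        + field_header(T_BINARY, 2) + struct.pack(">i", len(label_bytes)) + label_bytes
        + struct.pack(">b", T_STOP)
    )

def decode_fields(buf: bytes) -> dict:
    """Read fields until the stop field; returns {field-id: value}."""
    fields, offset = {}, 0
    while True:
        (field_type,) = struct.unpack_from(">b", buf, offset)
        offset += 1
        if field_type == T_STOP:
            return fields
        (field_id,) = struct.unpack_from(">h", buf, offset)
        offset += 2
        if field_type == T_I32:
            (value,) = struct.unpack_from(">i", buf, offset)
            offset += 4
        elif field_type == T_BINARY:
            (length,) = struct.unpack_from(">i", buf, offset)
            offset += 4
            value = buf[offset:offset + length]
            offset += length
        else:
            raise ValueError(f"field-type {field_type} is not handled in this sketch")
        fields[field_id] = value

encoded = encode_example_struct(42, "hi")
print(encoded.hex())           # 0800010000002a0b000200000002686900
print(decode_fields(encoded))  # {1: 42, 2: b'hi'}
```

Because the decoder dispatches on the field-type in each header, it returns the same result if the two fields are written in the opposite order.
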
## List and Set

Lists and sets are encoded the same way: a header indicating the size and the element-type of the elements, followed by
the encoded elements.

```
Binary protocol list (5+ bytes) and elements:
+--------+--------+--------+--------+--------+--------+...+--------+
|tttttttt| size                              | elements            |
+--------+--------+--------+--------+--------+--------+...+--------+
```

Where:

* `tttttttt` is the element-type, encoded as an int8
* `size` is the size, encoded as an int32, non-negative values only
* `elements` are the encoded element values

The element-type values are the same as field-types. The full list is included in the struct section above.

The maximum list/set size is configurable. By default, there is no limit (meaning the limit is the maximum int32 value:
2147483647).

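The list header and elements can be illustrated with a short Python sketch for a list of `I32` values. The function name `encode_i32_list` is an assumption for this example; a set of `I32` values would produce the same bytes.

```python
import struct

T_I32 = 8  # element-type value, same as the I32 field-type

def encode_i32_list(values) -> bytes:
    # Header: 1 byte element-type, then the size as a big endian int32.
    header = struct.pack(">bi", T_I32, len(values))
    return header + b"".join(struct.pack(">i", v) for v in values)

print(encode_i32_list([1, 2]).hex())  # 08000000020000000100000002
print(encode_i32_list([]).hex())      # 0800000000
```

The empty list shows the 5 byte minimum: the header is always present, even when there are no elements.
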
## Map

Maps are encoded with a header indicating the size, the element-type of the keys and the element-type of the elements,
followed by the encoded elements. The encoding follows this BNF:

```
map  ::=  key-element-type value-element-type size ( key value )*
```

```
Binary protocol map (6+ bytes) and key value pairs:
+--------+--------+--------+--------+--------+--------+--------+...+--------+
|kkkkkkkk|vvvvvvvv| size                              | key value pairs     |
+--------+--------+--------+--------+--------+--------+--------+...+--------+
```

Where:

* `kkkkkkkk` is the key element-type, encoded as an int8
* `vvvvvvvv` is the value element-type, encoded as an int8
* `size` is the size of the map, encoded as an int32, non-negative values only
* `key value pairs` are the encoded keys and values

The element-type values are the same as field-types. The full list is included in the struct section above.

The maximum map size is configurable. By default, there is no limit (meaning the limit is the maximum int32 value:
2147483647).

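The map layout can be sketched in Python for a map from strings to `I32` values. This is an illustration only, and `encode_string_i32_map` is a hypothetical name.

```python
import struct

T_I32, T_BINARY = 8, 11  # element-type values, same as the field-types

def encode_string_i32_map(entries: dict) -> bytes:
    # Header: key element-type, value element-type, then the size as int32.
    parts = [struct.pack(">bbi", T_BINARY, T_I32, len(entries))]
    for key, value in entries.items():
        key_bytes = key.encode("utf-8")
        parts.append(struct.pack(">i", len(key_bytes)) + key_bytes)  # key
        parts.append(struct.pack(">i", value))                       # value
    return b"".join(parts)

print(encode_string_i32_map({"a": 1}).hex())  # 0b0800000001000000016100000001
print(encode_string_i32_map({}).hex())        # 0b0800000000
```

Each key is immediately followed by its value, matching the `( key value )*` part of the BNF.
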
# BNF notation used in this document

The following BNF notation is used:

* a plus `+` appended to an item represents repetition; the item is repeated 1 or more times
* a star `*` appended to an item represents optional repetition; the item is repeated 0 or more times
* a pipe `|` between items represents choice; the first matching item is selected
* parentheses `(` and `)` are used for grouping multiple items