Thrift Remote Procedure Call
============================

<!--
--------------------------------------------------------------------

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

--------------------------------------------------------------------
-->

This document describes the high-level message exchange between the Thrift RPC client and server.
See [thrift-binary-protocol.md] and [thrift-compact-protocol.md] for a description of how the exchanges are encoded on
the wire.

In addition, this document compares the binary protocol with the compact protocol. Finally, it describes framed vs.
unframed transport.

The information here is _mostly_ based on the Java implementation in the Apache Thrift library (versions 0.9.1 and
0.9.3). Other implementations, however, should behave the same.

For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).

# Contents

* Thrift Remote Procedure Call Message exchange
  * Message
  * Request struct
  * Response struct
* Protocol considerations
  * Comparing binary and compact protocol
  * Compatibility
  * Framed vs. unframed transport

# Thrift Remote Procedure Call Message exchange

Both the binary protocol and the compact protocol assume a transport layer that exposes a bi-directional byte stream,
for example a TCP socket. Both use the following exchange:

1. Client sends a `Message` (type `Call` or `Oneway`). The message contains some metadata and the name of the method
   to invoke.
2. Client sends method arguments (a struct defined by the generated code).
3. Server sends a `Message` (type `Reply` or `Exception`) to start the response.
4. Server sends a struct containing the method result or exception.

The pattern is a simple half-duplex protocol in which the parties alternate in sending a `Message` followed by a
struct. Both are described below.

Although the standard Apache Thrift Java clients do not support pipelining (sending multiple requests without waiting
for a response), the standard Apache Thrift Java servers do support it.

## Message

A *Message* contains:

* _Name_, a string (can be empty).
* _Message type_, one of `Call`, `Reply`, `Exception` and `Oneway`.
* _Sequence id_, a signed 32-bit integer.

The *sequence id* is a simple message id assigned by the client. The server uses the same sequence id in the
message of the response. The client uses this number to detect out-of-order responses. Each client has an int32 field
which is incremented for each message. The sequence id simply wraps around when it overflows.

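The wrap-around and the client-side check can be sketched as follows. This is a minimal illustration in Python rather
than Java; `next_seq_id` and `SeqIdTracker` are hypothetical names and are not part of any Thrift library.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def next_seq_id(current: int) -> int:
    """Increment a sequence id, wrapping around like a signed 32-bit integer."""
    return INT32_MIN if current == INT32_MAX else current + 1


class SeqIdTracker:
    """Hypothetical client-side helper: hands out ids and checks replies."""

    def __init__(self) -> None:
        self._seq_id = 0

    def begin_call(self) -> int:
        # Each outgoing message gets the next id.
        self._seq_id = next_seq_id(self._seq_id)
        return self._seq_id

    def check_reply(self, reply_seq_id: int) -> None:
        # A client that does not pipeline expects the id of its last request.
        if reply_seq_id != self._seq_id:
            raise ValueError(
                f"out-of-order response: expected {self._seq_id}, got {reply_seq_id}"
            )
```
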
The *name* indicates the service method name to invoke. The server copies the name into the response message.

When the *multiplexed protocol* is used, the name contains the service name, a colon `:` and the method name. The
multiplexed protocol is not compatible with other protocols.

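As an illustration of the multiplexed name layout, the sketch below splits such a name at the first colon. The function
name `split_multiplexed_name` and the example service and method names are made up for this example.

```python
from typing import Tuple


def split_multiplexed_name(name: str) -> Tuple[str, str]:
    """Split 'Service:method' into (service, method) at the first colon."""
    service, separator, method = name.partition(":")
    if not separator:
        raise ValueError(f"not a multiplexed name: {name!r}")
    return service, method


# Hypothetical service "Calculator" with method "add".
service, method = split_multiplexed_name("Calculator:add")
```
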
The *message type* indicates what kind of message is sent. Clients send requests with messages of type `Call` or
`Oneway` (step 1 in the protocol exchange). Servers send responses with messages of type `Exception` or `Reply` (step
3).

Type `Reply` is used when the service method completes normally. That is, it returns a value or it throws one of the
exceptions defined in the Thrift IDL file.

Type `Exception` is used for other exceptions. That is: when the service method throws an exception that is not declared
in the Thrift IDL file, or some other part of the Thrift stack throws an exception, for example when the server could
not encode or decode a message or struct.

In the Java implementation (0.9.3) the synchronous and asynchronous servers behave differently. In the async
server all exceptions are sent as a `TApplicationException` (see 'Response struct' below). In the synchronous Java
implementation only (undeclared) exceptions that extend `TException` are sent as a `TApplicationException`. Unchecked
exceptions lead to an immediate close of the connection.

Type `Oneway` is only used starting from Apache Thrift 0.9.3. Earlier versions do _not_ send messages of type `Oneway`,
even for service methods defined with the `oneway` modifier.

When the client sends a request with type `Oneway`, the server must _not_ send a response (steps 3 and 4 are skipped).
Note that the Thrift IDL enforces a return type of `void` and does not allow exceptions for oneway methods.

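The rule for when a response is expected can be sketched as below. The enum and function names are hypothetical. The
numeric values are the ones commonly used by Apache Thrift implementations; treat them as an assumption here, since
this document does not list them, and see the protocol documents for the actual encoding.

```python
from enum import IntEnum


class MessageType(IntEnum):
    # Assumed numeric values; not stated in this document.
    CALL = 1
    REPLY = 2
    EXCEPTION = 3
    ONEWAY = 4


def expects_response(request_type: MessageType) -> bool:
    """Return True when the server must answer the request (steps 3 and 4)."""
    if request_type is MessageType.CALL:
        return True
    if request_type is MessageType.ONEWAY:
        return False
    # Reply and Exception are response types and never start an exchange.
    raise ValueError(f"{request_type.name} is not a request message type")
```
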
## Request struct

The struct that follows the message of type `Call` or `Oneway` contains the arguments of the service method. The
argument ids correspond to the field ids. The name of the struct is the name of the method with `_args` appended.
For methods without arguments a struct without fields is sent.

## Response struct

The struct that follows the message of type `Reply` is a struct in which exactly one of the following fields is encoded:

* A field with name `success` and id `0`, used in case the method completed normally.
* An exception field, name and id are as defined in the `throws` clause in the Thrift IDL's service method definition.

When the message is of type `Exception` the struct is encoded as if it was declared by the following IDL:

```
exception TApplicationException {
  1: string message,
  2: i32 type
}
```

The following exception types are defined in the Java implementation (0.9.3):

* _unknown_: 0, used in case the type from the peer is unknown.
* _unknown method_: 1, used in case the method requested by the client is unknown by the server.
* _invalid message type_: 2, no usage was found.
* _wrong method name_: 3, no usage was found.
* _bad sequence id_: 4, used internally by the client to indicate a wrong sequence id in the response.
* _missing result_: 5, used internally by the client to indicate a response without any field (neither result nor
  exception).
* _internal error_: 6, used when the server throws an exception that is not declared in the Thrift IDL file.
* _protocol error_: 7, used when something goes wrong during decoding. For example when a list is too long or a required
  field is missing.
* _invalid transform_: 8, no usage was found.
* _invalid protocol_: 9, no usage was found.
* _unsupported client type_: 10, no usage was found.

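For reference, the list above can be transcribed into an enumeration. This is only a sketch in Python; the class name
and member names are chosen for this example and are not taken from the Java source.

```python
from enum import IntEnum


class ApplicationExceptionType(IntEnum):
    """Values of the `type` field of TApplicationException, as listed above."""
    UNKNOWN = 0
    UNKNOWN_METHOD = 1
    INVALID_MESSAGE_TYPE = 2
    WRONG_METHOD_NAME = 3
    BAD_SEQUENCE_ID = 4
    MISSING_RESULT = 5
    INTERNAL_ERROR = 6
    PROTOCOL_ERROR = 7
    INVALID_TRANSFORM = 8
    INVALID_PROTOCOL = 9
    UNSUPPORTED_CLIENT_TYPE = 10


def describe(type_code: int) -> str:
    """Map a received type code to a name, falling back to UNKNOWN."""
    try:
        return ApplicationExceptionType(type_code).name
    except ValueError:
        # A code this peer does not know is treated as unknown.
        return ApplicationExceptionType.UNKNOWN.name
```
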
# Protocol considerations

## Comparing binary and compact protocol

The binary protocol is fairly simple and therefore easy to process. The compact protocol needs fewer bytes to send the
same data at the cost of additional processing. As bandwidth is usually the bottleneck, the compact protocol is almost
always slightly faster.

## Compatibility

A server could automatically determine whether a client talks the binary protocol or the compact protocol by
inspecting the first byte. If the value is `1000 0000` or `0000 0000` (assuming a name shorter than roughly 16 MB) it is
the binary protocol. If the value is `1000 0010` the client is talking the compact protocol.

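A minimal sketch of this check, using the three bit patterns given above. The function name `detect_protocol` and the
returned labels are hypothetical.

```python
def detect_protocol(first_byte: int) -> str:
    """Guess the protocol from the first byte of a client's first message."""
    if first_byte in (0b10000000, 0b00000000):
        # 0000 0000 only identifies binary when the name is shorter than ~16 MB.
        return "binary"
    if first_byte == 0b10000010:
        return "compact"
    return "unknown"
```

A server would peek at the first byte of a new connection, call this function once, and then pick the matching
protocol implementation for the rest of the connection.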
## Framed vs. unframed transport

The first Thrift binary wire format was unframed. This means that information is sent out in a single stream of bytes.
With unframed transport the (generated) processors read directly from the socket (though Apache Thrift does try to
grab all available bytes from the socket in a buffer when it can).

Later, Thrift introduced the framed transport.

With framed transport the full request and response (the message and the following struct) are first written to a
buffer. Then, when the struct is complete (transport method `flush` is hijacked for this), the length of the buffer is
written to the socket first, followed by the buffered bytes. The combination is called a _frame_. On the receiver side
the complete frame is first read into a buffer before the message is passed to a processor.

The length prefix is a 4-byte signed int, sent in network (big-endian) order.
The following must be true: `0` <= length <= `16384000` (about 16 MB).

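The framing described above can be sketched with Python's `struct` module. This is an illustration only, not the Java
implementation; `encode_frame`, `decode_frame` and `MAX_FRAME_LENGTH` are names invented for this example.

```python
import struct
from typing import Tuple

MAX_FRAME_LENGTH = 16384000  # upper bound stated above


def encode_frame(payload: bytes) -> bytes:
    """Prefix the buffered message and struct with a 4-byte big-endian length."""
    if len(payload) > MAX_FRAME_LENGTH:
        raise ValueError("frame too large")
    return struct.pack(">i", len(payload)) + payload


def decode_frame(data: bytes) -> Tuple[bytes, bytes]:
    """Return (payload, remaining bytes) for the first complete frame in data."""
    if len(data) < 4:
        raise ValueError("incomplete length prefix")
    (length,) = struct.unpack(">i", data[:4])
    if not 0 <= length <= MAX_FRAME_LENGTH:
        raise ValueError(f"invalid frame length: {length}")
    if len(data) < 4 + length:
        raise ValueError("incomplete frame")
    return data[4:4 + length], data[4 + length:]
```

The receiver only hands the payload to a processor once `decode_frame` succeeds, which mirrors the rule that the
complete frame is read before processing starts.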
Framed transport was introduced to ease the implementation of async processors. An async processor is only invoked when
all data is received. Unfortunately, framed transport is not ideal for large messages as the entire frame stays in
memory until the message has been processed. In addition, the Java implementation merges the incoming data into a
single, growing byte array. Every time the byte array is full it needs to be copied to a new, larger byte array.

Framed and unframed transports are not compatible with each other.