-- additions/fixes to thrift paper
Summary:
- fixed some typos and added a subsection on TFileTransport
Reviewed By: tbr-mcslee
git-svn-id: https://svn.apache.org/repos/asf/incubator/thrift/trunk@665070 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/doc/thrift.tex b/doc/thrift.tex
index eb8d939..607901d 100644
--- a/doc/thrift.tex
+++ b/doc/thrift.tex
@@ -125,7 +125,7 @@
system.
\textit{Processors.} Finally, we generate code capable of processing data
-streams to accomplish remote procedure call. Section 6 details the generated
+streams to accomplish remote procedure calls. Section 6 details the generated
code and TProcessor paradigm.
Section 7 discusses implementation details, and Section 8 describes
@@ -181,7 +181,7 @@
an STL vector, Java ArrayList, or native array in scripting languages. May
contain duplicates.
\item \texttt{set<type>} An unordered set of unique elements. Translates into
-an STL set, Java HashSet, or native dictionary in PHP/Python/Ruby.
+an STL set, Java HashSet, or native dictionary in PHP/Python/Ruby.
\item \texttt{map<type1,type2>} A map of strictly unique keys to values
Translates into an STL map, Java HashMap, PHP associative array,
or Python/Ruby dictionary.
@@ -190,14 +190,14 @@
While defaults are provided, the type mappings are not explicitly fixed. Custom
code generator directives have been added to substitute custom types in
destination languages (i.e.
-\texttt{hash\_map}, or Google's sparse hash map can be used in C++). The
+\texttt{hash\_map} or Google's sparse hash map can be used in C++). The
only requirement is that the custom types support all the necessary iteration
primitives. Container elements may be of any valid Thrift type, including other
containers or structs.
\subsection{Structs}
-A Thrift struct defines a common objects to be used across languages. A struct
+A Thrift struct defines a common object to be used across languages. A struct
is essentially equivalent to a class in object oriented programming
languages. A struct has a set of strongly typed fields, each with a unique
name identifier. The basic syntax for defining a Thrift struct looks very
@@ -285,7 +285,7 @@
immaterial compared to the cost of actual I/O operations (typically invoking
system calls).
-Fundamentally, generated Thrift code just needs to know how to read and
+Fundamentally, generated Thrift code only needs to know how to read and
write data. Where the data is going is irrelevant, it may be a socket, a
segment of shared memory, or a file on the local disk. The Thrift transport
interface supports the following methods.
@@ -330,11 +330,9 @@
\subsubsection{TFileTransport}
The \texttt{TFileTransport} is an abstraction of an on-disk file to a data
-stream. It allows Thrift data structures to be used as historical log data.
-Essentially, an application developer can use a \texttt{TFileTransport} to
-write out a set of
-requests to a file on disk. Later, this data may be replayed from the log,
-either for post-processing or for recreation and simulation of previous events.
+stream. It can be used to write out a set of incoming Thrift requests to a file
+on disk. The on-disk data can then be replayed from the log, either for post-processing
+or for recreation and simulation of past events.
\subsubsection{Utilities}
@@ -427,7 +425,7 @@
atomic operation, then the implementation would require a linear pass over the
entire list before encoding any data. However, if the list can be written
as iteration is performed, the corresponding read may begin in parallel,
-theoretically offering an end-to-end speedup of $kN - C$, where $N$ is the size
+theoretically offering an end-to-end speedup of $(kN - C)$, where $N$ is the size
of the list, $k$ the cost factor associated with serializing a single
element, and $C$ is fixed offset for the delay between data being written
and becoming available to read.
@@ -806,6 +804,20 @@
each contain an instance of the other. (Since we do not allow \texttt{null}
struct instances in the generated C++ code, this would actually be impossible.)
+\subsection{TFileTransport}
+The \texttt{TFileTransport} logs Thrift requests/structs by
+framing incoming data with its length and writing it to disk.
+Using a framed on-disk format allows for better error checking and
+helps with processing a finite number of discrete events. The
+\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers
+to ensure good performance while logging large amounts of data.
+A Thrift log file is split up into chunks of a specified size, and logged messages
+are not allowed to cross chunk boundaries. A message that would cross a chunk
+boundary causes padding to be added up to the end of the chunk, so that the
+first byte of the message is aligned to the beginning of the next chunk.
+Partitioning the file into chunks makes it possible to read and interpret data
+from a particular point in the file.
+
\section{Conclusions}
Thrift has enabled Facebook to build scalable backend
services efficiently by enabling engineers to divide and conquer. Application
@@ -841,7 +853,7 @@
\acks
Many thanks for feedback on Thrift (and extreme trial by fire) are due to
-Martin Smith, Karl Voskuil, and Yishan Wong.
+Martin Smith, Karl Voskuil and Yishan Wong.
Thrift is a successor to Pillar, a similar system developed
by Adam D'Angelo, first while at Caltech and continued later at Facebook.