-- additions/fixes to thrift paper

Summary:
- fixed some typos and added a subsection on TFileTransport

Reviewed By: tbr-mcslee


git-svn-id: https://svn.apache.org/repos/asf/incubator/thrift/trunk@665070 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/doc/thrift.tex b/doc/thrift.tex
index eb8d939..607901d 100644
--- a/doc/thrift.tex
+++ b/doc/thrift.tex
@@ -125,7 +125,7 @@
 system.
 
 \textit{Processors.} Finally, we generate code capable of processing data
-streams to accomplish remote procedure call. Section 6 details the generated
+streams to accomplish remote procedure calls. Section 6 details the generated
 code and TProcessor paradigm.
 
 Section 7 discusses implementation details, and Section 8 describes
@@ -181,7 +181,7 @@
 an STL vector, Java ArrayList, or native array in scripting languages. May
 contain duplicates.
 \item \texttt{set<type>} An unordered set of unique elements. Translates into
-an STL set, Java HashSet, or native dictionary in PHP/Python/Ruby.
+an STL set, Java HashSet, or native dictionary in PHP/Python/Ruby. 
 \item \texttt{map<type1,type2>} A map of strictly unique keys to values
 Translates into an STL map, Java HashMap, PHP associative array,
 or Python/Ruby dictionary.
@@ -190,14 +190,14 @@
 While defaults are provided, the type mappings are not explicitly fixed. Custom
 code generator directives have been added to substitute custom types in
 destination languages (i.e.
-\texttt{hash\_map}, or Google's sparse hash map can be used in C++). The
+\texttt{hash\_map} or Google's sparse hash map can be used in C++). The
 only requirement is that the custom types support all the necessary iteration
 primitives. Container elements may be of any valid Thrift type, including other
 containers or structs.
 
 \subsection{Structs}
 
-A Thrift struct defines a common objects to be used across languages. A struct
+A Thrift struct defines a common object to be used across languages. A struct
 is essentially equivalent to a class in object oriented programming
 languages. A struct has a set of strongly typed fields, each with a unique
 name identifier. The basic syntax for defining a Thrift struct looks very
@@ -285,7 +285,7 @@
 immaterial compared to the cost of actual I/O operations (typically invoking
 system calls).
 
-Fundamentally, generated Thrift code just needs to know how to read and
+Fundamentally, generated Thrift code only needs to know how to read and
 write data. Where the data is going is irrelevant, it may be a socket, a
 segment of shared memory, or a file on the local disk. The Thrift transport
 interface supports the following methods.
@@ -330,11 +330,9 @@
 \subsubsection{TFileTransport}
 
 The \texttt{TFileTransport} is an abstraction of an on-disk file to a data
-stream. It allows Thrift data structures to be used as historical log data.
-Essentially, an application developer can use a \texttt{TFileTransport} to
-write out a set of
-requests to a file on disk. Later, this data may be replayed from the log,
-either for post-processing or for recreation and simulation of previous events.
+stream. It can be used to write out a set of incoming Thrift requests to a file
+on disk. The on-disk data can then be replayed from the log, either for post-processing
+or for recreation and simulation of past events.
 
 \subsubsection{Utilities}
 
@@ -427,7 +425,7 @@
 atomic operation, then the implementation would require a linear pass over the
 entire list before encoding any data. However, if the list can be written
 as iteration is performed, the corresponding read may begin in parallel,
-theoretically offering an end-to-end speedup of $kN - C$, where $N$ is the size
+theoretically offering an end-to-end speedup of $(kN - C)$, where $N$ is the size
 of the list, $k$ the cost factor associated with serializing a single
 element, and $C$ is fixed offset for the delay between data being written
 and becoming available to read.
@@ -806,6 +804,20 @@
 each contain an instance of the other. (Since we do not allow \texttt{null}
 struct instances in the generated C++ code, this would actually be impossible.)
 
+\subsection{TFileTransport}
+The \texttt{TFileTransport} logs Thrift requests/structs by
+framing incoming data with its length and writing it to disk.
+Using a framed on-disk format allows for better error checking and 
+helps with processing a finite number of discrete events. The 
+\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers 
+to ensure good performance while logging large amounts of data. 
+A Thrift logfile is split up into chunks of a specified size, and logged messages
+are not allowed to cross chunk boundaries. A message that would cross a chunk
+boundary causes padding to be added up to the end of the chunk, so that the
+first byte of the message is aligned to the beginning of the next chunk.
+Partitioning the file into chunks makes it possible to read and interpret data
+from a particular point in the file.
+
 \section{Conclusions}
 Thrift has enabled Facebook to build scalable backend
 services efficiently by enabling engineers to divide and conquer. Application
@@ -841,7 +853,7 @@
 \acks
 
 Many thanks for feedback on Thrift (and extreme trial by fire) are due to
-Martin Smith, Karl Voskuil, and Yishan Wong.
+Martin Smith, Karl Voskuil and Yishan Wong.
 
 Thrift is a successor to Pillar, a similar system developed
 by Adam D'Angelo, first while at Caltech and continued later at Facebook.