What is TCP?
Transmission Control Protocol
TCP is one of the main protocols in TCP/IP networks. Whereas the IP protocol deals only with packets, TCP enables two hosts to establish a connection
and exchange streams of data. TCP guarantees
delivery of data and also guarantees that
packets will be delivered in the same order in
which they were sent.
TCP stands for Transmission Control Protocol.
It is described in STD 7 / RFC 793 (since
superseded by RFC 9293). TCP is a
connection-oriented protocol that is responsible
for reliable communication between two end
processes. The data transferred is treated as a
stream, which is simply a sequence of
bytes.
Being connection-oriented means that before
actually transmitting data, you must open the
connection between the two end points. The data
can be transferred in full duplex (send and
receive on a single connection). When the
transfer is done, you have to close the
connection to free system resources. Both ends
know when the session is opened (begin) and is
closed (end). The data transfer cannot take
place before both ends have agreed upon the
connection. The connection can be closed by
either side; the other is notified. Provision is
made to close gracefully or just abort the
connection.
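The open, transfer, close sequence described above can be sketched with Python's standard socket module. This is only an illustration on the loopback interface; the function names echo_once and run_demo are made up for this example and are not part of any library.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _addr = server_sock.accept()       # the connection is now open
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                     # b"" means the peer closed its side
                break
            conn.sendall(data)               # same connection, other direction


def run_demo() -> bytes:
    # Listen on the loopback interface; port 0 lets the OS pick a free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()

        # No data can flow before the connection is established.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"ping")
            client.shutdown(socket.SHUT_WR)  # graceful close of our sending side
            reply = b""
            while chunk := client.recv(1024):
                reply += chunk
        worker.join()
        return reply


if __name__ == "__main__":
    print(run_demo())
```

The server learns that the client is done when recv returns an empty bytes object, which is how the "other side is notified" of a graceful close at this level.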
Being stream oriented means that the data is
an anonymous sequence of bytes. There is nothing
to make data boundaries apparent. The receiver
has no means of knowing how the data was
actually transmitted. The sender can send many
small data chunks and the receiver receive only
one big chunk, or the sender can send a big
chunk, the receiver receiving it in a number of
smaller chunks. The only thing that is
guaranteed is that all data sent will be
received without any error and in the correct
order. Should any error occur, it will
automatically be corrected (retransmitted as
needed) or the error will be notified if it
can't be corrected.
At the program level, the TCP stream looks
like a flat file. When you write data to a flat
file, and read it back later, you are absolutely
unable to know if the data has been written in
only one chunk or in several chunks. Unless you
write something special to identify record
boundaries, there is nothing you can do to learn
it afterward. You can, for example, use CR or CR
LF to delimit your records just like a flat text
file.
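The flat file comparison can be tried directly. The sketch below uses an in-memory buffer as a stand-in for a file; write_in_chunks is a hypothetical helper written for this illustration.

```python
import io


def write_in_chunks(chunks: list[bytes]) -> bytes:
    """Write each chunk separately, then read the whole 'file' back."""
    f = io.BytesIO()              # in-memory stand-in for a flat file
    for chunk in chunks:
        f.write(chunk)
    return f.getvalue()


one_write = write_in_chunks([b"Hello World!"])
two_writes = write_in_chunks([b"Hello ", b"World!"])
# Reading back gives identical bytes; the write boundaries are gone.
print(one_write == two_writes)

# With CR LF appended to each record, the boundaries can be recovered
# no matter how the writes were split.
records = write_in_chunks([b"first\r\n", b"sec", b"ond\r\n"])
print(records.split(b"\r\n")[:-1])
```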
At the programming level, TWSocket is fairly
simple to use. To send data, you just need to
call the Send method (or any variation such as
SendStr) to give the data to be transmitted.
TWSocket will put it in a buffer until it can be
actually transmitted. Eventually the data will
be sent in the background (the Send method
returns immediately without waiting for the data
to be transmitted) and the OnDataSent event will
be generated once the buffer is emptied.
To receive data, a program must wait until it
receives the OnDataAvailable event. This event
is triggered each time a data packet comes from
the lower level. The application must call the
Receive method to actually get the data from the
low-level buffers. You have to Receive all the
data available or your program will go into an
endless loop, because TWSocket will trigger
OnDataAvailable again if you didn't Receive all
the data.
As the data is a stream of bytes, your
application must be prepared to receive data as
sent from the sender, fragmented in several
chunks or merged in bigger chunks. For example,
if the sender sent "Hello " and then "World!",
it is possible to get only one OnDataAvailable
event and receive "Hello World!" in one chunk,
or to get two events, one for "Hello " and the
other for "World!". You can even receive more,
smaller chunks like "Hel", "lo Wo" and "rld!".
What happens depends on traffic load, router
algorithms, random errors and many other
parameters you can't control.
On the subject of client/server applications,
most applications need to know command
boundaries before being able to process data. As
data boundaries are not always preserved, you
cannot suppose your server will receive a single
complete command in one OnDataAvailable event.
You can receive only part of a request or maybe
two or more requests merged in one chunk. To
overcome this difficulty, you must use
delimiters.
Most TCP/IP protocols, like SMTP, POP3, FTP
and others, use a CR/LF pair as the command delimiter.
Each client request is sent as is with a CR/LF
pair appended. The server receives the data as
it arrives, assembles it in a receive buffer,
scans for CR/LF pairs to extract commands from
the received stream, and removes them from the
receive buffer.
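The receive-buffer technique just described might look like the following. This is a minimal sketch, independent of TWSocket; the class name LineFramer and its feed method are invented for this example, and feed would be called with whatever bytes each receive operation returns.

```python
class LineFramer:
    """Accumulates received bytes and extracts CR/LF-terminated commands."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        """Add one received chunk; return every command it completes."""
        self._buffer += data
        commands = []
        while True:
            end = self._buffer.find(b"\r\n")
            if end == -1:                 # no full command yet, wait for more
                break
            commands.append(bytes(self._buffer[:end]))
            del self._buffer[:end + 2]    # drop the command and its CR/LF
        return commands


framer = LineFramer()
print(framer.feed(b"HELO exa"))                 # partial command only
print(framer.feed(b"mple.org\r\nNOOP\r\nQU"))   # completes two, starts a third
print(framer.feed(b"IT\r\n"))                   # completes the third
```

Because the buffer persists between calls, it does not matter whether a command arrives in one chunk, in several, or glued to the next command.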
Early TCP implementations used cumulative positive
acknowledgements and the expiry of a retransmit
timer to provide reliability based on a simple
go-back-n model.
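To make the go-back-n idea concrete, here is a deliberately simplified, round-based simulation. It is a sketch, not how any real stack is written: one loop iteration stands for one retransmit-timer period, packets are numbered from 0, and the parameter drop_first_try (an assumption of this example) lists packets that the network loses on their first transmission only.

```python
def go_back_n(num_packets: int, window: int, drop_first_try: set[int]) -> list[int]:
    """Return every transmission the sender makes, in order."""
    sent_log = []
    already_tried = set()
    base = 0        # sender: oldest unacknowledged packet
    expected = 0    # receiver: next in-order packet it will accept
    while base < num_packets:
        # Sender: (re)transmit the whole window starting at `base`.
        for seq in range(base, min(base + window, num_packets)):
            sent_log.append(seq)
            dropped = seq in drop_first_try and seq not in already_tried
            already_tried.add(seq)
            if dropped:
                continue
            # Receiver: accept only the next in-order packet, discard others.
            if seq == expected:
                expected += 1
        # Cumulative ACK: everything below `expected` is acknowledged.
        # Anything at or above it is resent when the timer expires.
        base = expected
    return sent_log


# Packet 2 is lost once, so packet 3 is sent twice even though it arrived.
print(go_back_n(6, 4, {2}))
```

The duplicate transmission of packet 3 is the cost of the simple model: the receiver discards anything that arrives out of order, so the sender has to go back and resend it.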
TCP Tahoe
TCP Tahoe, which is the base
standard of modern TCP implementations, added
end-system-based traffic control functions. New
algorithms and refinements were designed to
address congestion issues in a network and their
adoption was partially influenced by the threat
of congestion collapse in the public Internet.
The new algorithms included:
– Slow-Start
– Congestion Avoidance
– Fast Retransmit
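How these algorithms shape the sending rate can be sketched by tracing the congestion window over a few round trips. The model below is a textbook-style simplification with assumed details: the window is counted in whole segments, it is updated once per round trip, and any detected loss (by timeout or by Fast Retransmit) sends Tahoe back to a window of one segment. The function name and the default threshold of 16 are illustrative only.

```python
def tahoe_cwnd(rounds: int, loss_rounds: set[int], ssthresh: int = 16) -> list[int]:
    """Congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            # Loss detected: halve the threshold, restart from one segment.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # Slow-Start: double per round trip
        else:
            cwnd += 1                        # Congestion Avoidance: +1 per round trip
    return trace


print(tahoe_cwnd(10, loss_rounds={6}, ssthresh=8))
```

The trace grows exponentially up to the threshold, then linearly, and drops back to one segment after the loss in round 6.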
TCP Reno
This implementation, in
addition to the enhancements incorporated into
Tahoe, includes the Fast Recovery algorithm. This
is an extension of the Fast Retransmit
algorithm, which optimises Reno for the case
when a single packet is dropped from a window of
data. Reno also supports the use of Delayed
Acknowledgement, which acknowledges every other
packet instead of acknowledging every packet.
In many implementations, however, this
feature is turned off.
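The practical difference between Tahoe and Reno shows up in how each reacts when a loss is signalled by duplicate acknowledgements. The sketch below captures only the window the sender ends up with once recovery is over; it leaves out the temporary window inflation that happens during Fast Recovery, and the function name is made up for this example.

```python
def on_triple_duplicate_ack(cwnd: int, variant: str) -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh), in segments, after a loss signal."""
    ssthresh = max(cwnd // 2, 2)
    if variant == "tahoe":
        return 1, ssthresh           # back to Slow-Start from one segment
    if variant == "reno":
        return ssthresh, ssthresh    # Fast Recovery: carry on from half the window
    raise ValueError(f"unknown variant: {variant}")


print(on_triple_duplicate_ack(20, "tahoe"))
print(on_triple_duplicate_ack(20, "reno"))
```

With a single dropped packet, Reno therefore keeps sending at roughly half its previous rate instead of starting again from one segment.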
TCP Vegas
TCP Vegas takes a different approach and uses
the measured Round Trip Time (RTT) to accurately
calculate the number of data packets that the
sender can send to avoid packet losses. Vegas
modifies Reno’s Congestion Avoidance algorithm
as well as the Slow Start mechanism. Related
work has found
that TCP Vegas suffers a serious disadvantage
when in competition with TCP Reno, unless
parameter adjustment is made. Vegas is not
widely implemented and is generally regarded as
being an experimental transport protocol.
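The RTT-based calculation is usually described as comparing the throughput the sender would expect on an empty path with the throughput it actually measures. The following is a sketch of that commonly described rule, not code from any particular stack; the thresholds alpha and beta and their default values are assumptions chosen for illustration.

```python
def vegas_adjust(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    """Return the next congestion window (in segments) for one round trip.

    base_rtt is the smallest RTT seen so far (taken as the uncongested
    path delay) and rtt is the latest measurement, both in seconds.
    """
    expected = cwnd / base_rtt                 # rate if nothing were queued
    actual = cwnd / rtt                        # rate actually achieved
    queued = (expected - actual) * base_rtt    # estimated segments sitting in queues
    if queued < alpha:
        return cwnd + 1     # path looks underused: send a little more
    if queued > beta:
        return cwnd - 1     # queues are building: back off before a loss occurs
    return cwnd             # in the target range: leave the window alone


print(vegas_adjust(10, base_rtt=0.1, rtt=0.1))
print(vegas_adjust(10, base_rtt=0.1, rtt=0.2))
```

Because Vegas backs off as soon as delay rises, while Reno keeps growing until a packet is lost, this rule also hints at why Vegas tends to lose out when the two share a link.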
TCP Sack
TCP Sack uses the TCP Reno algorithms and adds
the feature of Selective Acknowledgement. The
Selective Acknowledgement strategy was proposed
to address the problems presented by multiple
packet loss from a window of data. Essentially
the Sack version of TCP provides a mechanism
whereby the data receiver can inform the sender
of all packets successfully received rather than
simply acknowledging the last in-order packet
correctly received. This prevents the
retransmission of packets that have been
successfully received but not acknowledged.
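The difference between a plain cumulative acknowledgement and one that carries selective information can be sketched as follows. This is a simplified model with assumed conventions: packets are numbered from 0 rather than by byte sequence number, blocks are inclusive (first, last) ranges, and there is no limit on how many blocks fit in one acknowledgement. Both function names are invented for this example.

```python
def ack_with_sack(received: set[int]) -> tuple[int, list[tuple[int, int]]]:
    """Return (cumulative_ack, sack_blocks) for the packets received so far.

    cumulative_ack is the next in-order packet the receiver expects.
    """
    cumulative = 0
    while cumulative in received:
        cumulative += 1
    blocks = []
    for seq in sorted(s for s in received if s > cumulative):
        if blocks and seq == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], seq)   # extend the current block
        else:
            blocks.append((seq, seq))           # start a new block
    return cumulative, blocks


def packets_to_retransmit(highest_sent: int, received: set[int]) -> list[int]:
    """Packets the sender still needs to resend, given the SACK information."""
    cumulative, blocks = ack_with_sack(received)
    sacked = {s for first, last in blocks for s in range(first, last + 1)}
    return [s for s in range(cumulative, highest_sent + 1) if s not in sacked]


arrived = {0, 1, 2, 4, 5, 7, 9}           # packets 3, 6 and 8 were lost
print(ack_with_sack(arrived))
print(packets_to_retransmit(9, arrived))
```

Without the blocks, the sender would only learn that packet 3 is missing and could end up resending packets 4, 5, 7 and 9 as well.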
Advantages of TCP
· the operating system does all the work. you just sit back and
watch the show. no need to have the same bugs in your code that
everyone else did on their first try; it's all been figured out for
you.
· since it's in the os, handling incoming packets has fewer context
switches from kernel to user space and back; all the reassembly,
acking, flow control, etc is done by the kernel.
· tcp guarantees three things: that your data gets there, that it
gets there in order, and that it gets there without duplication. (the
truth, the whole truth, and nothing but the truth...)
· routers may notice tcp packets and treat them specially. they can
buffer and retransmit them, and in limited cases preack them.
· tcp has good relative throughput on a modem or a lan.
Disadvantages of TCP
· the operating system may be buggy, and you can't escape it. it
may be inefficient, and you have to put up with it. it may be
optimized for conditions other than the ones you are facing, and you
may not be able to retune it.
· tcp makes it very difficult to try harder; you can set a few
socket options, but beyond that you have to tolerate the built in flow
control.
· tcp may have lots of features you don't need. it may waste
bandwidth, time, or effort on ensuring things that are irrelevant to
the task at hand.
· tcp has no block boundaries; you must create your own.
· routers on the internet today are out of memory. they can't pay
much attention to tcp flying by, and try to help it. design
assumptions of tcp break down in this environment.
· tcp has relatively poor throughput on a lossy, high bandwidth,
high latency link, such as a satellite connection or an overfull t1.
· tcp cannot be used for broadcast or multicast transmission.
· tcp cannot conclude a transmission without all data in motion
being explicitly acked.
Disadvantages of TCP for file transfer
· startup latency is significant. it takes at least twice rtt to
start getting data back.
· without the window scale option, tcp allows a window of at most
64k, and the acking mechanism means that packet loss is misdetected.
tcp stalls easily under packet loss. tcp is more throttled by rtt
than bandwidth.
· tcp transfer servers have to maintain a separate socket (and
often separate thread) for each client.
· load balancing is crude and approximate. especially on local
networks that allow collisions, two simultaneous tcp transfers have a
tendency to fight with each other, even if the sender is the same.
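The remark that tcp is throttled more by rtt than by bandwidth follows from simple arithmetic: a sender can have at most one window of unacknowledged data in flight per round trip. The sketch below works that bound out for a 64k window; the function name and the two example round-trip times are illustrative assumptions, not measurements.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: at most one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


# 65535 bytes is the largest window a 16-bit window field can advertise.
for label, rtt in [("lan, 1 ms rtt", 0.001), ("satellite, 600 ms rtt", 0.6)]:
    mbps = max_throughput_bps(65535, rtt) / 1e6
    print(f"{label}: at most {mbps:.2f} Mbit/s")
```

On the long-delay link the bound stays under 1 Mbit/s however fast the link itself is, which is the situation the window limit bullet describes.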