An Analysis on Packet Reordering and Fast Retransmit schemes for TCP


International Journal of Electrical, Electronics and Computer Systems (IJEECS)


¹A. Radha Rani, ²K. LakshmiNadh, ³S. SivanageswaraRao

¹Department of Computer Science and Engineering, Malla Reddy Engineering College For Women, Hyderabad, India

²Department of Computer Science and Engineering, Narasaraopet Engineering College, Narasaraopet, India

³Department of Computer Science and Engineering, Narasaraopet Engineering College, Narasaraopet, India

Abstract: Although there are two standard transport protocols, TCP and UDP, offering services in the Internet, the majority of the traffic over the Internet is TCP-based.

TCP performs poorly on paths that reorder packets significantly, because it misinterprets out-of-order delivery as packet loss. The sender responds with a fast retransmit even though no loss has actually occurred. TCP-based applications are prone to packet losses, and many performance problems have recently been observed in the Internet. To resolve these problems, several new TCP fast retransmit and fast recovery algorithms have been proposed. This article surveys the fast retransmit and fast recovery mechanisms that TCP uses to address the lost-packet problem, and presents a description of some useful algorithms together with their design issues, advantages, and disadvantages.

The objective of this article is to provide causes for packet reordering and to describe existing fast retransmit and fast recovery schemes of transport protocols.

I. INTRODUCTION

Transmission Control Protocol (TCP) is the most important transport layer protocol in current networks, providing connection-oriented, end-to-end, in-order data delivery services to various applications. While network congestion constitutes the dominant cause of out-of-order packet events in conventional wired networks, wireless networks often see random packet loss and packet reordering as other predominant sources of such events. Compared with wired media, the wireless medium provides much lossier physical links for data transmission. Signals propagating over wireless channels suffer severely from degradation, interference, and noise. Received packets may be damaged beyond the recovery capability of error-correcting codes, if any are used. These packets are then discarded, leading to random packet losses. Packet reordering in the Internet is a well-known phenomenon [1]. Packets may be reordered as they traverse different paths in the network or by the networking gear itself. As measurements of Internet traffic suggest [2], the majority of the traffic sent over the Internet is transported by TCP. Although TCP is a reliable transport mechanism, dropped packets and packet reordering can affect its performance, and hence the end application throughput will be affected. Even in networks with a large number of flows, packet reordering degrades throughput and response time for some applications. TCP implementations today use the fast recovery [3, 4] and fast retransmission [3, 4] algorithms when responding to dropped packets and packet reordering.

The smallest unit of data transmitted in the Internet is a data segment or packet, each identified by a data octet number. When a destination receives a data segment, it acknowledges the receipt of the segment by issuing an acknowledgement (ACK) with the next expected data octet number. The time elapsed between when a data segment is sent and when an ACK for the segment is received is known as the round-trip time (RTT) of the communication between the source and the destination, which is the sum of the propagation, transmission, queueing, and processing delays at each hop of the communication, and the time taken to process a received segment and generate an ACK for the segment at the destination.

When the data octet number of an arriving segment is greater than the expected one, the destination finds a gap in the sequence number space (known as a sequence hole) and immediately sends out a duplicate ACK, i.e., an ACK that carries the same next expected data octet number in the cumulative acknowledgement field. If the communication channel is an in-order channel, the reception of a duplicate ACK implies the loss of a segment. When the source receives three duplicate ACKs, fast retransmit is triggered, so that the segment inferred to be lost is retransmitted before the retransmission timer expires.
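The duplicate-ACK mechanism can be illustrated with a minimal sketch. It assumes that segments are numbered 1, 2, 3, ... rather than by octet, and the function names `receiver_acks` and `fast_retransmits` are hypothetical helpers introduced only for this illustration.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK emitted for each arriving segment.

    Assumption: segments are numbered 1, 2, 3, ... (a simplification of
    octet numbering); each ACK carries the next segment number expected.
    """
    expected = 1
    buffered = set()
    acks = []
    for seg in arrivals:
        buffered.add(seg)
        while expected in buffered:   # advance past every in-order segment
            expected += 1
        acks.append(expected)
    return acks


def fast_retransmits(acks, dupthresh=3):
    """Return the segment numbers a source would fast-retransmit."""
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dupthresh:
                retransmitted.append(ack)  # segment the receiver still expects
        else:
            last_ack, dup_count = ack, 0
    return retransmitted
```

With the arrival order 1, 3, 4, 5, 6 (segment 2 lost), the source sees three duplicate ACKs for segment 2 and retransmits it. The arrival order 1, 3, 4, 5, 6, 2, 7, 8 (segment 2 merely delayed) produces the same three duplicate ACKs, so the sketch retransmits segment 2 even though nothing was lost.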

Fast recovery works as a companion of fast retransmit. A fast retransmission suggests the presence of mild network congestion. The slow start threshold (ssthresh) is set to half of the amount of outstanding data sent to the network. Since the reception of a duplicate ACK indicates that a segment has left the network, the congestion window size (cwnd) is set to the sum of ssthresh and the number of duplicate ACKs received. When an ACK for a new segment arrives, cwnd is reset to ssthresh and congestion avoidance begins.

Packet reordering occurs when the receiving order of a flow of packets (or segments) differs from its sending order. The presence of persistent and substantial packet reordering violates the in-order or near in-order channel assumption made in the design principles of some traffic control mechanisms in TCP. This can result in a substantial degradation in application throughput and network performance [6].

The objective of this paper is to present a clear overview of how TCP deals with packet reordering. The rest of the paper is organized as follows. Section 2 characterizes the causes of packet reordering. Section 3 identifies the problems that packet reordering introduces for TCP. Section 4 surveys existing solutions to packet reordering. Section 5 describes a selection of the fast retransmit and fast recovery mechanisms of existing transport protocols, classifies them, and compares the different approaches. Finally, Section 6 summarizes and concludes the paper.

II. CAUSES OF PACKET REORDERING

Packet reordering is a phenomenon in which packets with higher sequence numbers are received earlier than those with smaller sequence numbers. Fig. 1 shows a TCP sender sequentially sending eight packets, P1, P2, P3, P4, P5, P6, P7 and P8, without reordering. In Fig. 2, the same sequence arrives at the TCP receiver in the following order: P1, P3, P4, P5, P6, P2, P7 and P8, where P2 is reordered with reordering block size = 1 and reordering delay time = 4 packets.

Fig. 1. Original packet sequence without reordering

Fig. 2. New packet sequence after packet 2 is reordered with reordering block size = 1 and reordering delay time = 4 packets
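One way to quantify the reordering in Fig. 2 is to count, for each late packet, how many later-sent packets arrived before it. The sketch below is only one possible measure, and the function name `reordering_delay` is hypothetical.

```python
def reordering_delay(arrival_order):
    """Map each late packet to the number of later-sent packets that
    overtook it (packets are identified by their sending index)."""
    late = {}
    for position, pkt in enumerate(arrival_order):
        overtaken_by = sum(1 for earlier in arrival_order[:position]
                           if earlier > pkt)
        if overtaken_by:
            late[pkt] = overtaken_by
    return late
```

For the arrival order of Fig. 2, `reordering_delay([1, 3, 4, 5, 6, 2, 7, 8])` reports that only packet 2 is late and that four packets overtook it, which matches the reordering delay time of 4 packets stated in the caption.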

Packet reordering can be caused by networks due to the following reasons: packet level multipath routing, route fluttering, inherent parallelism in modern high-speed routers, link-layer retransmissions, and router forwarding lulls.

Packet-Level Multipath Routing: Multipath routing [6], [7] is used to spread the traffic load across the network in order to alleviate network congestion. It has been shown [7], [8] that multipath routing balances the load significantly better than single-path routing and provides better performance in terms of congestion and capacity over wired and wireless networks. Packet-level multipath routing allows packets of the same traffic flow to be forwarded over multiple routes to a destination so as to achieve load balancing in packet-switching networks. This functionality is supported by overlay networks. However, these packets may be reordered on arrival at the destination due to the differences in path delays.
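The effect of path-delay differences can be sketched with a toy model. It assumes round-robin spraying of packets over the available paths, a fixed delay per path, and a constant sending interval; none of these details come from the cited work, and `multipath_arrival_order` is a hypothetical helper.

```python
def multipath_arrival_order(num_packets, path_delays, send_interval=1.0):
    """Spray packets round-robin over paths and return the arrival order.

    path_delays: one-way delay of each path, in the same unit as
    send_interval (both are assumed constants in this toy model).
    """
    arrivals = []
    for seq in range(1, num_packets + 1):
        send_time = (seq - 1) * send_interval
        delay = path_delays[(seq - 1) % len(path_delays)]
        arrivals.append((send_time + delay, seq))
    arrivals.sort()  # order by arrival time; ties broken by sequence number
    return [seq for _, seq in arrivals]
```

With two paths of delay 10.0 and 12.5 time units, six packets arrive in the order 1, 3, 2, 5, 4, 6. With equal path delays the sending order is preserved.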

Route Fluttering: Route fluttering is a network phenomenon in which the forwarding path to a certain destination oscillates among a set of available routes to that destination. It results from route instability due to unstable links, and from heavy loads or bursty traffic when the link cost metrics used in the routing algorithms are related to the delays or congestion experienced over the network links. It also results from topological changes in the wireless environment. For example, mobile ad hoc networks have no fixed infrastructure and every mobile node can be a source, a destination, or a router. Similar to packet-level multipath routing, route fluttering causes packets to be forwarded on different paths and to arrive at a destination out of order.

Inherent Parallelism in Modern High-Speed Routers: Modern routers support packet striping, so that packets of the same traffic flow can be forwarded over multiple parallel links of lower capacity, which are much cheaper, to the next-hop router for that flow. To switch packets at high speed, such a router is generally work conserving, so that its outgoing ports connecting to a certain next-hop router are idle only when there are no outstanding packets to be forwarded to that router. Since packets may be of different sizes and the links can be of different bandwidths, packets may take dramatically different times to transmit, and hence arrive at the neighboring router in an order different from that in which they were sent.

Link-Layer Retransmissions: Link-layer retransmission mechanisms have been proposed to recover efficiently from transmission losses caused by high channel error rates in wireless networks. Retransmitted packets are sent only after the losses are detected, so they may be interspersed with later packets belonging to the same traffic flow.


Router Forwarding Lulls: Some routers pause their forwarding of buffered packets while they process a routing update. These buffered packets are then interspersed with new arrivals, thus causing packet reordering [10].

III. IMPACT OF PACKET REORDERING ON TCP

TCP relies on the use of a cumulative ACK to announce the receipt of segment(s). The pace at which a source receives ACKs drives how fast it can inject TCP segments into the network and its associated destination.

With persistent and substantial packet reordering, TCP spuriously retransmits segments, keeps its congestion window unnecessarily small, loses its ACK-clocking, and understates the estimated RTT (and thus the RTO) [5]. These effects are described in detail next.

Spurious Segment Retransmissions: Packet reordering causes the starting data octet number of some arriving segments to differ from the ones expected by a destination. In other words, the destination finds a sequence hole upon segment reception. It then generates duplicate ACKs and sends them to its associated source. When the source receives three such duplicate ACKs consecutively, the segment inferred to be lost is retransmitted. Persistent and substantial packet reordering therefore often causes some TCP segments to be retransmitted spuriously [19].

Keeping Congestion Window Unnecessarily Small: Fast recovery is always triggered together with fast retransmit. A spurious fast retransmission not only imposes additional and unnecessary workload on the network and the destination, but also halves the congestion window. Thus, with persistent and substantial packet reordering, the congestion window is kept small relative to the available bandwidth of the transmission path.

Loss of ACK-Clocking: Packet reordering causes not only data segments to arrive at a destination out of order, but also ACKs to arrive at a source out of order. The former phenomenon is called forward-path reordering, while the latter is known as reverse-path reordering [5]. An illustration of forward-path reordering and reverse-path reordering is shown in Fig. 3. Suppose segments are sent from the source in the order S1, S2, S3, but Segment S1 arrives after Segment S2 at the destination. This represents forward-path reordering. ACK A1 arrives after ACKs A2 and A3 at the source. This depicts reverse-path reordering.

Fig. 3. An illustration of forward-path reordering and reverse-path reordering.

ACK-clocking, or self-clocking, refers to the property that the receiver can generate ACKs no faster than data segments can get through the network [14]. With forward-path reordering, an ACK for several new segments, which follows a number of duplicate ACKs, can allow a source to inject several pending segments into the network at once. Even when no data segment is reordered, disordered ACKs lead a source to transmit several segments together rather than one or two segments per ACK. This causes a loss of ACK-clocking and far more bursty traffic, which may lead to transient network congestion and congestion collapse from undelivered packets [19].
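The burstiness caused by reverse-path reordering can be sketched as follows. The sketch assumes a window-limited source with a fixed window that sends one new segment for every segment newly covered by a cumulative ACK; `segments_released_per_ack` is a hypothetical helper.

```python
def segments_released_per_ack(ack_arrivals):
    """Return how many new segments each arriving cumulative ACK releases.

    ack_arrivals: cumulative ACK numbers (highest segment acknowledged)
    in the order they reach the source. A stale ACK releases nothing.
    """
    highest_acked = 0
    released = []
    for ack in ack_arrivals:
        newly_acked = max(0, ack - highest_acked)
        released.append(newly_acked)
        highest_acked = max(highest_acked, ack)
    return released
```

When ACKs A1, A2, A3 arrive in order, the source releases one segment per ACK. When A1 arrives after A2 and A3, as in Fig. 3, the first ACK to arrive releases two segments back to back and the late A1 releases none, so the even spacing of the ACK clock is lost.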

Understating Estimated RTT and RTO: Whenever a segment is retransmitted, a source cannot determine whether a received ACK is for the first transmission or the retransmission of the segment. Karn's algorithm [15] alleviates the problem by discarding all measured RTT samples until an ACK acknowledges a segment that has not been retransmitted. Since a fast retransmission is likely to correspond to a segment that experiences a longer path delay, the use of Karn's algorithm results in a sampling bias against long RTT samples [16]. With persistent and substantial packet reordering, these long samples are discarded, and the estimated RTT and RTO are therefore understated.
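The sampling bias can be made concrete with a small sketch. It assumes an exponentially weighted moving average with gain 1/8 as the RTT estimator, and it assumes, for the sake of comparison only, that the true RTT of each retransmitted segment is known; `smoothed_rtt` is a hypothetical helper.

```python
def smoothed_rtt(samples, apply_karn=True, alpha=0.125):
    """EWMA of RTT samples; with Karn's rule, samples that belong to
    retransmitted segments are skipped.

    samples: list of (rtt_seconds, was_retransmitted) pairs.
    """
    srtt = None
    for rtt, was_retransmitted in samples:
        if apply_karn and was_retransmitted:
            continue  # ambiguous sample: discard it
        srtt = rtt if srtt is None else (1 - alpha) * srtt + alpha * rtt
    return srtt
```

If every second segment takes 0.30 s instead of 0.10 s and is fast-retransmitted, the estimate with Karn's rule stays at about 0.10 s, whereas an estimator that could see all samples would drift upwards. The difference is the understatement described above.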

IV. SOLUTIONS FOR PACKET REORDERING

The solutions for packet reordering include RR-TCP [9], TCP-DCR [8], TCP-Eifel [11], TCP-DOOR [12], and TCP-PR [10] [18].

RR-TCP

Zhang et al. devised the reordering-robust TCP (RR-TCP) [9] as an extension of the Blanton-Allman algorithms [15], but it differs from them in three ways. First, RR-TCP uses a different mechanism to adjust dupthresh dynamically. The authors formulated a combined cost function for retransmission timeouts, spurious fast retransmissions, and limited transmit to adapt the false fast retransmit avoidance ratio (FA ratio). The FA ratio, which represents the portion of reordering events to be avoided in order to minimize the cost, can then be used to find the corresponding dupthresh. This provides a mechanism to raise or reduce dupthresh dynamically, by changing the FA ratio based on the current network conditions. Second, the authors considered another extended version of the limited transmit algorithm. This extension permits a source to send up to one additional ACK-clocked congestion window's worth of data.

Third, the authors suggested an idea to correct the sampling bias against long RTT samples in the RTT and RTO estimations. Instead of skipping the samples for retransmitted segments as in Karn's algorithm, an RTT sample is taken for each retransmitted segment as the average of the RTTs for the first and the second transmissions of that segment.


The simulation results in [16] showed that RR-TCP could significantly improve TCP performance over reordering networks. When 1 to 2 percent of segments were randomly selected to experience a longer delay (according to a normal distribution), RR-TCP improved the connection throughput by more than 50 percent and 150 percent when compared with the Blanton-Allman algorithms [15] (including the time-delayed fast retransmit algorithm) and SACK TCP, respectively. However, RR-TCP needs to maintain a reordering histogram to store the reordering information, and it must scan and update the histogram for every reordered segment.
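The link between the reordering histogram, the FA ratio, and dupthresh can be sketched as a percentile lookup. This is only an assumed reading of the description above: the cost function that adapts the FA ratio is not modelled, and the histogram layout, the floor of three, and the name `dupthresh_for_fa_ratio` are all hypothetical.

```python
def dupthresh_for_fa_ratio(reorder_histogram, fa_ratio, min_dupthresh=3):
    """Pick the smallest dupthresh that avoids a fraction fa_ratio of the
    reordering events recorded in the histogram.

    reorder_histogram: dict mapping a reordering length (number of
    duplicate ACKs an event produced) to how often it was observed.
    """
    total = sum(reorder_histogram.values())
    if total == 0:
        return min_dupthresh
    covered = 0
    for length in sorted(reorder_histogram):
        covered += reorder_histogram[length]
        if covered / total >= fa_ratio:
            # An event that produces `length` duplicate ACKs is avoided
            # only if dupthresh exceeds that length.
            return max(min_dupthresh, length + 1)
    return max(min_dupthresh, max(reorder_histogram) + 1)
```

For a histogram `{1: 50, 2: 30, 4: 15, 8: 5}`, an FA ratio of 0.9 yields a dupthresh of 5, while an FA ratio of 0.5 leaves dupthresh at the default of 3. The loop over the sorted histogram also shows why the bookkeeping cost grows with the number of distinct reordering lengths.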

TCP-DCR

Bhandarkar and Reddy devised the delayed congestion response TCP (TCP-DCR) [8] to improve the robustness of TCP to non-congestion events. It builds on the time-delayed fast retransmit algorithm by delaying the congestion response for a time interval after the first duplicate ACK is received. The authors suggested setting this interval to one RTT, so as to allow sufficient time to deal with forward-path reordering caused by link-layer retransmissions for loss recovery. To maintain ACK-clocking, TCP-DCR sends one new data segment upon the receipt of each duplicate ACK.

The simulation results in [17] demonstrated that TCP-DCR performed significantly better than SACK TCP. The connection throughput of TCP-DCR was 10 times that of SACK TCP when more than 5 percent of packets were delayed according to a normal distribution with negligible congestion loss. However, the bottleneck link delay chosen for those experiments was at least equal to the highest possible reordering delay. This implies that a reordering event was unlikely to last longer than the interval for delaying the congestion response.

The suggested interval may not be a proper choice for multipath routing, since packets are reordered mainly according to the differences in path delay, while the estimated RTT is a weighted average of RTTs based on the traffic distribution over the participating paths. Further study is needed to find a proper choice of the delay interval for the congestion response in the presence of packet reordering.

Eifel Algorithm

Ludwig and Katz proposed the Eifel algorithm [11] to eliminate the retransmission ambiguity and solve the performance problems caused by spurious retransmissions. A source uses the TCP timestamp option to insert the current timestamp into the header of each outgoing segment to a destination. When the destination sends ACKs, it echoes the corresponding timestamps in the ACKs. To eliminate the retransmission ambiguity, the source always stores the timestamp of the first retransmission of a segment. When the first ACK for the retransmitted segment arrives, the source compares the timestamp of that ACK with the stored timestamp. If the stored timestamp is greater, the retransmission is considered spurious.

Fig. 4. An illustration of the Eifel algorithm.

Fig. 4 illustrates how the Eifel algorithm works. When the source sends Segment S for the first time at Time T1, it inserts the current timestamp T1 into the header of the segment. At Time T2, the source initiates a congestion response by retransmitting Segment S. The original segment differs from the retransmitted one in that the latter contains the timestamp T2 instead of T1. When the destination receives the original Segment S first, it sends an ACK with the timestamp of that segment, i.e., T1. When the ACK for the segment arrives, the source finds that the echoed timestamp, T1, is smaller than the stored one, T2. The retransmission is hence identified as spurious.

To solve the problems caused by spurious retransmissions, a source also stores the current values of ssthresh and of the congestion window size, cwnd, when a segment is retransmitted for the first time. When a detected spurious retransmission has resulted in a single retransmission of the oldest outstanding segment, the source restores ssthresh and cwnd to the stored values. This technique is simple and effective in improving TCP performance under forward-path reordering. However, bursts of TCP segments may be injected into the network when the state is restored. Besides, the scheme does not work when the original and retransmitted segments are themselves reordered.
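The timestamp comparison and the state restoration can be sketched together. The class `EifelSender` and its method names are hypothetical, and the congestion response applied on retransmission (halving the window, with a floor of two segments) is only an assumed example.

```python
from dataclasses import dataclass


@dataclass
class EifelSender:
    cwnd: float
    ssthresh: float
    retransmit_ts: float = None   # timestamp of the first retransmission
    saved_cwnd: float = None
    saved_ssthresh: float = None

    def on_first_retransmit(self, now):
        """Save the state, then apply an (assumed) congestion response."""
        self.retransmit_ts = now
        self.saved_cwnd, self.saved_ssthresh = self.cwnd, self.ssthresh
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = self.ssthresh

    def on_ack(self, echoed_ts):
        """Return True and restore the state if the retransmission
        turns out to be spurious."""
        if self.retransmit_ts is None:
            return False
        spurious = echoed_ts < self.retransmit_ts
        if spurious:
            self.cwnd, self.ssthresh = self.saved_cwnd, self.saved_ssthresh
        self.retransmit_ts = None
        return spurious
```

With T1 = 1.0 and T2 = 2.0, an ACK that echoes T1 is recognized as belonging to the original transmission, and cwnd and ssthresh return to their saved values. An ACK that echoes T2 leaves the reduced window in place.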

TCP-DOOR

Wang and Zhang developed TCP with detection of out-of-order and response (TCP-DOOR) [12], which can be considered an extension of the Eifel algorithm. Out-of-order events imply route changes in the network, which happen frequently in mobile ad hoc networks. The TCP packet sequence number and the ACK duplication sequence number, or current timestamps, are inserted into each data and ACK segment, respectively, to detect reordered data and ACK packets. When out-of-order events are detected, a source can either temporarily disable congestion control or perform instant recovery during congestion avoidance. With congestion control temporarily disabled, the source keeps its state variables constant for a time period, say t1 seconds, after detecting an out-of-order event. With instant recovery during congestion avoidance, the source recovers immediately to the state before the congestion response, provided that the response was invoked within the last t2 seconds.

However, TCP-DOOR does not distinguish between forward-path reordering and reverse-path reordering. The responses are suitable for alleviating some performance problems caused by forward-path reordering. They do not help reduce bursty traffic, and in fact exacerbate network congestion under reverse-path reordering.

Besides, TCP-DOOR does not perform well in a congested network environment with substantial persistent packet reordering. It disables congestion control for a time period every time an out-of-order event is detected, which may lead to congestion collapse from undelivered packets.

TCP-PR

Bohacek et al. proposed TCP for persistent packet reordering (TCP-PR) [10], which redesigns the RTO timer to enhance TCP performance under persistent packet reordering. Instead of keeping track of the EWMA of the mean RTT, TCP-PR uses a nonsmoothed, exponentially weighted maximum possible RTT. By doing so, spikes in RTT are promptly reflected in the estimated RTT and persist there for some time. When a segment drop is detected, cwnd is set to half of the value of cwnd at the time the segment was sent, and congestion avoidance follows. Subsequent occasional segment drops detected in the same congestion window do not cause any further reduction of cwnd, to avoid over-reaction to congestion. When more than half of a congestion window's segments are lost, cwnd is set to one and the slow start process is performed.

The major advantage of TCP-PR is that the new RTT and RTO estimators are very effective in shielding the effect of packet reordering due to differences in path delays, since they are derived from the sampled maximum possible RTT. Another merit of TCP-PR is that it is able to maintain ACK-clocking in the presence of packet reordering.

There are two limitations for TCP-PR.

1) To maintain a constant scaling of the RTT spikes per RTT, the scaling factor is raised to the power of 1/cwnd. However, this makes TCP-PR computationally expensive, since a series of exponentiation computations has to be performed on every ACK arrival.

2) The proposed RTT estimator may be overly sensitive to spikes in RTT. A sudden high RTT sample, which can be caused by routing loops due to topological changes, can greatly enlarge the estimated RTT for some time.
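Both limitations can be seen in a minimal sketch of such a maximum-tracking estimator. The update rule below is an assumed reading of the description (a running maximum that decays by a factor raised to the power 1/cwnd on every ACK); the decay constant 0.999 and the name `update_max_rtt` are hypothetical.

```python
def update_max_rtt(estimate, sample, cwnd, decay=0.999):
    """Decay the running maximum slightly on each ACK, then take the
    maximum with the new sample. Raising decay to the power 1/cwnd keeps
    the decay per RTT roughly constant, since about cwnd ACKs arrive
    per RTT."""
    return max(estimate * decay ** (1.0 / cwnd), sample)
```

One exponentiation is needed per ACK, which is the cost noted in the first limitation. A single spike of 0.5 s immediately becomes the estimate and then shrinks by only a factor of `decay` per window of ACKs, which illustrates the sensitivity noted in the second limitation.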

V. FAST RETRANSMIT AND FAST RECOVERY SCHEMES

Any out-of-order packet arriving at the receiver triggers a duplicate ACK, in which the receiver repeats the ACK of the last in-order segment received. Any additional out-of-order segment causes another duplicate ACK. Packet reordering may result in only one or two duplicate ACKs.

The sender, however, considers three duplicate ACKs to be an indication that the packet was actually dropped and consequently triggers a retransmit of the missing segment. The retransmit occurs without waiting for a timeout. This behavior is defined as fast retransmit [4].

The number of duplicate ACKs that triggers a fast retransmit is defined to be three [3]. This is the default value in all implementations that support fast retransmit. Once the retransmit takes place, the sender enters fast recovery. Fast recovery differs from congestion avoidance in the behavior of the sender during the period between the retransmit of the missing packet and the time a new segment is ACKed.

This section surveys a selection of the fast retransmit and fast recovery mechanisms of existing transport protocols and classifies them as: TCP Tahoe [13], TCP Reno, TCP Vegas, TCP NewReno [14], SACK, TCP-Jersey, and TCP Westwood. Fig. 5 shows the relationships between these protocols.

Fig.5. Relationship between transport protocols.

TCP Tahoe

Jacobson assumed that losses due to packet corruption are much less probable than losses due to buffer overflows in the network. Therefore, when a loss occurs, the sender should lower its share of the bandwidth. This is done by reducing its window to half the size at which the loss was detected. The reasoning behind this value of one-half is that the decrease in throughput should match the multiplicative increase in the queue length in the network upon congestion.

This multiplicative decrease is implemented through the SSTHRESH. When a loss occurs, half the value of the CWND just before the loss is recorded in the SSTHRESH. The connection then resorts to slow start by setting the CWND to 1 packet. Slow start increases the CWND exponentially until it reaches the SSTHRESH, after which the connection follows additive increase and multiplicative decrease (AIMD) until a loss occurs again or the connection is terminated.

In order to identify that a packet is lost, Tahoe times the delay of the packet, from the moment the sender puts the packet into the network to the moment it receives the ACK for that packet. This value is known as the RTT. From this value (and the aggregation of timed pairs), Tahoe derives an RTO to decide whether a packet has been lost. If an ACK is not received before this RTO expires, the sender concludes that the packet is lost and resends it to enable reliable delivery and movement of the window.

Another way of detecting a loss in TCP Tahoe is through the use of dupacks. Dupacks are similar to normal ACKs in that they acknowledge packets and tell the sender which packet the receiver expects next. However, while a normal ACK acknowledges one or more previously unacknowledged packets, a dupack re-acknowledges the same packet as the previous ACK. For example, suppose that one packet is lost but all packets after it are received. Let the lost packet have sequence number n, so that the receiver cannot send an ACK for it. When packet n + 1 arrives at the destination, the receiver tells the sender that it is still expecting packet n by sending an ACK with the number n. Similarly, upon receiving packet n + 2, the receiver sends another ACK saying that it is still expecting packet n. These two ACKs, triggered by packet n + 1 and packet n + 2, are known as dupacks. A dupack does not provide the sender with any new information except that the receiver is still awaiting packet n.

Typically, Tahoe waits for three dupacks before inferring a loss and then immediately resends the packet, rather than waiting for the RTO. In other words, Tahoe assumes that the receipt of three dupacks actually indicates a loss and quickly retransmits the lost packet without waiting for the RTO to expire. This is done because the RTO is relatively long, and waiting for it could stall the TCP transfer.

This mechanism is called fast retransmit, which resends the lost packet when receiving three dupacks, sets the SSTHRESH to half the CWND, and then sets the CWND to one segment. In addition, this forces TCP to enter slow start again. TCP Tahoe can find a lost packet and retransmit it in as short a time as possible; however, it does not deal well with multiple packet drops within a single window of data.
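Tahoe's reaction to a loss and the subsequent window growth can be sketched in units of packets. The functions below are hypothetical helpers; the floor of two packets on SSTHRESH and the once-per-RTT granularity are assumptions of this sketch.

```python
def tahoe_on_loss(cwnd):
    """Tahoe's response to a loss (timeout or three dupacks)."""
    ssthresh = max(cwnd // 2, 2)  # floor of 2 packets is an assumption
    return 1, ssthresh            # new CWND, new SSTHRESH


def tahoe_per_rtt(cwnd, ssthresh):
    """Grow CWND once per RTT: doubling in slow start, +1 afterwards."""
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh)
    return cwnd + 1
```

Starting from a CWND of 16 packets, a loss gives CWND = 1 and SSTHRESH = 8, and the following five round trips take the window through 2, 4, 8, 9, and 10 packets.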

TCP Reno

TCP Reno [13] introduced major improvements over TCP Tahoe by changing the way in which it reacts to detecting a loss through dupacks. There are two ways in which TCP Reno detects packet loss. One is based on the reception of three dupacks, and the other is based on RTO. When a source receives three dupacks, the fast retransmit and fast recovery algorithms are performed.

The source then immediately retransmits only the packet that is presumed lost, but not the subsequent ones, without waiting for the retransmission timer (also called a coarse-grained timer) to expire (the fast retransmit mechanism). It then performs the fast recovery mechanism as follows:

(1) The slow start threshold is set to one-half the current window size.

(2) The congestion window is set to the slow start threshold plus three times the packet size.

(3) Each time the sender receives a dupack, it increments the congestion window by one packet and sends a new packet.

(4) When the first non-duplicated ACK arrives, the congestion window is set to the slow start threshold.
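The four steps above can be traced with a minimal sketch, with the window counted in packets. The function name `reno_fast_recovery` is hypothetical, and the retransmission itself is not modelled.

```python
def reno_fast_recovery(cwnd, extra_dupacks):
    """Trace the congestion window through Reno's four fast-recovery steps.

    cwnd: window (in packets) when the third dupack arrives.
    extra_dupacks: number of dupacks received after the third one.
    Returns (ssthresh, list of window values after each step).
    """
    ssthresh = cwnd / 2                 # step 1
    trace = [ssthresh + 3]              # step 2: inflate by three packets
    for _ in range(extra_dupacks):      # step 3: one more packet per dupack
        trace.append(trace[-1] + 1)
    trace.append(ssthresh)              # step 4: deflate on the first new ACK
    return ssthresh, trace
```

For a window of 16 packets and two further dupacks, the threshold becomes 8 and the window moves through 11, 12, 13 and then back to 8.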

If serious congestion occurs and there are insufficient surviving packets to trigger three dupacks, the congestion is detected by a coarse-grained retransmission timeout. When the retransmission timer expires, the slow start threshold is set to half the current congestion window size and the congestion window size is then reset to one; finally, the source restarts from the slow start phase.

The fundamental problem here is that the fast retransmit algorithm assumes that only one packet was lost. This may result in loss of ACK clocking and in timeouts if more than one packet is lost. Reno encounters several problems with multiple packet losses in a window of data (usually of the order of half a window). In that case, Reno invokes fast retransmit and fast recovery several times in succession, leading to multiplicative decreases in the CWND and SSTHRESH. This impacts the throughput of the connection. Further, ACK starvation may occur because of the ambiguity of dupacks and the dynamics of the CWND. This is because the sender reduces the CWND when it enters fast retransmit. The sender then receives dupacks that inflate the CWND, allowing it to send new packets until it fills its sending window in fast recovery. It then receives a non-dupack and exits fast recovery.

However, due to multiple losses in the past, this ACK will be followed by three dupacks signaling that another packet was lost; consequently, fast retransmit is entered once again after another reduction in the SSTHRESH and CWND. As this happens several times in succession, the left edge of the sending window advances only after each successive fast retransmit, and the amount of data in flight (sent but not yet ACKed) eventually becomes more than the congestion window (halved by the latest invocation of fast retransmit). As there are no more ACKs to receive, the sender stalls and recovers from this deadlock only through a timeout, which causes a slow start.

In summary, when multiple packets are dropped from a window of data, TCP Reno may suffer serious performance problems. This is because it retransmits at most one dropped packet per round-trip time and, further, the congestion window size may be decreased more than once due to multiple packet losses occurring during one round-trip time interval.

TCP NewReno

The NewReno [14] modification concerns the Fast Recovery procedure that begins when three duplicate ACKs are received and ends when either a retransmission timeout occurs or an ACK arrives that acknowledges all of the data up to and including the data that was outstanding when the Fast Recovery procedure began.

The NewReno algorithm uses a variable "recover", whose initial value is the initial sender sequence number.

1) Three duplicate ACKs: When the third duplicate ACK is received and the sender is not already in the Fast Recovery procedure, it checks whether the Cumulative ACK field covers more than "recover". If so, go to Step 1A. Otherwise, go to Step 1B.

1A) Invoking Fast Retransmit: If so, then set ssthresh to no more than the value given in equation 1 below.

ssthresh = max (FlightSize / 2, 2*SMSS) (1)

In addition, record the highest sequence number transmitted in the variable "recover", and go to Step 2.

1B) Not invoking Fast Retransmit: Do not enter the Fast Retransmit and Fast Recovery procedure. In particular, do not change ssthresh, do not go to Step 2 to retransmit the "lost" segment, and do not execute Step 3 upon subsequent duplicate ACKs.

2) Entering Fast Retransmit: Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS. This artificially "inflates" the congestion window by the number of segments (three) that have left the network and the receiver has buffered.

3) Fast Recovery: For each additional duplicate ACK received while in Fast Recovery, increment cwnd by SMSS. This artificially inflates the congestion window in order to reflect the additional segment that has left the network.

4) Fast Recovery, continued: Transmit a segment, if allowed by the new value of cwnd and the receiver's advertised window.

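5) When an ACK arrives that acknowledges new data, this ACK could be the acknowledgment elicited by the retransmission from Step 2, or elicited by a later retransmission. In either case, the sender must distinguish a full acknowledgment, which covers all data up to and including "recover" and ends Fast Recovery, from a partial acknowledgment, which does not and indicates a further loss in the same window.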
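Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration and not a complete TCP sender: the class name, the field names, the SMSS value of 1460 bytes, and the reading of "covers more than recover" as a strict comparison are all assumptions made for the example. Step 5 is left out, so the duplicate ACK counter is never reset.

```python
from dataclasses import dataclass

SMSS = 1460  # assumed sender maximum segment size, in bytes


@dataclass
class NewRenoSender:
    """Hypothetical sender state; all quantities are in bytes."""
    cwnd: int
    ssthresh: int
    flight_size: int       # data sent but not yet cumulatively acknowledged
    highest_sent: int      # highest sequence number transmitted so far
    recover: int = 0       # initial value: initial send sequence number
    dupacks: int = 0
    in_fast_recovery: bool = False

    def on_duplicate_ack(self, cumulative_ack: int) -> str:
        """Apply Steps 1 to 3 to one duplicate ACK and name the action taken."""
        self.dupacks += 1
        if self.in_fast_recovery:
            # Step 3: one more segment has left the network.
            self.cwnd += SMSS
            return "inflate"
        if self.dupacks != 3:
            # Before the third dupack we wait; after Step 1B we do nothing.
            return "wait" if self.dupacks < 3 else "ignore"
        if cumulative_ack <= self.recover:
            # Step 1B: do not invoke Fast Retransmit.
            return "ignore"
        # Step 1A: equation (1), then remember the highest sequence sent.
        self.ssthresh = max(self.flight_size // 2, 2 * SMSS)
        self.recover = self.highest_sent
        # Step 2: retransmit and inflate the window by three segments.
        self.cwnd = self.ssthresh + 3 * SMSS
        self.in_fast_recovery = True
        return "retransmit"
```

With ten segments in flight, the third duplicate ACK in this sketch sets ssthresh to five segments and cwnd to eight, and each further duplicate ACK adds one segment to cwnd.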

SACK

Another way to deal with multiple packet losses is to tell the source which packets have arrived at the destination. SACK does exactly that. TCP adopts a cumulative acknowledgement strategy to acknowledge successfully transmitted packets; this improves the robustness of acknowledgement when the path back to the source features a high loss rate. However, the drawback of cumulative acknowledgement is that after a packet loss, the source cannot determine which packets have been successfully transmitted. Therefore, it is unable to recover more than one lost packet in each round-trip time.

The SACK option field contains a number of SACK blocks, where each SACK block reports a non-contiguous set of data that has been received and buffered. The destination uses an ACK with the SACK option to inform the source that a contiguous block of data has been received out of order at the destination.

When the source receives SACK blocks, it uses them to maintain an image of the receiver queue, i.e., which packets are missing and which have been received at the destination. A scoreboard is set up to track these transmitted and received packets according to previous information in the SACK option. For each transmitted packet, the scoreboard records its sequence number and a flag bit that indicates whether the packet has been "SACKed." A packet with the SACKed bit turned on does not need to be retransmitted, but packets with the SACKed bit off and a sequence number less than that of the highest SACKed packet are eligible for retransmission.

Whether a packet's SACKed bit is on or off, the packet is removed from the retransmission buffer only when it has been cumulatively acknowledged.
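The scoreboard rule can be illustrated with a short Python sketch. It works at packet granularity rather than byte ranges, and the function name and its parameters are hypothetical, chosen only for this example.

```python
def eligible_for_retransmission(sent, sacked, cumulative_ack):
    """Return the packet numbers that the scoreboard marks as retransmittable.

    sent           -- packet numbers transmitted so far
    sacked         -- set of packet numbers reported in SACK blocks
    cumulative_ack -- next packet expected by the receiver; every lower
                      packet number has been cumulatively acknowledged
    """
    # Cumulatively acknowledged packets leave the retransmission buffer;
    # every other packet keeps a SACKed flag on the scoreboard.
    scoreboard = {seq: (seq in sacked)
                  for seq in sorted(sent) if seq >= cumulative_ack}
    sacked_seqs = [seq for seq, flag in scoreboard.items() if flag]
    if not sacked_seqs:
        return []  # nothing SACKed yet, so no packet qualifies
    highest_sacked = max(sacked_seqs)
    # Eligible: SACKed bit off and below the highest SACKed packet.
    return [seq for seq, flag in scoreboard.items()
            if not flag and seq < highest_sacked]
```

For instance, if packets 1 to 10 were sent, packets 1 to 3 are cumulatively acknowledged, and packets 5, 6, and 8 are SACKed, the sketch reports packets 4 and 7 as eligible, while 9 and 10 are not yet considered lost.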

The SACK TCP implementation still uses the same congestion control algorithms as TCP Reno. The main difference between SACK TCP and TCP Reno is the behavior in the event of multiple packet losses. SACK TCP refines the fast retransmit and fast recovery strategy of TCP Reno so that multiple lost packets in a single window can be recovered within one round-trip time.

TCP Vegas

In TCP Vegas [13], as in TCP Reno, a triple-duplicate ACK always results in packet retransmission. However, in order to retransmit lost packets quickly, Vegas extends Reno's fast retransmission strategy. Vegas measures the RTT for every packet sent, based on fine-grained clock values. Using these fine-grained RTT measurements, a timeout period for each packet is computed. When a duplicate ACK is received, Vegas checks whether the timeout period of the oldest unacknowledged packet has expired. If so, the packet is retransmitted. This modification leads to packet retransmission after just one or two duplicate ACKs. When a non-duplicate ACK that is the first or second ACK after a fast retransmission is received, Vegas again checks for the expiration of the timer and may retransmit another packet. Note that packet retransmission due to an expired fine-grained timer is conditioned on receiving certain ACKs. This technique enables faster detection of losses and recovery from multiple drops without re-entering slow start, provided the coarse retransmission timer does not expire first. Hence, Vegas handles a problem from which Reno suffers considerably, namely multiple drops in the same data window.

After a packet retransmission is triggered by a dupack and the ACK of the lost packet is received, the congestion window size is reduced to alleviate the network congestion. There are two cases for Vegas to set the congestion window size. If the lost packet has been transmitted just once, the congestion window size will be three-fourths of the previous size. Otherwise, Vegas considers the congestion to be more serious and sets the congestion window to one-half of its previous size. Notably, in the case of multiple packet losses occurring during one RTT and triggering more than one fast retransmission, the congestion window is reduced only for the first retransmission.
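The two Vegas rules described above, the fine-grained timeout check on a duplicate ACK and the window reduction after recovery, can be sketched in Python as follows. The function names and parameters are hypothetical, and the fine-grained timeout is taken as an input because its computation is not detailed here.

```python
def vegas_should_retransmit(now, send_time, fine_timeout):
    """On a duplicate ACK, report whether the oldest unacknowledged packet
    has outlived its fine-grained timeout (all values in seconds)."""
    return now - send_time > fine_timeout


def vegas_window_after_loss(cwnd, times_transmitted):
    """Return the reduced congestion window once the lost packet is ACKed.

    times_transmitted counts how often the lost packet had been sent
    before the dupack-triggered retransmission.
    """
    if times_transmitted == 1:
        return cwnd * 3 / 4   # lost once: mild reduction
    return cwnd / 2           # lost repeatedly: congestion is more serious
```

For a window of 16 segments, the sketch yields 12 segments when the packet had been sent only once and 8 segments otherwise.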

TCP-Jersey

TCP-Jersey [13] adopts slow start, congestion avoidance, and fast recovery from Reno but replaces Reno's fast retransmit with explicit retransmit and introduces the rate control procedure. The flow diagram of TCP-Jersey is shown in Fig. 8. The only difference between the two retransmit procedures is that Reno's fast retransmit halves the current congestion window before starting the retransmission, whereas Jersey's explicit retransmit maintains the current CWND and leaves the adjustment of the congestion window to the rate control procedure. The rate control procedure is simple: it sets the SSTHRESH to the optimum congestion window size computed by the available bandwidth estimator (ABE), and sets the CWND to the SSTHRESH if the connection is in the congestion avoidance phase. The ACK-receiving module of the TCP-Jersey sender operates as follows. Upon entry, it invokes the ABE procedure. If an ACK is received without the congestion warning (CW) mark, it proceeds as Reno does (i.e., invoking slow start or congestion avoidance depending on whether or not the CWND is below the SSTHRESH). If the received ACK or the nth dupack is marked with the CW bit, it calls the rate control procedure to adjust the window size; it then proceeds with slow start or congestion avoidance if the packet is an ACK, or enters explicit retransmit if it is the nth dupack. When the nth dupack is received without the CW mark, TCP-Jersey attributes the packet drop to a random error and therefore enters explicit retransmit without adjusting the window size.
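The decision logic of the receiving module can be sketched in Python as below. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the ABE result is passed in as a ready-made window size (abe_window), the congestion avoidance phase is judged against the SSTHRESH in force before rate control runs, and only new ACKs and the nth dupack are modeled.

```python
def jersey_on_ack(cwnd, ssthresh, is_nth_dupack, cw_marked, abe_window):
    """Return (cwnd, ssthresh, action) after one ACK or the nth dupack.

    abe_window is the optimum window size produced by the ABE; how it is
    estimated is outside this sketch.
    """
    if cw_marked:
        # Rate control procedure, invoked only on a congestion warning.
        in_congestion_avoidance = cwnd >= ssthresh  # assumed phase test
        ssthresh = abe_window
        if in_congestion_avoidance:
            cwnd = ssthresh
    if is_nth_dupack:
        # Explicit retransmit keeps whatever CWND is current at this point.
        return cwnd, ssthresh, "explicit_retransmit"
    # A new ACK proceeds as in Reno.
    action = "slow_start" if cwnd < ssthresh else "congestion_avoidance"
    return cwnd, ssthresh, action
```

An unmarked nth dupack therefore leaves both CWND and SSTHRESH untouched, which is the case the text attributes to random error.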

TCP Westwood

In TCP Westwood [13], the congestion window increments during slow start and congestion avoidance remain the same as in Reno, that is, they are exponential and linear, respectively. A packet loss is indicated by (a) the reception of 3 dupacks or (b) coarse timeout expiration. In case the loss indication is 3 dupacks, TCP Westwood sets the CWND and SSTHRESH as follows:

if (3 dupacks are received) {
    SSTHRESH = (BWE * RTTmin) / seg_size;
    if (CWND > SSTHRESH)
        CWND = SSTHRESH;
}

In the pseudo-code, seg_size denotes the length of a TCP segment in bits. Note that the reception of n dupacks is followed by the retransmission of the missing segment, as in the standard Fast Retransmit implemented by TCP Reno. In addition, the window growth after the CWND is reset to the SSTHRESH follows the rules established in the Fast Retransmit algorithm. During the congestion avoidance phase, TCP Westwood is probing for extra available bandwidth.

Therefore, when n dupacks are received, TCP Westwood has hit the network capacity (or, in the case of wireless links, one or more segments were dropped due to sporadic losses). Thus, the SSTHRESH is set equal to the window capable of producing the measured rate BWE when the bottleneck buffer is empty. The congestion window is set equal to the SSTHRESH, and the congestion avoidance phase is entered again to gently probe for new available bandwidth. Note that after the SSTHRESH has been set, the congestion window is set equal to the slow start threshold only if CWND > SSTHRESH. The current CWND may be below the threshold; this occurs, for example, after a timeout, when the window is dropped to one. During slow start, the CWND still increases exponentially, as in the current implementation of TCP Reno.

In case a packet loss is indicated by timeout expiration, the CWND and SSTHRESH are set as follows:

if (coarse timeout expires) {
    CWND = 1;
    SSTHRESH = (BWE * RTTmin) / seg_size;
    if (SSTHRESH < 2)
        SSTHRESH = 2;
}

The rationale for the algorithm above is that after a timeout, the CWND is set to one and the SSTHRESH is set to the window corresponding to BWE. Thus, the basic Reno behavior is still captured, while a speedy recovery is ensured by deriving the SSTHRESH from the BWE.
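Both Westwood window settings can be written as small Python functions. This is a sketch that mirrors the pseudo-code, assuming BWE is given in bits per second, RTTmin in seconds, and seg_size in bits, so that the windows are expressed in segments; the function names are hypothetical.

```python
def westwood_on_three_dupacks(cwnd, bwe, rtt_min, seg_size):
    """Return (cwnd, ssthresh) after three duplicate ACKs."""
    ssthresh = (bwe * rtt_min) / seg_size
    if cwnd > ssthresh:
        cwnd = ssthresh   # shrink only when the window exceeds the estimate
    return cwnd, ssthresh


def westwood_on_timeout(bwe, rtt_min, seg_size):
    """Return (cwnd, ssthresh) after a coarse timeout."""
    cwnd = 1
    # SSTHRESH never drops below two segments.
    ssthresh = max((bwe * rtt_min) / seg_size, 2)
    return cwnd, ssthresh
```

As a worked example, with BWE = 1.6 Mbit/s, RTTmin = 0.125 s, and 8000-bit segments, the threshold is (1600000 * 0.125) / 8000 = 25 segments. A window of 40 segments is cut to 25 on three dupacks, while a window of 10 segments is left unchanged.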

The different approaches are compared in the table below:

| Protocol type | Advantages | Disadvantages |
|---|---|---|
| TCP Tahoe | Detects and retransmits lost packets much sooner than timeouts | Takes a complete timeout interval to detect a packet loss |
| TCP Reno | Performs well over WLAN when only a single packet is lost | Poor performance over WLAN when multiple packets are lost |
| TCP NewReno | Performs better than TCP Reno over UMTS when multiple packets are lost | Cannot distinguish between congestion loss and packet errors |
| SACK | The source has better information about the packets that have been successfully delivered compared to other TCP versions | Requires modification to the acknowledgement procedures at both sender and receiver sides |
| TCP Vegas | Good performance over WLAN when using the Snoop protocol | Cannot distinguish between congestion loss and packet errors |
| TCP Jersey | Replaces Reno's fast retransmit with the rate control procedure | Packet drop caused by random error, without adjusting window size |
| TCP Westwood | Faster recovery and more effective congestion avoidance lead to better throughput and delay | TCP-W is not sufficiently evaluated |

VI. CONCLUSION

This paper presents a survey on packet re-ordering, fast retransmit and fast recovery mechanisms of some existing transport protocols. This taxonomy provides a unified terminology and framework for the comparison and evaluation of this class of protocols. In addition, the insight provided by the taxonomy and survey in this paper may be used to guide future research in this area.

Studying TCP behavior is still an active area of research and requires further investigation, since several mechanisms besides the ones described in this article rely on this type of estimation. A comparison of the different protocols and a theoretical study of the transport protocols will be developed in a forthcoming article.

REFERENCES

[1] J. C. R. Bennett, C. Partridge, and N. Shectman, "Packet Reordering Is Not Pathological Network Behavior," IEEE/ACM Trans. Networking, vol. 7, Dec. 1999.

[2] S. McCreary and K. C. Claffy, "Trends in Wide Area IP Traffic Patterns," 13th ITC Specialist Seminar on Internet Traffic Measurement and Modeling, 2000, http://www.caida.org/outreach/papers/2000/

[3] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion Control," RFC 2581, Apr. 1999.

[4] V. Jacobson, "Modified TCP Congestion Avoidance Algorithm," ftp://ftp.isi.edu/end2end/end2endinterest-

[5] D. D. Clark, "The Design Philosophy of the DARPA Internet Protocols," ACM SIGCOMM Computer Comm. Rev., vol. 18, Aug. 1988.

[6] M. Laor and L. Gendel, "The Effect of Packet Reordering in a Backbone Link on Application Throughput," IEEE Network, vol. 16, no. 5, pp. 28-36, Sept./Oct. 2002.

[7] K.-C. Leung and V. O. K. Li, "Generalized Load Sharing for Packet-Switching Networks I: Theory and Packet-Based Algorithm," IEEE Trans. Parallel and Distributed Systems, vol. 17, July 2006.

[8] S. Bhandarkar and A. L. N. Reddy, "TCP-DCR: Making TCP Robust to Non-Congestion Events," Lecture Notes in Computer Science, vol. 3042, pp. 712-724, May 2004.

[9] M. Zhang, B. Karp, S. Floyd, and L. Peterson, "RR-TCP: A Reordering-Robust TCP with DSACK," Proc. IEEE ICNP 2003, pp. 95-106, Atlanta, GA, USA, 4-7 Nov. 2003.

[10] S. Bohacek, J. P. Hespanha, J. Lee, C. Lim, and K. Obraczka, "A New TCP for Persistent Packet Reordering," IEEE/ACM Trans. Networking, vol. 14, no. 2, April 2006.

[11] R. Ludwig and R. H. Katz, "The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions," ACM SIGCOMM Computer Comm. Rev., vol. 30, no. 1, pp. 30-36, Jan. 2000.

[12] F. Wang and Y. Zhang, "Improving TCP Performance over Mobile Ad-Hoc Networks with Out-of-Order Detection and Response," Proc. ACM MOBIHOC 2002, pp. 217-225, Lausanne, Switzerland, 9-11 June 2002.

[13] Cheng-Yuan Ho, Yaw-Chung Chen, Yi-Cheng Chan, and Cheng-Yun Ho, "Fast Retransmit and Fast Recovery Schemes of Transport Protocols: A Survey and Taxonomy," www.elsevier.com/locate/comnet

[14] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion Control," IETF RFC 2581, Apr. 1999.

[15] E. Blanton and M. Allman, "On Making TCP More Robust to Packet Reordering," ACM SIGCOMM Computer Comm. Rev., vol. 32, no. 1, pp. 20-30, Jan. 2002.

[16] M. Zhang, B. Karp, S. Floyd, and L. Peterson, "RR-TCP: A Reordering-Robust TCP with DSACK," Proc. IEEE Int'l Conf. Network Protocols (ICNP '03), pp. 95-106, Nov. 2003.

[17] S. Bhandarkar and A. L. N. Reddy, "TCP-DCR: Making TCP Robust to Non-Congestion Events," Lecture Notes in Computer Science, vol. 3042, pp. 712-724, May 2004.

[18] Ka-Cheong Leung, Victor O. K. Li, and Daiqin Yang, "An Overview of Packet Reordering in Transmission Control Protocol (TCP): Problems, Solutions, and Challenges," IEEE Trans. Parallel and Distributed Systems, vol. 18, April 2007.

[19] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet," IEEE/ACM Trans. Networking, vol. 7, Aug. 1999.

