Internet Engineering Task Force                                 M. Welzl
Internet-Draft                                   University of Innsbruck
Intended status: Informational                             March 3, 2009
Expires: September 4, 2009


         A Survey of Lower-than-Best Effort Transport Protocols
                    draft-welzl-ledbat-survey-00.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 4, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This document provides a survey of transport protocols which are designed to have a smaller bandwidth and/or delay impact on standard TCP than standard TCP itself when they share a bottleneck with it.

Welzl                   Expires September 4, 2009                [Page 1]

Internet-Draft            LBE Transport Survey                March 2009

Such protocols could be used for low-priority "background" traffic, as they provide what is sometimes called a "less than" (or "lower than") best effort service.

Table of Contents

   1.  Introduction
   2.  Delay-based transport protocols
   3.  Non-delay-based transport protocols
   4.  Application layer approaches
   5.  Orthogonal work
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Informative References
   Author's Address

1. Introduction

As a starting point for the work in the LEDBAT group, this document presents a brief survey of efforts to attain a Less than Best Effort (LBE) service without help from routers. We loosely define an LBE service as a service which has a smaller bandwidth and/or delay impact on standard TCP than standard TCP itself when sharing a bottleneck with it. We refer to systems that provide this service as LBE systems.

Generally, LBE behavior can be achieved by reacting to queue growth earlier than standard TCP would, or by changing the congestion avoidance behavior of TCP without utilizing any additional implicit feedback. Some mechanisms achieve LBE behavior at the application layer, e.g. by changing the receiver window of standard TCP, and there is also a substantial amount of work that is related to the LBE concept but does not present a solution that can be installed in end hosts or be expected to work over the Internet. Following this classification, this document categorizes solutions as delay-based transport protocols, non-delay-based transport protocols, application layer approaches, and orthogonal work.
The author wishes to emphasize that, in its present form, this document is only a starting point and not based on a thorough literature study. Many relevant references are likely to be missing, and the author apologizes to all authors of related work that has not been mentioned here.

2. Delay-based transport protocols

It is wrong to generally equate "little impact on standard TCP" with "small sending rate". Unless the sender's maximum window is limited for some reason, and in the absence of ECN support, standard TCP will normally increase its rate until a queue overflows, causing one or more packets to be dropped and the rate to be reduced. A protocol which stops increasing the rate before this event happens can, in principle, achieve better performance than standard TCP. In the absence of any other traffic, this is even true for TCP itself when its maximum send window is limited to the bandwidth x round-trip time (RTT) product.

TCP Vegas [Bra+94] is one of the first protocols that was known to have a smaller sending rate than standard TCP when both protocols share a bottleneck [Kur+00], yet it was designed to achieve more, not less, throughput than standard TCP. Indeed, when it is the only protocol on the bottleneck, the throughput of TCP Vegas is greater than the throughput of standard TCP. Depending on the bottleneck queue length, TCP Vegas itself can be starved by standard TCP flows. This can be remedied to some degree by the RED Active Queue Management mechanism [RFC2309].

The congestion avoidance behavior is the protocol's most important feature, both in terms of historical relevance and in the context of this document (although it has been shown that other elements of the protocol can sometimes play a greater role in its overall behavior [Hen+00]).
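The per-RTT Congestion Avoidance rule of TCP Vegas [Bra+94] compares the expected throughput (WindowSize / BaseRTT) with the actual throughput and moves the window by one packet in either direction. The following is a minimal Python sketch of that rule, not code taken from [Bra+94] or from any implementation. It assumes that the two thresholds alpha < beta are expressed in packets of estimated bottleneck backlog, and the default values (2 and 4) are illustrative assumptions. The names VegasState and vegas_update are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VegasState:
    cwnd: float      # congestion window ("WindowSize"), in packets
    base_rtt: float  # BaseRTT: minimum RTT measured so far, in seconds


def vegas_update(state: VegasState, rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    """Apply one per-RTT Vegas Congestion Avoidance step; return the new cwnd.

    alpha < beta are thresholds in packets of estimated bottleneck backlog;
    the default values are illustrative assumptions.
    """
    state.base_rtt = min(state.base_rtt, rtt)
    expected = state.cwnd / state.base_rtt  # throughput if nothing is queued
    actual = state.cwnd / rtt               # throughput actually achieved
    # Scaling the gap by BaseRTT turns it into an estimate of the number of
    # this flow's packets sitting in the bottleneck queue.
    diff = (expected - actual) * state.base_rtt
    if diff < alpha:
        # Actual throughput close to expected: network underutilized.
        state.cwnd += 1.0
    elif diff > beta:
        # Actual throughput well below expected: congestion, back off linearly.
        state.cwnd = max(state.cwnd - 1.0, 1.0)
    # Between alpha and beta the window is left unchanged.
    return state.cwnd
```

For example, with cwnd = 20 packets and BaseRTT = 100 ms, a measured RTT of 200 ms implies about 10 queued packets, so the window shrinks to 19. If the measured RTT equals BaseRTT, which is the case when the window does not exceed the bandwidth x RTT product, diff is zero and the window grows by one packet.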
In Congestion Avoidance, once per RTT, TCP Vegas calculates the expected throughput as WindowSize / BaseRTT, where WindowSize is the current congestion window and BaseRTT is the minimum of all measured RTTs. The expected throughput is then compared with the actual (measured) throughput. Since queuing can only make the RTT grow beyond BaseRTT, the actual throughput is normally smaller than the expected throughput, and the size of the gap indicates how much data is queued at the bottleneck. If the actual throughput is smaller than the expected throughput minus a threshold beta, this is taken as a sign of congestion, causing the protocol to linearly decrease its rate. If the actual throughput is greater than the expected throughput minus a smaller threshold alpha (alpha < beta), this is taken as a sign that the network is underutilized, causing the protocol to linearly increase its rate. Between the two thresholds, the rate is left unchanged.

TCP Vegas has been analyzed extensively. One of its most prominent properties is its fairness between multiple flows of the same kind: it does not penalize flows with large propagation delays in the way standard TCP does. While it was not the first protocol to use delay as a congestion indication, its predecessors (which are referenced in [Bra+94]) are not discussed here because of the historical "landmark" role that TCP Vegas has taken in the literature.

Transport protocols which were designed to be non-intrusive include TCP-LP [Kuz+06], TCP Nice [Ven+02], and 4CP [Liu+07]. Using a simple analytical model, the authors of [Kuz+06] illustrate the feasibility of this endeavor by showing that, due to the non-linear relationship between throughput and RTT, it is possible to remain transparent to standard TCP even when the flows under consideration have a larger RTT than standard TCP flows.

TCP Nice [Ven+02] follows the same basic approach as TCP Vegas but improves upon it in some aspects. Because of its moderate linear-decrease congestion response, TCP Vegas can affect standard TCP despite its ability to detect congestion early. TCP Nice removes this issue by halving the congestion window (at most once per RTT, like standard TCP) instead of linearly reducing it.
To avoid being too conservative, TCP Nice only does this if, within one RTT, the fraction of packets carrying a delay-based incipient congestion signal exceeds a fixed, predefined threshold. Otherwise, TCP Nice falls back to the congestion avoidance rules of TCP Vegas if no packet was lost, or to those of standard TCP if a packet was lost. One more feature of TCP Nice is its ability to support a congestion window of less than one packet, by clocking out single packets over more than one RTT. With ns-2 simulations and real-life experiments using a Linux implementation, the authors of [Ven+02] show that TCP Nice achieves its goal of efficiently utilizing spare capacity while being non-intrusive to standard TCP.

Unlike TCP Vegas and TCP Nice, TCP-LP uses the one-way delay (OWD) rather than the RTT as an indicator of incipient congestion. This is done to avoid reacting to delay fluctuations that are caused by reverse cross-traffic. Using the TCP Timestamps option [RFC1323], the OWD is determined as the difference between the receiver's Timestamp value in the ACK and the sender's original Timestamp value that the receiver echoed in the ACK. While the result of this subtraction only precisely represents the OWD if clocks are synchronized, its absolute value is of no concern to TCP-LP, and hence clock synchronization is unnecessary. Using a constant smoothing parameter, TCP-LP calculates an Exponentially Weighted Moving Average (EWMA) of the measured OWD and checks whether the result exceeds a threshold within the range of the minimum and maximum OWD that were seen during the connection's lifetime; if it does, this condition is interpreted as an "early congestion indication". The minimum and maximum OWD values are initialized during the slow-start phase.
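The two delay-based incipient congestion tests described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not code from [Ven+02] or [Kuz+06]. For TCP Nice, an RTT sample is assumed to count as a congestion signal when it exceeds min_rtt + t * (max_rtt - min_rtt), and the window is halved when more than a fraction f of one round's samples are signals; the values of t and f are assumptions. For TCP-LP, the smoothing weight gamma and the threshold position delta between the minimum and maximum OWD are assumed values, both hosts are assumed to use the same timestamp units, and the minimum and maximum are simply tracked over all samples instead of being initialized during slow start. All function, class, and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def nice_should_halve(rtts: Sequence[float], min_rtt: float, max_rtt: float,
                      t: float = 0.2, f: float = 0.5) -> bool:
    """TCP Nice-style test over one round (one RTT) of RTT samples.

    Returns True if the round should trigger halving of the window.
    The signal threshold form and the values of t and f are assumptions.
    """
    if not rtts:
        return False
    threshold = min_rtt + t * (max_rtt - min_rtt)
    signals = sum(1 for r in rtts if r > threshold)
    return signals > f * len(rtts)


class LpEarlyCongestionDetector:
    """TCP-LP-style early congestion indication from one-way delay (OWD)."""

    def __init__(self, gamma: float = 0.125, delta: float = 0.15) -> None:
        self.gamma = gamma  # EWMA weight of the newest sample (assumed)
        self.delta = delta  # threshold position within [min, max] (assumed)
        self.smoothed: Optional[float] = None
        self.d_min = float("inf")
        self.d_max = float("-inf")

    def on_ack(self, ack_tsval: int, ack_tsecr: int) -> bool:
        """Process one ACK; return True on an early congestion indication.

        ack_tsval is the receiver's Timestamp value carried in the ACK, and
        ack_tsecr is the sender's original Timestamp value echoed back.
        """
        # Without synchronized clocks this difference is the OWD plus an
        # unknown constant offset; the offset shifts d_min, d_max and the
        # EWMA equally, so the comparison below does not depend on it.
        owd = ack_tsval - ack_tsecr
        self.d_min = min(self.d_min, owd)
        self.d_max = max(self.d_max, owd)
        if self.smoothed is None:
            self.smoothed = float(owd)
        else:
            self.smoothed = (1 - self.gamma) * self.smoothed + self.gamma * owd
        threshold = self.d_min + self.delta * (self.d_max - self.d_min)
        return self.smoothed > threshold
```

In this sketch, a TCP Nice sender would evaluate nice_should_halve once per RTT and halve its window at most once in that RTT when it returns True. The TCP-LP detector only signals after the smoothed OWD has moved a fraction delta of the way from the observed minimum towards the observed maximum. Adding the same constant clock offset to every receiver timestamp leaves its decisions unchanged.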
Regarding its reaction to an early congestion indication, TCP-LP tries to strike a middle ground between the overly conservative choice of immediately setting the congestion window to one packet and the presumably too aggressive choice of halving the congestion window like standard TCP. It does so by first halving the window in response to an early congestion indication, then starting an "interference time-out timer" and maintaining the window size until this timer fires. If another early congestion indication arrives during this "interference phase", the window is set to 1; otherwise, the window is maintained and TCP-LP continues to increase it in the standard additive-increase fashion. This method ensures that it takes at least two RTTs for a TCP-LP flow to decrease its window to 1, and, like standard TCP, TCP-LP reacts to congestion at most once per RTT.

With ns-2 simulations and real-life experiments using a Linux implementation, the authors of [Kuz+06] show that TCP-LP is largely non-intrusive to TCP traffic while still utilizing a large portion of the excess network bandwidth, which is shared fairly among competing TCP-LP flows. They also show that using their protocol for bulk data transfers greatly reduces the file transfer times of competing best-effort web traffic.

3. Non-delay-based transport protocols

4CP [Liu+07], which stands for "Competitive and Considerate Congestion Control", is a protocol which provides an LBE service by changing the window control rules of standard TCP. A "virtual window" is maintained, which, during a so-called "bad congestion phase", is reduced to less than a predefined minimum value of the actual congestion window.
The congestion window is only increased again once the virtual window exceeds this minimum; in this way, the virtual window controls the duration during which the sender transmits at a fixed minimum rate. The 4CP congestion avoidance algorithm allows for setting a target average window and avoids starvation of "background" flows while bounding the impact on "foreground" flows. Its performance was evaluated in ns-2 simulations and in real-life experiments with a kernel-level implementation in Microsoft Windows Vista.

Some work has been done on applying weights to congestion control mechanisms, allowing a single flow to be as aggressive as a number of parallel TCP flows. This is usually motivated by the fact that users may want to assign different priorities to different flows. The first, and best known, such protocol is MulTCP [Cro+98], which emulates N TCPs in a rather simple fashion. An improved version of MulTCP is presented in [Hac+04], and a variant called Probe-Aided MulTCP (PA-MulTCP) [Kuo+08] applies a single feedback loop to control a larger traffic aggregate. Another protocol, CP [Ott+04], applies the same concept to the TFRC protocol [RFC5348] in order to provide such fairness differentiation for multimedia flows.

The general assumption underlying all of the above work is that these protocols are "N-TCP-friendly", i.e. they are as TCP-friendly as N TCPs, where N is a positive number (possibly restricted to natural numbers) that is greater than or equal to 1. The MulTFRC [Dam+09] protocol, another extension of TFRC for multiple flows, is however able to support values of N between 0 and 1, making it applicable as a mechanism for an LBE service. Since it does not react to delay like the protocols in Section 2, but adjusts its rate like TFRC, it can probably be expected to be more aggressive than mechanisms such as TCP Nice or TCP-LP.
This also means that MulTFRC is less prone to starvation, as its aggression is tunable at a fine granularity even when N is between 0 and 1.

4. Application layer approaches

The mechanism described in [Spr+00] controls the bandwidth by letting the receiver intelligently manipulate the receiver window of standard TCP. This is done because the authors assume a client-server setting where the receiver's access link is typically the bottleneck. The scheme incorporates a delay-based calculation of the expected queue length at the bottleneck, which is quite similar to the calculation in the delay-based protocols above, e.g. TCP Vegas. Using a Linux implementation, where TCP flows are classified according to their application's needs, the authors show that a significant improvement in packet latency can be attained over an unmodified system while maintaining good link utilization.

Receiver window tuning is also done in [Key+04], where choosing the right value for the window is phrased as an optimization problem. On this basis, two algorithms are presented: binary search, which reaches a good operating point faster than the other algorithm but fluctuates, and stochastic optimization, which does not fluctuate but converges more slowly than binary search. These algorithms only use the previous receiver window and the amount of data received during the previous control interval as input. According to [Key+04], the encouraging simulation results suggest that such an application-level mechanism can work almost as well as a transport layer scheme like TCP-LP.

TODO: mention other rwnd tuning and different application layer work, e.g. from related work sections of [Egg+05] and [Kok+04] and intro of [Key+04].

5. Orthogonal work

Various suggestions have been published for realizing an LBE service by influencing the way packets are treated in routers.
One example is the Persistent Class Based Queuing (P-CBQ) scheme presented in [Car+01], which is a variant of Class Based Queuing (CBQ) with per-flow accounting. RFC 3662 [RFC3662] defines a DiffServ per-domain behavior called "Lower Effort".

Harp [Kok+04] realizes an LBE service by spreading background traffic across less-utilized paths of the network. This is achieved without changing routers, by using edge nodes as relays. According to the authors, these edge nodes should be gateways of organizations in order to align their scheme with usage incentives, but the technical solution would also work if Harp were only deployed in end hosts. Harp detects impending congestion by looking at delay, similar to TCP Nice [Ven+02], and improves utilization and fairness over purely single-path solutions.

An entirely different approach is taken in [Egg+05]: here, the priority of a flow is reduced via a generic idletime scheduling strategy in a host's operating system. While the results presented in this paper show that the new scheduler can effectively shield regular tasks from low-priority ones (e.g. TCP from greedy UDP) with only a minor performance impact, an underlying assumption is that all involved end hosts use the idletime scheduler. In other words, this work does not aim to protect a standard TCP flow which originates from a host where the presented scheduling scheme is not implemented.

TODO: studies dealing with the precision of congestion prediction in end hosts (i.e. using delay to determine the onset of congestion) may be relevant in this document, and could be discussed here, e.g. [Bha+07] and the references therein.

6. Acknowledgements

The author would like to thank Dragana Damjanovic for reference pointers. Surely many others will help in one way or another later, and I will thank them all here.

7. IANA Considerations

This memo includes no request to IANA.

8. Security Considerations

This document introduces no new security considerations.

9. Informative References

[Bha+07] Bhandarkar, S., Reddy, A., Zhang, Y., and D. Loguinov, "Emulating AQM from end hosts", Proceedings of ACM SIGCOMM 2007, 2007.

[Bra+94] Brakmo, L., O'Malley, S., and L. Peterson, "TCP Vegas: New techniques for congestion detection and avoidance", Proceedings of SIGCOMM '94, pp. 24-35, August 1994.

[Car+01] Carlberg, K., Gevros, P., and J. Crowcroft, "Lower than best effort: a design and implementation", Workshop on Data Communication in Latin America and the Caribbean, San Jose, Costa Rica, pp. 244-265, 2001.

[Cro+98] Crowcroft, J. and P. Oechslin, "Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP", ACM SIGCOMM Computer Communication Review vol. 28, no. 3, pp. 53-69, July 1998.

[Dam+09] Damjanovic, D. and M. Welzl, "MulTFRC: Providing Weighted Fairness for Multimedia Applications (and others too!)", Work in progress ..., 2009.

[Egg+05] Eggert, L. and J. Touch, "Idletime Scheduling with Preemption Intervals", Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP 2005), Brighton, United Kingdom, pp. 249-262, October 2005.

[Hac+04] Hacker, T., Noble, B., and B. Athey, "Improving Throughput and Maintaining Fairness using Parallel TCP", Proceedings of Infocom 2004, March 2004.

[Hen+00] Hengartner, U., Bolliger, J., and T. Gross, "TCP Vegas revisited", Proceedings of Infocom 2000, March 2000.

[Key+04] Key, P., Massoulie, L., and B. Wang, "Emulating Low-Priority Transport at the Application Layer: a Background Transfer Service", Proceedings of ACM SIGMETRICS 2004, January 2004.

[Kok+04] Kokku, R., Bohra, A., Ganguly, S., and A. Venkataramani, "A Multipath Background Network Architecture", Proceedings of Infocom 2007, May 2007.

[Kuo+08] Kuo, F. and X. Fu, "Probe-Aided MulTCP: an aggregate congestion control mechanism", ACM SIGCOMM Computer Communication Review vol. 38, no. 1, pp. 17-28, January 2008.

[Kur+00] Kurata, K., Hasegawa, G., and M. Murata, "Fairness Comparisons Between TCP Reno and TCP Vegas for Future Deployment of TCP Vegas", Proceedings of INET 2000, July 2000.

[Kuz+06] Kuzmanovic, A. and E. Knightly, "TCP-LP: low-priority service via end-point congestion control", IEEE/ACM Transactions on Networking (ToN) vol. 14, no. 4, pp. 739-752, August 2006.

[Liu+07] Liu, S., Vojnovic, M., and D. Gunawardena, "Competitive and Considerate Congestion Control for Bulk Data Transfers", Proceedings of IWQoS 2007, June 2007.

[Ott+04] Ott, D., Sparks, T., and K. Mayer-Patel, "Aggregate congestion control for distributed multimedia applications", Proceedings of Infocom 2004, March 2004.

[RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992.

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

[RFC3662] Bless, R., Nichols, K., and K. Wehrle, "A Lower Effort Per-Domain Behavior (PDB) for Differentiated Services", RFC 3662, December 2003.

[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, September 2008.

[Spr+00] Spring, N., Chesire, M., Berryman, M., Sahasranaman, V., Anderson, T., and B. Bershad, "Receiver based management of low bandwidth access links", Proceedings of Infocom 2000, vol. 1, pp. 245-254, 2000.

[Ven+02] Venkataramani, A., Kokku, R., and M. Dahlin, "TCP Nice: a mechanism for background transfers", Proceedings of OSDI '02, 2002.

Author's Address

Michael Welzl
University of Innsbruck
Technikerstr. 21 A
Innsbruck, 6020
Austria

Phone: +43 512 507 6110
Email: michael.welzl@uibk.ac.at