Inter-Domain Routing                                      I. van Beijnum
Internet-Draft                                            IMDEA Networks
Expires: September 10, 2009                                    R. Winter
                                                         NEC Labs Europe
                                                           March 9, 2009

                     A BGP Inter-AS Cost Attribute
                      draft-van-beijnum-idr-iac-02

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 10, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

van Beijnum & Winter       Expires September 10, 2009           [Page 1]

Internet-Draft               BGP Inter-AS Cost                March 2009

Abstract

Although BGP implementations have extensive path selection algorithms, in practice operators have trouble performing satisfactory traffic engineering of incoming traffic based solely on the BGP attributes that the path selection algorithm takes into account.
For this reason, many ASes deaggregate their address range(s) into smaller blocks and announce these blocks differently to different neighboring ASes in order to arrive at the desired traffic flow. This practice contributes to the growth of the global routing table, which drives up capital expenditures for networks engaging in inter-domain routing. This memo introduces a new inter-domain metric that supports finer-grained traffic engineering than current BGP attributes.

1. Introduction

An origin AS today has no appropriate means to express a preference for a particular path leading towards it, as the BGP decision process is not designed to take such a preference into account. The only two means an origin AS has to influence the way traffic enters its network are prefix deaggregation, which results in global routing table growth, and AS path prepending, which is a very imprecise method.

It is easy to see why comparing AS path lengths is problematic in today's flat AS hierarchy. Assume there are 10 tier-1 ISPs that reach all destinations connected to the Internet through peering, and that the local AS buys transit service from two of these tier-1 ISPs. Traffic to the customers of each of those two ISPs will normally flow through that ISP. However, for all destinations reachable through the 8 other tier-1 ISPs, the AS paths over both transit ISPs have the same length. This means that prepending the AS path towards one ISP has a very dramatic effect: as much as 80% of all traffic may subsequently flow over the non-prepended ISP. A similar situation can occur with more complex types of connectivity. A finer-grained value that is communicated across ASes would reduce this problem. This memo proposes such a finer-grained inter-AS metric: the inter-AS cost (IAC).
With this metric, the destination of traffic can make precise adjustments to the metric seen by the sources of that traffic, and thus arrive at more favorable load sharing ratios between multiple links to different ASes without having to resort to the advertisement of more-specific prefixes.

Somewhat similar efforts have been undertaken in the past. In 1995, [I-D.antonov-bgp-metrics] proposed new per-hop BGP metrics. However, this proposal suffered from high complexity and a resulting risk of unforeseen consequences. A year later, [I-D.chen-bgp-dpa] proposed a new inter-AS metric for the purpose of allowing symmetric routing and load sharing. This proposal was not worked out in much detail. Neither proposal specifically addressed the issue of granularity in an inter-AS metric.

Note that the definitions of IAC and IAClocal have changed fundamentally since version -01 of this draft. See Section 4 for more information about the changes.

2. IAC and IAClocal

The new metric is named Inter-AS Cost (IAC). It is added to the prefix announcement by the origin AS as an optional transitive attribute. The IAC is an 8-bit signed value that represents the relative cost or preference towards the source of the associated prefix, compared to other paths for reaching the same prefix.

To prevent small changes in the IAC from having a very large effect, as prepending the AS path by a single AS hop has today, a randomization component is introduced. The IAC, the randomization component and, optionally, a local cost towards the next hop together make up the IAClocal value, which is used to compare routes towards the same prefix. The randomization component is computed by XORing all the octets in the AS numbers of the source of the announcement, the next hop AS and the local AS.
The resulting 8-bit value R is then interpreted as a signed value, with possible values from -128 to 127. The randomization component R ensures that, even when no policy is applied, destinations with the same properties will be preferred through different next hop addresses, and that different ASes make this selection differently, so that traffic is (very roughly) equally distributed over different links, for both the sending and the receiving ASes.

The optional local cost (LC) is an integer that can take values between -256 and 255. Its purpose is to give the local AS some control in case the comparison of the computed IAClocal values turns out unfavorably for the local AS. Note that the local AS has other means besides the IAClocal to accomplish outbound traffic engineering, notably the LOCAL_PREF.

IAClocal is computed as follows:

   IAClocal = IAC * 2 + R (+ LC)

Hence, IAClocal is an integer between -640 and 636 (2 * -128 - 128 - 256 = -640 and 2 * 127 + 127 + 255 = 636). Any IAClocal value outside this range MUST cause the IAClocal to be ignored. The IAClocal is stored as a 16-bit signed value in network byte order in the IAC BGP path attribute.

The new IAC path attribute is an optional transitive attribute that can take two forms: over eBGP, the attribute contains only the IAC; when communicated through iBGP, the attribute contains both the IAC and the IAClocal, in that order.

When a router generates a route locally for announcement over BGP, an IAC of 0 MAY be included. However, it is recommended that an IAC attribute be generated only when an IAC is specified in the configuration. A missing IAC is semantically distinct from an IAC of 0, so configuring an IAC of 0 MUST result in the inclusion of the IAC attribute.

The IAClocal is computed by the router that receives a prefix containing the IAC attribute over eBGP, or by the router that sources a prefix advertisement.
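The computation and encoding rules above can be illustrated with a short Python sketch. It is not a reference implementation and relies on assumptions that the text does not spell out: AS numbers are treated as four octets each (for 2-octet AS numbers the extra zero octets leave the XOR unchanged), R is read as a two's-complement octet, and the eBGP and iBGP forms of the attribute are told apart by the length of the attribute value (1 or 3 octets). Only the attribute value is built, because no attribute type code has been assigned, and all function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

# Valid IAClocal range: 2 * IAC + R + LC with IAC and R in [-128, 127]
# and LC in [-256, 255].
IACLOCAL_MIN, IACLOCAL_MAX = -640, 636


def randomization(origin_as: int, next_hop_as: int, local_as: int) -> int:
    """XOR all octets of the three AS numbers and read the result as signed R."""
    r = 0
    for asn in (origin_as, next_hop_as, local_as):
        # Assumption: every AS number is handled as four octets.
        for octet in asn.to_bytes(4, "big"):
            r ^= octet
    return r - 256 if r > 127 else r


def iac_local(iac: int, r: int, lc: int = 0) -> int:
    """IAClocal = IAC * 2 + R (+ LC); LC defaults to 0 (no local cost)."""
    if not -128 <= iac <= 127 or not -128 <= r <= 127:
        raise ValueError("IAC and R must fit in a signed octet")
    if not -256 <= lc <= 255:
        raise ValueError("LC must be between -256 and 255")
    return iac * 2 + r + lc


def encode_ebgp(iac: int) -> bytes:
    """eBGP form of the attribute value: the IAC only (one signed octet)."""
    return struct.pack("!b", iac)


def encode_ibgp(iac: int, local: int) -> bytes:
    """iBGP form: the IAC followed by the IAClocal (16-bit signed, network order)."""
    return struct.pack("!bh", iac, local)


def decode(value: bytes) -> Tuple[int, Optional[int]]:
    """Return (IAC, IAClocal); IAClocal is None when absent or out of range."""
    if len(value) == 1:
        (iac,) = struct.unpack("!b", value)
        return iac, None
    if len(value) == 3:
        iac, local = struct.unpack("!bh", value)
        if not IACLOCAL_MIN <= local <= IACLOCAL_MAX:
            return iac, None  # an out-of-range IAClocal is ignored
        return iac, local
    raise ValueError("unexpected IAC attribute length")


if __name__ == "__main__":
    # Origin, next hop and local AS taken from the documentation range.
    r = randomization(64500, 64501, 64502)
    local = iac_local(43, r)
    print(r, local, encode_ibgp(43, local).hex())
```

With these assumptions, origin AS 64500, next hop AS 64501 and local AS 64502 give R = 12, so an IAC of 43 yields an IAClocal of 98, which is carried over iBGP as the three octets 2b 00 62. An out-of-range IAClocal is dropped on decoding while the IAC itself is kept, reflecting the requirement above that such values be ignored.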
If no IAC attribute is received over eBGP, a router MUST NOT create one, and no IAClocal is computed. The IAClocal MUST NOT be computed when no IAC attribute is present.

In the BGP route selection algorithm, the IAClocal is compared immediately after the IGP cost. As such, the IAClocal is only considered for routes that are equally preferred with respect to LOCAL_PREF, AS_PATH length, ORIGIN, MED (where MEDs are compared), whether they were learned over eBGP or iBGP, and IGP cost. Without the IAClocal, route selection for these routes would come down to the last two tie-breaking steps of [RFC4271]: the lowest BGP Identifier and the lowest peer address. In addition, the IAClocal is only considered when all the routes under consideration at this point in the selection process contain the IAC attribute holding an IAClocal value. If the IAClocal is considered, the route with the highest IAClocal is selected. If multiple routes share the highest IAClocal, the remaining tie-breaking rules are applied to the routes sharing the highest IAClocal.

WARNING: In iBGP, there is no loop detection. As such, forwarding loops may occur when the tie-breaking rules are not implemented identically by all iBGP routers. Consider the following topology:

   R1 --- R2 --- R3 --- R4

If R1 and R4 have external routes towards a destination, and the IGP costs that R2 and R3 see towards R1 and towards R4 are identical, R2 may prefer the path over R4 because of the IAClocal, while R3, if it does not implement the IAC attribute, may prefer the path over R1 because of the existing tie-breaking rules. R2 then forwards traffic for the destination to R3, which forwards it back to R2. This situation may occur when not all routers in an AS support the IAC attribute, and next hop addresses for eBGP routes are redistributed into the IGP using a metric that does not take the interior hops into account, such as the OSPF external type 2 metric. For this reason, operators MUST avoid redistributing connected interfaces as E2 in OSPF (or the equivalent in other IGPs) if the AS contains a mix of IAC-capable and non-IAC-capable routers.

3. Usage guidelines

If the distribution of AS path lengths over two or more links towards the rest of the Internet is equal, the randomization component should make the traffic distribution over the different links very roughly equal.

Suppose there are two links, and a traffic distribution of 1 : 2 (33% over the first link and 67% over the second) is desired. The shares of the two links then differ by 67 - 33 = 34 percentage points, so, starting from a roughly equal split, the route selection of about 17% of the ASes (half of that difference) must be pushed from the first link towards the second using the IAClocal. As a rough rule of thumb, an IAC difference of 127 changes the IAClocal difference by 254, which exceeds almost every possible difference between the R values of the two routes (at most 255), and so shifts essentially all traffic to one link. Scaling this linearly, a difference of 127 * 34%, or about 43, in IAC between the route announced over the first link and the same route announced over the second link is needed. Because the route with the highest IAClocal is selected, this can be accomplished with an IAC of +43 on the second link and 0 on the first, or -43 on the first link and 0 on the second, or any other combination of IACs in which the IAC on the second link is 43 higher than the IAC on the first.

However, in practice the resulting traffic distribution will probably not immediately match the desired one, so additional adjustments will likely be necessary. These can be based on the difference between the observed and the desired traffic distribution: if they differ by 10%, the difference in IACs can be increased or decreased by 10%. Other heuristics may prove useful in practice.

4. Changes

In the previous versions of this draft, the IAC replaced the AS path length in the path selection algorithm. We changed this for two reasons. The first is that the differences in path selection between routers that do and routers that do not support the IAC would be too large in this situation, making for a very challenging deployment scenario. The second is that we believe there is no real need for intermediate ASes to update the IAC.
Intermediate ASes are almost always ISPs, which as a rule do not need additional mechanisms to balance traffic between outgoing paths towards the same destination. Because the IAC is considered after the IGP cost, the existing mechanisms that ISPs use to influence BGP traffic flows, such as manipulating MEDs and IGP costs, are preserved.

5. IANA considerations

IANA is requested to allocate a BGP optional transitive attribute type code.

6. Security considerations

As the IAClocal is compared so late in the BGP route selection process, the presence of the IAC is unlikely to pose a security risk, other than the potential for iBGP forwarding loops outlined in Section 2. It is highly recommended that implementers include a mechanism to remove the IAC attribute from incoming or outgoing BGP updates. This mechanism MUST be disabled by default.

7. References

7.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

7.2. Informative References

   [I-D.chen-bgp-dpa]  Chen, E. and T. Bates, "Destination Preference Attribute for BGP", draft-ietf-idr-bgp-dpa-05 (work in progress), September 1996.

   [I-D.antonov-bgp-metrics]  Antonov, V., "BGP AS Path Metrics", draft-ietf-idr-bgp-metrics-00 (work in progress), March 1995.

Appendix A. Document and discussion information

The latest version of this document will always be available at http://www.muada.com/drafts/. Please direct questions and comments to the idr or grow mailing lists or directly to the authors.

Appendix B. Acknowledgement

Rolf Winter and Iljitsch van Beijnum are partly funded by Trilogy, a research project supported by the European Commission under its Seventh Framework Program.

Authors' Addresses

   Iljitsch van Beijnum
   IMDEA Networks
   Avda. del Mar Mediterraneo, 22
   Leganes, Madrid  28918
   Spain

   Email: iljitsch@muada.com

   Rolf Winter
   NEC Labs Europe
   Kurfuersten-Anlage 36
   Heidelberg  69115
   Germany

   Email: rolf.winter@nw.neclab.eu