Internet Research Task Force                                  T. Li, Ed.
Internet-Draft                                                  Ericsson
Intended status: Informational                            March 29, 2009
Expires: September 30, 2009

Preliminary Recommendation for a Routing Architecture
draft-irtf-rrg-recommendation-02

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 30, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

It is commonly recognized that the Internet routing and addressing architecture is facing challenges in scalability, multi-homing, and inter-domain traffic engineering. This document reports the Routing Research Group's preliminary findings from its efforts towards developing a recommendation for a scalable routing architecture. This document is a work in progress.

Li                      Expires September 30, 2009              [Page 1]
Internet-Draft             RRG Recommendation                 March 2009

Table of Contents

1. Introduction
  1.1. Structure of This Document
2. Terminology and Abbreviations
3. Taxonomies of the Solution Space
  3.1. A Mechanism Taxonomy
    3.1.1. Layer 4 Transport
    3.1.2. Translation
    3.1.3. Map & Encap
  3.2. A Functional Taxonomy
    3.2.1. FIB Size Reduction
    3.2.2. RIB Size Reduction
  3.3. The Herrin Taxonomy
    3.3.1. Strategy A
      3.3.1.1. Variants
      3.3.1.2. Mapping approaches
      3.3.1.3. Failure handling approaches
      3.3.1.4. Compatibility approaches
      3.3.1.5. Core routing methods
      3.3.1.6. Major criticisms
    3.3.2. Strategy B
      3.3.2.1. Locator variants
      3.3.2.2. Identifier variants
      3.3.2.3. Major criticisms
    3.3.3. Strategy C
      3.3.3.1. Variants
      3.3.3.2. Major criticisms
    3.3.4. Strategy D
      3.3.4.1. Variants
      3.3.4.2. Major criticisms
    3.3.5. Strategy E
      3.3.5.1. Variants
      3.3.5.2. Major criticisms
    3.3.6. Strategy F
      3.3.6.1. Major criticisms
    3.3.7. Strategy G
      3.3.7.1. Major criticisms
4. Recommendations
  4.1. No manual renumbering of end hosts
  4.2. Future progress
5. Acknowledgements
6. IANA Considerations
7. Security Considerations
8. References
  8.1. Normative References
  8.2. Informative References
Author's Address

1. Introduction

It is commonly recognized that the Internet routing and addressing architecture is facing challenges in scalability, multi-homing, and inter-domain traffic engineering. The problem being addressed has been documented in [I-D.narten-radir-problem-statement], and the design goals that we have agreed to can be found in [I-D.irtf-rrg-design-goals].

This document reports the Routing Research Group's (RRG's) preliminary results from its efforts towards developing a recommendation for a scalable routing architecture.

This document is a work in progress.

1.1. Structure of This Document

This document describes a number of the different possible approaches that could be taken in a new routing architecture, as well as a summary of the current thinking of the overall group regarding each approach.

2. Terminology and Abbreviations

This section describes the common terminology used in this document. Particular architectures and discussions frequently define additional terms, qualify these terms or add additional semantics.

address  An address is a name that is both an interface locator and an endpoint identifier.

FIB  Forwarding Information Base, also known as the forwarding table. Typically, the forwarding table contains the subset of the information in the RIB that is actually needed at forwarding time.

GUID  Globally Unique IDentifier

ISP  Internet Service Provider

identifier  An identifier is the name of an object; identifiers have no topological sensitivity, and do not have to change, even if the object changes its point(s) of attachment within the network topology. Identifiers may have other properties, such as the scope of their uniqueness (local or global (default)), the probability of their uniqueness (statistical or absolute (default)), and their lifetime (ephemeral or permanent (default)).

locator  A locator is a name that has topological sensitivity at a given layer and must change if the point of attachment at that layer changes. By default, a locator refers to layer 3. It is also possible to have locators at other layers. Locators may have other properties, such as their scope (local or global (default)) and their lifetime (ephemeral or permanent (default)).

multihoming  A site or host is multihomed if it has multiple topological connections to the network and the locators for those connections do not aggregate.

RIB  Routing Information Base, also known as the routing table.

RIR  Regional Internet Registry

RLOC  A Remote LOCator is a locator with global scope.

SID  Session IDentifier

TE  Traffic Engineering is a technique for controlling the path that traffic takes beyond baseline methods, such as shortest path first IGP computations and BGP shortest AS path computation.

3. Taxonomies of the Solution Space

In trying to understand the entirety of the solution space that we are confronted with, we have made multiple attempts to divide the space into comprehensible sectors. The entire solution space is complex, and it seems difficult to capture all of the pertinent dimensions of the space with only a single perspective. Different taxonomies seem to provide insight during different discussions, and we summarize all of them here to capture all of the useful perspectives. Of these, we have found the taxonomy in Section 3.3 to be the most useful so far, and it is where we will continue to focus our efforts.

3.1. A Mechanism Taxonomy

In this taxonomy, solutions are grouped by the primary mechanisms that they use to achieve their goals.

3.1.1. Layer 4 Transport

Transport solutions are characterized by modifications made solely at layer 4 to provide locator and identifier independence. For example, if a transport protocol supports connections across multiple addresses as a means of supporting multi-homed hosts, and can seamlessly and transparently shift across these addresses, then it can provide the multi-homing support that is required.

However, in our discussions, it became clear that even with transport level agility, host-level renumbering of sites would still be necessary to support these types of solutions. The consensus of the group is that such site renumbering is widely unacceptable for operational reasons and thus, these types of solutions are not of interest for further exploration in this group as the primary basis for a scalable routing architecture. The advantages of these techniques are undeniable, and they are likely to complement other architectural approaches.

3.1.2. Translation

Translation solutions are characterized by a translation operation from an identifier to a locator and back to an identifier as the packet traverses the network.
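As a rough illustration of the translation operation just described, the following Python sketch rewrites a packet's destination field from an identifier to a locator at ingress and back again at egress. The Packet type, the map contents, and the function names are all hypothetical, and the sketch assumes the variant with a single address field that is rewritten in place; it is not drawn from any specific proposal.

```python
from dataclasses import dataclass, replace

# Hypothetical identifier-to-locator map. How such a map would be
# populated is not specified here; this is a placeholder.
ID_TO_LOC = {"id:host-b": "loc:isp2.site7"}
LOC_TO_ID = {loc: ident for ident, loc in ID_TO_LOC.items()}


@dataclass(frozen=True)
class Packet:
    dst: str        # single destination field, rewritten in place
    payload: bytes  # carried unchanged; no extra header is added


def translate_at_ingress(pkt: Packet) -> Packet:
    """Swap the destination identifier for its locator."""
    return replace(pkt, dst=ID_TO_LOC[pkt.dst])


def translate_at_egress(pkt: Packet) -> Packet:
    """Restore the original identifier from the locator."""
    return replace(pkt, dst=LOC_TO_ID[pkt.dst])
```

In this sketch the packet has the same shape before and after each step; only the contents of the destination field change.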
Translation approaches do not add encapsulations to packets as they traverse the network; they usually translate the fields in place within the packet. Translation solutions can further be categorized as those with separated fields for locators and identifiers and those that continue to use a single address field. Translation solutions also can be categorized as having the translation done in the host or in a middle box.

3.1.3. Map & Encap

Map & Encap solutions are characterized by a lookup operation from the identifier to a locator and then an encapsulation of the packet payload into a tunnel that directs the packet across the topology.

3.2. A Functional Taxonomy

In solving a problem one must keep clearly separate the goals and the means. Here the goal is to get a control handle on the scalability of the routing architecture. Another important issue to keep in mind is that, for any change to be made in one part of the Internet, it must do no harm to the rest of the system.

3.2.1. FIB Size Reduction

One can achieve FIB size reduction through virtual aggregation as explained in Paul Francis' draft [I-D.francis-intra-va]. It is worth pointing out that this approach has been discussed in slightly different forms, e.g. a talk at NANOG 44, and used in practice as various forms of default routes.

While reducing the FIB size is a laudable goal, alone it is insufficient in that it does not address the RIB scalability issue.

3.2.2. RIB Size Reduction

EDITOR'S NOTE: Lixia to propose text here.

3.3. The Herrin Taxonomy

As part of the mailing list discussion, the group constructed a more detailed taxonomy of possible architectures, described as a series of strategies.

3.3.1. Strategy A

Local routing is based on an address, which functions as a GUID, SID component and local locator, but each packet flows through an encoder which attaches an RLOC before the packet enters the internetwork core. Routing within the core is based on the RLOC. Only ISPs with significant interconnection have their own RLOCs. Fewer than 10,000 such "core ISPs" exist today and the number is growing much more slowly than the routing table overall. Once the packet reaches the network identified by the RLOC, local routing by address takes over for final delivery.

Distribute RLOCs through the core via a typical distance-vector or link-state routing protocol.

3.3.1.1. Variants

A1a  Each core ISP has one RLOC. The RLOC's existence and reachability are flooded to the rest of the core.

A1b  Each core ISP has a small number of RLOCs for TE. The RLOCs' existence and reachability are flooded to the rest of the core.

A1c  Each core ISP has an aggregated set of RLOCs which it may hierarchically assign to customers downstream and/or disaggregate for TE. The aggregated RLOC's existence and reachability are flooded to the rest of the core.

3.3.1.2. Mapping approaches

A2a  Addresses are statically mapped to RLOCs. Map entries are periodically pushed towards a central or distributed registry. The full list is periodically downloaded to the encoders which add RLOCs to the packets.

A2b  Addresses are dynamically mapped to RLOCs. Map entries are pushed towards a central or distributed registry as they change. The registry pushes all incremental changes in near-real time to all encoders which add RLOCs to the packets.

A2c  Addresses are dynamically mapped to RLOCs. Map entries are pushed towards a central or distributed registry as they change. Encoders request and briefly cache individual mappings from the registry as needed.

3.3.1.3. Failure handling approaches

Link failures in the Internet core cause the RLOCs to be rerouted with no change to the address-to-RLOC mapping.

A3a  RLOC encoders detect when particular RLOCs are no longer reachable at all and fall back on secondary RLOCs for a particular address. Encoders rely on active failure messages from some system in the RLOC-specified network to indicate that a host is no longer available via that RLOC, causing them to fall back on secondary RLOCs for that host.

A3b  Link failures which prevent parts of the RLOC's network from reaching a destination host or set of hosts it serves cause an external analysis element to make a dynamic change to the address-RLOC map, depreferencing or removing the affected RLOC. The external analysis element may be under the control of the end-user destination network, the RLOC network or a third party under contract to one of them.

3.3.1.4. Compatibility approaches

A4a  Create a new IP protocol. The new protocol would not be compatible with IPv4 and IPv6.

A4b  Modify the IP protocol. The modified protocol would not be compatible with IPv4 and IPv6 as deployed.

A4c  Standard IPv4 and IPv6 packets are tunnelled while they transit the Internet core. Path-MTU issues are handled by setting an Internet-wide maximum packet size enforced by the encoders and assuring that all core links support that size.

A4d  Standard IPv4 and IPv6 packets are tunnelled while they transit the Internet core. Path-MTU issues are handled by returning packets which breach the MTU while in the core back to the encoder, which must act as a proxy by returning a sensible packet-too-big message to the originating host.

A4e  The IPv6 address space is partitioned into end-user address space and Internet core address space. The address-to-RLOC map is symmetric.
Part of the IPv6 end-user address is swapped for the RLOC when the packet enters the Internet core and then restored when it leaves the Internet core. Use a different A4 variant for IPv4.

A4f  The IPv6 flow label or some other component(s) of the IPv6 header are used to contain the RLOC. The flow label is set before the packet enters the core. Non-local packets are routed based on the flow label. Use a different A4 variant for IPv4.

A4g  Steal bits from other functions in the IPv4 header (e.g. checksum) to make space for an RLOC. Discard those components and set the RLOC when the packet enters the core. Restore the original bits when the packet leaves the core. Use a different A4 variant for IPv6.

3.3.1.5. Core routing methods

A5a  Distribute RLOCs through the Internet core via BGP.

A5b  Distribute RLOCs through the Internet core via a new distance-vector protocol.

A5c  Distribute RLOCs through the Internet core via a link-state protocol.

3.3.1.6. Major criticisms

There don't appear to be any genuinely clean ways of implementing strategy A.

Handling path-MTU is usually a problem since the packets in the core differ from what the origin host would recognize. Extra bandwidth is consumed by the ingress tunnel router figuring out whether the egress tunnel router is still available and functioning. Border filtering of source addresses becomes problematic.

Deployment may require heavyweight "for the public good" relays in the non-upgraded part of the Internet to facilitate migration. During the transition period, it appears difficult to remove legacy prefixes from the global routing table. The best that can be done is to advertise aggregates of legacy prefixes from the relays. This may have an impact on stretch.

3.3.2. Strategy B

Assign hierarchically aggregatable locators to every host.
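To make "hierarchically aggregatable" concrete, the sketch below uses Python's standard ipaddress module to delegate prefixes downward from a provider aggregate and then checks that only the aggregate would need to be visible outside the provider. The prefixes come from the IPv6 documentation range and the prefix lengths (/32, /48, /64) are illustrative assumptions, not values taken from this document.

```python
import ipaddress

# Hypothetical provider aggregate from the IPv6 documentation range.
isp_aggregate = ipaddress.ip_network("2001:db8::/32")

# Each level receives a less specific prefix from upstream and hands a
# more specific prefix downstream (prefix lengths are assumptions).
site_prefix = next(isp_aggregate.subnets(new_prefix=48))
link_prefix = next(site_prefix.subnets(new_prefix=64))
host_locator = link_prefix[1]

# Outside the provider, the more specific prefixes collapse into the
# single aggregate, so they add nothing to the core routing table.
core_table = list(
    ipaddress.collapse_addresses([isp_aggregate, site_prefix, link_prefix])
)
```

Here core_table ends up holding only the provider aggregate, while host_locator still falls inside it; a host with several upstream paths would hold one such locator per path.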
Assign multiple locators to each host such that in the network topology hosts appear as stubs in multiple locations instead of forming distant connections in the graph.

Assign one aggregated set of locators to each core ISP where a core ISP is one which has at least half a dozen major transit or peering links. Flood the aggregated locator's existence and reachability to the rest of the core.

Having reduced the network topology to something relatively close to a hierarchy, perform plain old hierarchical aggregation on the locators. Add and remove locators to each host dynamically during operation as needed to reflect changes in the nearby network hierarchies.

Attach source and destination locators when the packet leaves the host. Route first by source then by destination locator: move up the source network hierarchy until you can move laterally toward the destination locator in a permissioned manner.

Identifier to locator maps are pushed from the host towards a distributed registry as they change. Hosts request and temporarily cache individual mappings from the registry as needed.

3.3.2.1. Locator variants

B1a  A hierarchically aggregated locator is dynamically assigned to each host from each upstream path. Each router receives a less specific prefix from upstream and assigns a more specific prefix downstream. Link state changes in the path to the core are satisfied by renumbering instead of rerouting: the host abandons the locator hierarchically associated with the old path. If a new path is available, the host acquires a locator hierarchically associated with the new path.

B1b  A locator is an administratively-assigned loose source route instead of a single address. The first address in the loose source route is a universally-known waypoint router. The last address is the final destination. Link state changes in the path to the core are satisfied by rerouting in the appropriate routing domain when possible.
If rerouting in the affected domain is not possible, the host abandons the impacted locator.

B1c  Semi-hierarchical locators are administratively or automatically assigned. Local reconnection during link state changes is accomplished with rerouting instead of renumbering.

3.3.2.2. Identifier variants

B2a  Each host has a single identifier to which the locators are attached. This identifier is used by the layer-4/5 and higher protocols to compose the SID.

B2b  Each service provided by a host has a globally unique, hierarchical identifier to which the locators are attached. Clients initiating communication with that service negotiate a SID which is unique only within the scope of that service.

3.3.2.3. Major criticisms

1.  This strategy is probably not compatible with UDP or TCP though B1a/c could be compatible with IPv6's layer 3. The replacement layer-4/5 protocols should also be coaxable to run on top of IPv4's layer 3 in the not-yet-upgraded part of the network.

2.  How do firewalls work if the locators are constantly in flux in B1a?

3.  How is theft of service avoided in B1b?

3.3.3. Strategy C

Suppress distant routes by aggregating them into sets expected to be available in a given direction. Because locator reachability info is not flooded, the routing tables each router must deal with are relatively small.

3.3.3.1. Variants

C1  Aggregate locators based on geography. All nodes within some geographic boundary are assigned the same locator. Routers move packets to any adjacent router deemed to be "closer" to the locator in question.

3.3.3.2. Major criticisms

No one has been able to construct a proposal under strategy C without introducing constraints that are fundamentally incompatible with the Internet's economic model.
For example, geographic aggregation has been shown to have uncorrectable theft-of-service anomalies in networks as small as 8 autonomous systems and two geographic areas.

Fundamentally, geographic aggregation requires that there be a per-region interconnect that functions as the deaggregation point for the region's traffic. Funding such an interconnect and compelling the affected ISPs to participate in the interconnect requires external third party coercive controls.

3.3.4. Strategy D

Use plain old BGP for the RIB. Algorithmically compress the FIB in each router.

3.3.4.1. Variants

D1a  Aggregate any adjacent routes that have the same next hop.

D1b  Insert a /0 route into the FIB which goes to the most popular next hop for all the routes in the RIB. Step to the /1 level. For each /1, if most of the routes in the RIB within that /1 go to a different next hop than the longest route above (the /0 route), add that /1 route to the FIB. Step to the /2 level. Repeat until all routes in the RIB go to the correct next hop in the FIB. Unrouted space is treated as "don't care": it will route wherever the algorithm happens to drop it and will rely on the TTL to take packets off the network.

3.3.4.2. Major criticisms

1.  The RIB can grow to up to an order of magnitude larger than the FIB before it hits the wall too. One order of magnitude doesn't gain us multihoming for small office/home office sites.

2.  FIBs towards the edge should aggregate well with this strategy but there's no evidence to support a conclusion that they'd aggregate well deep in the core.

3.3.5. Strategy E

Make no routing architecture changes. Instead, create a billing system through which the ISPs running core routers are paid by the ISPs announcing prefixes. Let economics suppress growth to a survivable level.

3.3.5.1. Variants

E1a  Everybody pays the RIRs. The RIRs pay the router operators.
E1b  Private negotiation between parties.

E1c  Assisted private negotiation where router operators can offer standardized contracts to carry prefixes and prefix announcers can accept groups of identical contracts via an automated third-party payment system moving funds between the two easily.

3.3.5.2. Major criticisms

1.  If it could be done without creating a massive boondoggle, why hasn't it been done already? This has been discussed previously and there are no obvious mechanisms to put such a system in place without having a central authority for the Internet.

2.  This means giving up on a solution that genuinely enables users and accepting one that merely keeps the Internet viable.

3.3.6. Strategy F

Do nothing. (See [RFC1887] Section 4.4.1)

3.3.6.1. Major criticisms

It costs "everybody else" a grand total of at least $6000 per year for each prefix you announce [BGPCost]. When we give away that $6000 of value for free, it inevitably creates a "tragedy of the commons" problem.

Given that the research group is chartered to 'do something', this alternative does not fit within the charter.

3.3.7. Strategy G

Change the topology so that all hosts attach to only one ISP using IPv6 and the ISP's single set of provider assigned addresses. (Actual result of [RFC1887] Section 4.4.3)

3.3.7.1. Major criticisms

This strategy wasn't accepted by the operations community because the IPv6 architecture makes renumbering every bit as hard as in IPv4 and the multihoming described in [RFC1887] Section 4.4.3 does not appear to actually work.

4. Recommendations

4.1. No manual renumbering of end hosts

There is clear consensus in the group that renumbering of sites must not require manual intervention on a per-host basis. This does not scale adequately in terms of management cost.
This effectively eliminates solutions that require hosts to have only a single locator and renumber on topological changes, or to maintain multiple locators manually.

This implies that transport solutions (Section 3.1.1) are unacceptable unless coupled with another mechanism that would automate the distribution and management of host renumbering, which appears to be a major undertaking all on its own. Further, variants of Strategy B (Section 3.3.2) that require manual locator assignment are similarly unacceptable, as are other solutions that require manual locator assignment, such as Strategy D (Section 3.3.4), Strategy E (Section 3.3.5), Strategy F (Section 3.3.6), and Strategy G (Section 3.3.7).

Some further work on improving host renumbering can be found in [I-D.carpenter-renum-needs-work].

4.2. Future progress

The RRG should continue to prune the solution space presented here, attempting to find the overall maximally acceptable solution within the bounds and constraints that have been presented.

Whenever possible the research group will continue to discuss architectural concepts and make architectural recommendations rather than becoming embroiled in detailed engineering implementation discussions.

The RRG should present a final recommendation by March 2010.

5. Acknowledgements

This document represents a small portion of the overall work product of the Routing Research Group, who have developed all of these architectural approaches and many specific proposals within this solution space.

In particular, Bill Herrin has been instrumental in constructing his taxonomy (Section 3.3), with the input of the entire community. This has been pivotal in helping to focus the discussions of the group. We would also like to thank Joel Halpern for his insights and comments.

6. IANA Considerations

This memo includes no requests to IANA.

7. Security Considerations

All solutions are required to provide security that is at least as strong as the existing Internet routing and addressing architecture.

8. References

8.1. Normative References

[I-D.irtf-rrg-design-goals]  Li, T., "Design Goals for Scalable Internet Routing", draft-irtf-rrg-design-goals-01 (work in progress), July 2007.

[I-D.narten-radir-problem-statement]  Narten, T., "Routing and Addressing Problem Statement", draft-narten-radir-problem-statement-03 (work in progress), March 2009.

[RFC1887]  Rekhter, Y. and T. Li, "An Architecture for IPv6 Unicast Address Allocation", RFC 1887, December 1995.

8.2. Informative References

[BGPCost]  Herrin, W., "What does a BGP Route cost?", .

[I-D.carpenter-renum-needs-work]  Carpenter, B., Atkinson, R., and H. Flinck, "Renumbering still needs work", draft-carpenter-renum-needs-work-02 (work in progress), February 2009.

[I-D.francis-intra-va]  Francis, P., Xu, X., and H. Ballani, "FIB Suppression with Virtual Aggregation", draft-francis-intra-va-00 (work in progress), February 2009.

Author's Address

Tony Li (editor)
Ericsson
300 Holger Way
San Jose, CA 95134
USA

Phone: +1 408 750 5160
Email: tony.li@tony.li