Ondrej Zajicek wrote:
On Thu, Jul 07, 2011 at 03:12:15PM +0400, Alexander V. Chernikov wrote:
It depends on the FIB/tricks implementation :) Actually, I'm trying to discuss exactly those implementation details.
At the moment we have at least the following non-standard things in the world of routing:
 * VPNv4 address family (RFC 4364): 8-byte route distinguisher + 4-byte IPv4 address
 * VPNv6 address family (RFC 4659): 8-byte route distinguisher + 16-byte IPv6 address
 * MPLS address family (RFC 3032): the size varies and may be 16 bytes or more, since label stack depth is implementation-specific (for example, sizeof(sockaddr_mpls) is ~80 bytes in my implementation)
So you are developing MPLS support for BIRD [*]? This is interesting. We could definitely merge it when it is ready. Before talking about implementation details, I would like to know your overall idea of how MPLS could be integrated into BIRD and how it would interact with other parts of BIRD. MPLS seems to differ from traditional routing in several ways, so this is not straightforward. For example, how will MPLS 'routes' be represented in the BIRD core: in struct rte, like common routes? In tables separate from IPv4/v6 routes? What are the interactions (in BIRD) between MPLS and IPv4/v6 routing? What new concepts have to be introduced?
[*] Rhetorical question, I found http://freebsd.mpls.in/
To show the "overall" view, we first have to describe what we will add and what will be required from BIRD.

First of all, MPLS operates on labels (20-bit numbers). Labels can be assigned to different entities (an IPv4/IPv6 prefix, for example). A label is associated with an action to perform on the packet and has _local_ significance. The following actions are defined:
 * POP (pop one label from the label stack)
 * PUSH (add one or more labels to an existing packet)
 * SWAP (replace the top label)

A small picture illustrating this: http://upload.wikimedia.org/wikipedia/commons/e/eb/MPLS-swapping-071218.JPG

All packets entering an MPLS network are prepended with one or more MPLS labels (1). The router doing this is called the Ingress LSR (Label Switch Router) and is a PE (Provider Edge) router. After that, the packet travels through the MPLS network via P (Provider) routers, switched by its label, which gets rewritten on every router (2). Finally, the packet exits the MPLS network at the Egress LSR, which is a PE router, too (3).

Appropriate signaling is needed to make all this happen. Labels can be exchanged in different ways, for example:
 * LDP (RFC 5036), an easy p2mp protocol like OSPF
 * RSVP-TE (RFC 3209), an extension to RSVP focused on provider features (QoS, fast rerouting, tunnels for explicit traffic flow, ...)
 * MP-BGP (RFC 4364), a BGP extension carrying labelled prefixes in the VPNv4/VPNv6 address families (route targets travel in extended communities)

I've got a more or less (actually less) working LDP implementation at the moment.

MPLS labels can be (and will be!) stacked. This is used to provide services on an MPLS network: the top label is mostly used to reach the destination PE router in the MPLS cloud, and the inner label(s) identify the service. I will describe an L3 VPN setup from BIRD's point of view. A very good and easy VPN explanation (using RSVP, but this doesn't matter): http://www.ist-nobel.org/Nobel2/imatges/L3VPN_Training_course.pdf

* ABSTRACT VIEW

Imagine a provider network with P and PE routers.
The IGP is OSPF, and LDP is enabled on all appropriate interfaces. OSPF runs in the GRT (Global Routing Table), and LDP connects to this table, too. LDP establishes relationships with all neighbouring routers (exactly like OSPF does) and begins exchanging LMAPs (label mappings), which map a FEC (Forwarding Equivalence Class; an IPv4/IPv6 prefix, for simplicity) to some number (a label). Every router generates LMAPs for every prefix in its GRT. After an LMAP for some prefix is received and verified, we need to notify the kernel that the route to that prefix is MPLS-enabled (case 1). Additionally, we assign a local label to that prefix and install this MPLS label, with the IGP next hop for the prefix, into the kernel (an AF_MPLS "route") (case 2).

There are some special labels with predefined meanings. Label 0, for example, is called "IPv4 Explicit NULL". A router receiving a packet with this label pops it and, if it was the last label on the stack, treats the payload as an IPv4 packet. Usual IP routing is then used to forward the packet.

Now imagine a customer asking for an L3 VPN for its 3 sites, connected to our ISP routers PE1, PE2 and PE3. We configure a separate routing table on those 3 PE routers. Some globally unique RD (Route Distinguisher) has to be assigned (by the user) to this VPN instance. We then have to convert routes from the new routing table into VPNv4 routes carrying the RD, the label assigned to each route and some custom attributes (stored in extended communities), and vice versa. We also have to notify the kernel (update the IP route and add an AF_MPLS "route") for every prefix that needs it.
As a result, we will get the following picture on every PE router (prefixes/labels are random):

route table "new"
  Prefix            RD        Router  LABEL
  192.168.1.0/24    31337:1   PE1     --
  192.168.2.0/24    31337:1   PE2     PUSH {31, 47}
  192.168.3.0/24    31337:1   PE3     PUSH {44, 35}
  192.168.13.0/24   31337:1   PE3     PUSH {44, 36}

* BIRD USER VIEW

table new;

protocol ospf ospf0 {
    # some OSPF configuration in GRT
}

protocol ldp {
    export all;
    label range 20 4000;
    interface "em*" {};
    interface "vlan*" {};
}

protocol bgp bgp0 {
    description "Link to RR";
    mpls vpn;
    # Some usual configuration
}

protocol l3vpn {
    table new;
    rd 31337:1;
    # Some import/export filters, by default - import
    # all routes with RT (route target) equal to RD
    # and export all routes with RT equal to RD
}

protocol direct {
    table new;
    interface "vlan136";
}

* UNDER THE HOOD

*** KERNEL INTERACTION ***

Case 1: We can install the updated route IFF
 * an LDP label exists, and
 * the IGP next hop is one of the next hops advertised by LDP neighbours.

The LDP LMAP can arrive before or after the IGP announcement, so there are 2 different cases:

1) The prefix already exists from some IGP and the LMAP arrives. Here we can find the appropriate kernel protocol instance and feed it exactly the same route with a new attribute (EAP_ADDITIONAL) containing the MPLS sockaddr. We could, of course, call the rt_update machinery, but:
 * at the moment, a route is not considered better if only some extended attributes change
 * there is no need to notify all other protocols, since they should not be interested in such an update
Updating directly seems much more appropriate.

2) The LMAP already exists and rt_notify is called. At the moment it is not possible for a protocol to alter an announcement: the rt_notify calling order cannot be predicted, and import_control has only local significance. Some pre-announce hook should be added, permitting all interested protocols to at least add their attributes (we will insert the EAP_ADDITIONAL attribute here).

Case 2: This is more tricky.
We can keep the LIB (Label Information Base) in an internal hash table; the main problem is kernel interaction. We can handle it either:
 * by adding some private hook to the kernel protocol (there is no need to notify other protocols, even in the case of LDP + RSVP-TE, since a separate label space should be used). However, dumping the AF_MPLS table (for the purpose of cleaning it) requires another hook; or
 * by upgrading FIB / rtable.

If (from the user's point of view) config tables are not AF-bound (e.g. IPv4+IPv6), we will have to enhance the FIB API. My vision is the following:
 * make fib AF-bound, specifying the AF and sizeof(object) at fib_init (or fib2_init)
 * pass pointers to all fib_* related functions instead of addresses
 * compare with memcmp() when searching (and use an AF-dependent hash, based on the value passed at _init)
 * pass the AF in the appropriate protocol hooks
 * change struct network (move rte *routes up) to permit adding a dynamically sized address after struct fib_node
 * each rtable contains fib pointers for the supported AFs

Using this approach, we can send a route update simply by announcing the label in the GRT table with AF_MPLS.

*** PROTOCOL INTERACTION ***

We have to change the paradigm "All protocols are equal" to "All protocols are equal, but some protocols are more equal than others". We need some sort of API that allows calling a protocol-specific hook for a given protocol type in a given rtable. This is needed for:
 * LDP -> kernel protocol invocation
 * L3VPN label requests to LDP (see below)

Alternatively, a post-configure protocol hook should be added (the current postconfigure is actually a post-successful-parse-and-config-structure-filled hook) to permit updating protocol pointers after a config change.

Assuming multi-protocol rtables, each L3VPN instance subscribes to the VPNv4/VPNv6 GRT fib instances and listens for routes announced by BGP sessions with the "mpls vpn" keyword.
(Some filter work is needed to permit filtering by RT.) After a route passes filtering, the L3VPN instance retrieves the actual IPv4/IPv6 prefix and remote label, does a recursive route lookup, requests a local label from LDP, and installs/updates kernel routes, very much like the pipe protocol. When a new route appears in the L3VPN base table ("new"), it passes the export filter and is packed into the VPNv4/VPNv6 family, another local label is requested from LDP, and finally the route is announced into the appropriate VPN family in the GRT.

* PORTABILITY

Actually, not even the [known] MPLS label, sockaddr and other definitions are portable across platforms:
 * Linux (linux-mpls project): http://mpls-linux.git.sourceforge.net/git/gitweb.cgi?p=mpls-linux/mpls-linux...
 * OpenBSD: http://fxr.watson.org/fxr/source/netmpls/mpls.h?v=OPENBSD
 * NetBSD: http://fxr.watson.org/fxr/source/netmpls/mpls.h?v=NETBSD
 * FreeBSD (from my implementation): http://freebsd.mpls.in/svn/filedetails.php?repname=FREEBSD+MPLS&path=%2Frele...

rtsock is more or less the same across the BSDs; netlink is different, as usual. Some additional (system-dependent) run-time kernel configuration is required, at least on Linux and FreeBSD. I'm implementing the FreeBSD part only (however, I'll try to keep things as portable as I can).

The least common denominator we can get:
 * a more flexible API permitting RR/MPLS
 * BGP VPN extensions
 * LDP implementation
 * L3VPN implementation
 * protocol interaction