Merging bird and bird6
Hello list!

Are there any plans to move from separate daemons to a single one?

From the user's point of view: we will live with both v4 and v6 for the next 10+ years, and having all dynamic routing in a single place with a single CLI is much more convenient.

From the developer's point of view: at the moment all code is based on the ip_addr address type, which is defined to be v4 OR v6 at compile time. The FIB API assumes the same: IPv4 OR IPv6.

Speaking about advanced (mostly MPLS-related) features (VPNv4, LDP, MP-BGP in general): they all need FIBs which are not bound to a specific address family. At the moment a developer has to implement such tables himself. Using different processes for different AFs makes some MP-BGP aspects very hard to implement.

Protocol interaction is also bound to FIBs. Even calling protocol hooks is bound to ip_addr and prefix. For example, I'm stuck on sending MPLS labels into the kernel at the moment. Of course, I can add a custom hook to the kernel protocol, but then I will have to maintain another custom patch, which should be avoided.

Maybe it is time to start a discussion? I can supply patches if there is any chance of review/commit.
On Thu, 2011-07-07 at 01:09 +0400, Alexander V. Chernikov wrote:
Are there any plans to move from different daemons to single one?
Better would be to split them out onto a message bus with a common controlling CLI interface. That way they can talk to each other while maintaining process separation. It is very useful indeed to have multiple calculations going on at the same time without the risk of one of them blowing up and destroying the whole lot.
Neil Wilson wrote:
> On Thu, 2011-07-07 at 01:09 +0400, Alexander V. Chernikov wrote:
>
>> Are there any plans to move from different daemons to single one?
>
> Better would be to split them out onto a message bus with a common
> controlling CLI interface.
>
> That way they can talk to each other while maintaining process
> separation. It is very useful indeed to have multiple calculations going
> on at the same time without the risk of one of them blowing up and
> destroying the whole lot.

Actually, if we start talking about splitting into multiple daemons because of multiple calculations and protocol bugs crashing the entire process, it is better to discuss splitting not by address family but by protocol or protocol instance.

Every architecture has its own pros and cons. Direct function calls to any interested protocol are faster than doing syscalls for every unix socket read/write, and they consume less CPU.

Large route updates/reads are addressed in BIRD by splitting them into smaller parts, permitting other protocols to do their I/O.

Anyway, I'm talking about improving parts of the _current_ architecture, not about changing the architecture (the protocol exchange model) at all.
On Thu, Jul 07, 2011 at 01:09:48AM +0400, Alexander V. Chernikov wrote:
Hello list!
Are there any plans to move from different daemons to single one?
From user point of view: We will live with both v4+v6 for the next 10+ years, having all dynamic routing in single place with single CLI is much more convenient.
From developer point of view: At the moment all code is based on ip_addr address type which is defined to be v4 OR v6 at compile time. FIBs API assume the same: IPv4 OR IPv6.
My idea [*] about the future of IPv4/IPv6 split is to allow IPv4 addresses be embedded in IPv6 ip_addr (probably using IPv4-mapped address prefix). That would allow integration with minimal changes (probably just some UI changes, some FIB integration and some tricks with dual compilation of one-AF protocols). Bird4 would still be IPv4-only (with ip_addr of size 4 B) mainly for users with big route server deployments, but bird6 would support both IPv4 and IPv6. Is this model consistent with your requirements / way of development? [*] originally suggested by Martin Mares. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
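A minimal sketch of the embedding idea above, assuming the standard IPv4-mapped prefix ::ffff:0:0/96 is the one chosen. The type `addr128` and the `ip4_*` helpers are hypothetical names used only for illustration; they are not BIRD's ip_addr API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte address, standing in for an IPv6-sized ip_addr. */
typedef struct { uint8_t b[16]; } addr128;

/* Embed a host-order IPv4 address as ::ffff:a.b.c.d (IPv4-mapped form). */
addr128 ip4_embed(uint32_t v4)
{
  addr128 a;
  memset(&a, 0, sizeof(a));
  a.b[10] = 0xff;
  a.b[11] = 0xff;
  a.b[12] = (uint8_t)(v4 >> 24);
  a.b[13] = (uint8_t)(v4 >> 16);
  a.b[14] = (uint8_t)(v4 >> 8);
  a.b[15] = (uint8_t)v4;
  return a;
}

/* Return 1 if the address lies inside ::ffff:0:0/96, 0 otherwise. */
int ip4_is_embedded(const addr128 *a)
{
  static const uint8_t prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };
  return memcmp(a->b, prefix, sizeof(prefix)) == 0;
}

/* Recover the IPv4 address; the caller must pass an embedded address. */
uint32_t ip4_extract(const addr128 *a)
{
  assert(ip4_is_embedded(a));
  return ((uint32_t)a->b[12] << 24) | ((uint32_t)a->b[13] << 16) |
         ((uint32_t)a->b[14] << 8)  |  (uint32_t)a->b[15];
}

/* An IPv4 prefix length shifts by the 96 fixed bits of the mapped prefix. */
int ip4_embed_pxlen(int pxlen4)
{
  assert(pxlen4 >= 0 && pxlen4 <= 32);
  return pxlen4 + 96;
}
```

With this layout 192.168.1.0/24 would be stored as ::ffff:192.168.1.0/120, so a single IPv6-sized FIB could hold both families; the UI would then only need to print embedded addresses in dotted form.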
Ondrej Zajicek wrote:
On Thu, Jul 07, 2011 at 01:09:48AM +0400, Alexander V. Chernikov wrote:
Hello list!
Are there any plans to move from different daemons to single one?
From user point of view: We will live with both v4+v6 for the next 10+ years, having all dynamic routing in single place with single CLI is much more convenient.
From developer point of view: At the moment all code is based on ip_addr address type which is defined to be v4 OR v6 at compile time. FIBs API assume the same: IPv4 OR IPv6.
My idea [*] about the future of IPv4/IPv6 split is to allow IPv4 addresses be embedded in IPv6 ip_addr (probably using IPv4-mapped address prefix). That would allow integration with minimal changes (probably just some UI changes, some FIB integration and some tricks with dual compilation of one-AF protocols).
Bird4 would still be IPv4-only (with ip_addr of size 4 B) mainly for users with big route server deployments, but bird6 would support both IPv4 and IPv6.
Is this model consistent with your requirements / way of development?

It depends on FIB/tricks implementation :) Actually, I'm trying to discuss those implementation details.
At the moment we have at least the following non-standard things in the world of routing:

* VPNv4 address family (RFC 4364) (8-byte route distinguisher, 4-byte IPv4 address)
* VPNv6 address family (RFC 4659) (8-byte route distinguisher, 16-byte IPv6 address)
* MPLS address family (RFC 3032) (size varies; it may be 16 bytes or more, since label stack depth is implementation-specific. For example, sizeof(sockaddr_mpls) is ~80 bytes in my implementation)

The first two are purely BGP-internal tables (RR or MPLS VPN PE client) and can be handled within the BGP protocol only.

MPLS is more tricky, since it operates on numbers (labels) instead of IP addresses. Additionally, a label is only a key to other data (the action and the label stack to be imposed). We also need to install it into the kernel somehow. For the kernel it is just a usual sockaddr structure, but passing it to the krt_* functions is not so easy.

I'm thinking about a somewhat more general approach for FIB tables. Everywhere except the 'sh route' CLI command we need exact match only. If, for example, we:

* introduce a fib2_* (or mpfib_*) API specifying the address family and sizeof() for the "address"
* use pointers instead of passing arguments by value in (at least) the fib2_* API and all protocol hooks
* do the comparison by memcmp() for searching (and use an AF-dependent hash based on the value passed in _init)

life will be easier for non-IPv4/IPv6 AFs, and the first step towards the IPv4+IPv6 merge will be done.
[*] originally suggested by Martin Mares.
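The three points of the FIB proposal above (address family and key size fixed at init, keys passed by pointer, exact match on raw bytes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `mpfib_*` names follow the prefix floated in the mail, but the structures, the fixed bucket count and the FNV-1a hash are hypothetical and are not taken from BIRD's fib code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MPFIB_BUCKETS 64   /* assumption: fixed size, no rehashing */

struct mpfib_node {
  struct mpfib_node *next;
  void *data;              /* per-entry payload owned by the caller */
  unsigned char key[];     /* addr_len bytes, copied at insert time */
};

struct mpfib {
  int af;                  /* address family this table is bound to */
  size_t addr_len;         /* sizeof() of the "address" for this AF */
  struct mpfib_node *bucket[MPFIB_BUCKETS];
};

void mpfib_init(struct mpfib *f, int af, size_t addr_len)
{
  assert(addr_len > 0);
  memset(f, 0, sizeof(*f));
  f->af = af;
  f->addr_len = addr_len;
}

/* FNV-1a over the raw key bytes; an AF-specific hash could be chosen
 * in mpfib_init() instead. */
static uint32_t mpfib_hash(const struct mpfib *f, const void *key)
{
  const unsigned char *p = key;
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < f->addr_len; i++)
    h = (h ^ p[i]) * 16777619u;
  return h % MPFIB_BUCKETS;
}

/* Exact-match lookup. Keys are compared bytewise, so any padding inside
 * a key structure must be zeroed by the caller. */
struct mpfib_node *mpfib_find(const struct mpfib *f, const void *key)
{
  for (struct mpfib_node *n = f->bucket[mpfib_hash(f, key)]; n; n = n->next)
    if (memcmp(n->key, key, f->addr_len) == 0)
      return n;
  return NULL;
}

/* Find the node for a key, creating it if missing. NULL on allocation failure. */
struct mpfib_node *mpfib_get(struct mpfib *f, const void *key)
{
  struct mpfib_node *n = mpfib_find(f, key);
  if (n)
    return n;
  n = malloc(sizeof(*n) + f->addr_len);
  if (!n)
    return NULL;
  uint32_t h = mpfib_hash(f, key);
  memcpy(n->key, key, f->addr_len);
  n->data = NULL;
  n->next = f->bucket[h];
  f->bucket[h] = n;
  return n;
}
```

A VPNv4 table would then be initialised with a 12-byte key (8-byte route distinguisher followed by the 4-byte IPv4 address), and an MPLS table with whatever key size the platform's label representation needs. Longest-prefix matching for 'sh route' is deliberately left out of the sketch.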
On Thu, Jul 07, 2011 at 03:12:15PM +0400, Alexander V. Chernikov wrote:
It depends on FIB/tricks implementation :) Actually, I'm trying to discuss those implementation details.
At the moment we have at least the following non-standard things in the world of routing: * VPNv4 address family (RFC 4364) (8-bytes route distinguisher, 4 byte IPv4 address) * VPNv6 address family (RFC 4659) (8-byte route distinguisher, 16 byte IPv6 address) * MPLS address family (RFC 3032) (size varies, it may be 16 bytes or more since label stack depth is implementation-specific. for example, sizeof(sockaddr_mpls) is ~80 bytes for my implementation)
So you are doing some MPLS development for BIRD [*]? This is interesting. Definitely we could merge it when it is ready.

Before talking about implementation details, I would like to know your overall idea of how MPLS could be integrated into BIRD and how it will interact with other BIRD parts. It seems that MPLS is a bit different from traditional routing in several ways, so it is not straightforward (for example, how will MPLS 'routes' be represented in BIRD core: in struct rte like common routes? In separate tables from IPv4/v6 routes?). What are the interactions (in BIRD) between MPLS and IPv4/v6 routing? What new concepts have to be introduced?

[*] Rhetorical question, I found http://freebsd.mpls.in/

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Ondrej Zajicek wrote:
On Thu, Jul 07, 2011 at 03:12:15PM +0400, Alexander V. Chernikov wrote:
It depends on FIB/tricks implementation :) Actually, I'm trying to discuss those implementation details.
At the moment we have at least the following non-standard things in the world of routing: * VPNv4 address family (RFC 4364) (8-bytes route distinguisher, 4 byte IPv4 address) * VPNv6 address family (RFC 4659) (8-byte route distinguisher, 16 byte IPv6 address) * MPLS address family (RFC 3032) (size varies, it may be 16 bytes or more since label stack depth is implementation-specific. for example, sizeof(sockaddr_mpls) is ~80 bytes for my implementation)
So you are doing some MPLS development for BIRD [*]? This is interesting. Definitely we could merge it when it is ready.

Before talking about implementation details, I would like to know your overall idea of how MPLS could be integrated into BIRD and how it will interact with other BIRD parts. It seems that MPLS is a bit different from traditional routing in several ways, so it is not straightforward (for example, how will MPLS 'routes' be represented in BIRD core: in struct rte like common routes? In separate tables from IPv4/v6 routes?). What are the interactions (in BIRD) between MPLS and IPv4/v6 routing? What new concepts have to be introduced?
[*] Rhetorical question, i found http://freebsd.mpls.in/
To show the "overall" view we first have to describe what we will add and what will be required from BIRD.

First of all, MPLS operates on labels (20-bit numbers). Labels can be assigned to different entities (an IPv4/IPv6 prefix, for example). A label is associated with the action to perform on the packet and has _local_ significance. The following actions are defined:

* POP (pop one label from the label stack)
* PUSH (add one or more labels to the existing packet)
* SWAP (replace the top label)

A small picture illustrating this: http://upload.wikimedia.org/wikipedia/commons/e/eb/MPLS-swapping-071218.JPG

All packets entering an MPLS network are prepended with (one or more) MPLS labels (1). The router doing this is called the Ingress LSR (Label Switch Router) and is a PE (Provider Edge) router. After that, the packet travels through the MPLS network via P (Provider) routers, routed by its label (which gets rewritten on every router) (2). In the end, the packet exits the MPLS network on the Egress LSR, which is a PE router, too (3).

Appropriate signaling is needed to make all this happen. Label exchange can be done in different ways, for example:

* LDP (RFC 5036), an easy p2mp protocol like OSPF
* RSVP-TE (RFC 3209), an extension to RSVP focused on provider features (QoS, fast rerouting, tunnels for explicit traffic flow, ..)
* MP-BGP (RFC 4364), a BGP extension carrying labels and prefixes via extended communities in the VPNv4/VPNv6 address family

I've got a more or less (actually less) working LDP implementation at the moment.

MPLS labels can be (and will be!) stacked together. This is used to provide services on an MPLS network: the top label is mostly used to reach the destination PE router in the MPLS cloud, and the inner label(s) below it are used as service identification.

I will describe an L3 VPN setup from BIRD's point of view. A very good and easy VPN explanation (using RSVP, but this doesn't matter): http://www.ist-nobel.org/Nobel2/imatges/L3VPN_Training_course.pdf

* ABSTRACT VIEW

Imagine a Provider network with P and PE routers.
The IGP is OSPF, and LDP is enabled on all appropriate interfaces. OSPF is running in the GRT (Global Routing Table); LDP connects to this table, too. LDP establishes relationships with all routers (exactly like OSPF does) and begins exchanging LMAPs (label mappings), which map a FEC (forwarding equivalence class; an IPv4/IPv6 prefix for simplicity) to some number (label). Every router generates LMAPs for every prefix in its GRT. After an LMAP for some prefix is received and verified, we need to notify the kernel that the route to the given prefix is MPLS-enabled (case 1). Additionally, we assign a local label to that prefix and install the MPLS label with the IGP nexthop for this prefix into the kernel (an AF_MPLS "route") (case 2).

There are some special labels with pre-defined meaning. Label 0, for example, is called "IPv4NULL". A router receiving a packet with this label pops it and, if it is the last label on the stack, assumes the packet data to be an IPv4 packet. Usual IP routing is then used to send the packet.

Now imagine we have a customer asking for an L3 VPN for its 3 sites connected to our ISP routers PE1, PE2 and PE3. We configure a separate routing table on those 3 PE routers. Some globally unique RD (Route Distinguisher) has to be assigned to this VPN instance (assigned by the user). We then have to convert routes received from the new routing table to VPNv4 routes with some custom attributes (stored in ext communities) containing the RD and the label assigned to this route, and vice versa. We also have to notify the kernel (update the IP route and add an AF_MPLS "route") for every prefix we need to.
As a result, we will get the following picture (prefixes/labels are random) on every PE router:

route table "new"

Prefix            RD       Router  LABEL
192.168.1.0/24    31337:1  PE1     --
192.168.2.0/24    31337:1  PE2     PUSH {31, 47}
192.168.3.0/24    31337:1  PE3     PUSH {44, 35}
192.168.13.0/24   31337:1  PE3     PUSH {44, 36}

* BIRD USER VIEW

table new;

protocol ospf ospf0 {
    # some OSPF configuration in GRT
}

protocol ldp {
    export all;
    label range 20 4000;
    interface "em*" {};
    interface "vlan*" {};
}

protocol bgp bgp0 {
    description "Link to RR";
    mpls vpn;
    # Some usual configuration
}

protocol l3vpn {
    table new;
    rd 31337:1;
    # Some import/export filters, by default - import
    # all routes with RT (route target) equal to RD
    # and export all routes with RT equal to RD
}

protocol direct {
    table new;
    interfaces "vlan136";
}

* UNDER THE HOOD

*** KERNEL INTERACTION ***

Case 1:
A route update can happen differently: we can install the updated route IFF
* an LDP label exists
* the IGP nexthop is one of the advertised LDP neighbour nexthops.

An LDP LMAP can arrive before or after the IGP announce, so there are 2 different cases:

1) The prefix already exists from some IGP and the LMAP arrives.
Here we can find the appropriate kernel instance and feed exactly the same route with a new attribute (EAP_ADDITIONAL) containing the MPLS sockaddr. We can, of course, call the rt_update stuff, but:
* at the moment a route is not considered better if some extended attributes are changed
* there is no need to call all other protocols, since they should not be interested in such an update
Direct updating seems much more appropriate.

2) The LMAP already exists and rt_notify is called.
At the moment it is not possible for a protocol to alter an announce: the rt_notify calling order cannot be predicted, and import_control has only local significance. Some pre-announce hook should be added, permitting all interested protocols to add their attributes, at least (we will insert the EAP_ADDITIONAL attribute here).

Case 2:
This is more tricky.
We can handle the LIB (Label Information Base) as an internal hash table; the main problem is kernel interaction. We can handle this either

* by adding some private hook to the kernel protocol (since there is no need to notify other protocols, even in the case of LDP + RSVP-TE, where separate label spaces should be used). However, dumping (for the purpose of cleaning) the AF_MPLS table requires another hook
* by upgrading FIB / rtable: if (from the user's point of view) config tables will not be AF-bound (e.g. IPv4+IPv6), we will have to enhance the FIB API.

My vision is the following:

* make the fib AF-bound, specifying the AF and sizeof(object) at fib_init (or fib2_init)
* pass pointers to all fib_* related functions instead of addresses
* do the comparison by memcmp() for searching (and use an AF-dependent hash based on the value passed in _init)
* pass the AF in the appropriate protocol hooks
* change struct network (move rte *routes up) to permit adding a dynamically sized address after struct fib_node
* each rtable contains fib pointers for the supported AFs

Using this approach we can send a route update by simply announcing the label in the GRT table with AF_MPLS.

*** PROTOCOL INTERACTION ***

We have to change the paradigm "All protocols are equal" to "All protocols are equal, but some protocols are more equal than others".

We need some sort of API which permits calling a protocol-specific hook for a given protocol type in a given rtable. This is needed due to

* LDP -> kernel protocol invocation
* L3vpn label requests from LDP (see below)

Alternatively, a post-configure protocol hook should be added (the current postconfigure is actually a post-successful-parse-and-config-structure-filled hook) to permit updating protocol pointers after a config change.

Assuming multi-protocol rtables, each L3vpn instance subscribes to the VPNv4/VPNv6 GRT fib instances and listens for routes announced by BGP sessions with the "mpls vpn" keyword.
(Some filter work has to be done to permit filtering by RT.) After a route has passed filtering, the l3vpn instance retrieves the actual IPv4/IPv6 prefix and the remote label, does a recursive route lookup, requests a local label from LDP and installs/updates kernel routes. Very much like the pipe protocol.

When a new route appears in the l3vpn base table ("new"), it passes the export filter and is packed into the VPNv4/VPNv6 family; another local label is requested from LDP, and finally the route is announced into the appropriate VPN family in the GRT.

* PORTABILITY

Actually, none of the [known] MPLS label, sockaddr and other definitions are portable across platforms:

Linux (linux-mpls project): http://mpls-linux.git.sourceforge.net/git/gitweb.cgi?p=mpls-linux/mpls-linux...
OpenBSD: http://fxr.watson.org/fxr/source/netmpls/mpls.h?v=OPENBSD
NetBSD: http://fxr.watson.org/fxr/source/netmpls/mpls.h?v=NETBSD
FreeBSD (from my implementation): http://freebsd.mpls.in/svn/filedetails.php?repname=FREEBSD+MPLS&path=%2Frele...

rtsock is more or less the same for the *BSDs; netlink is different, as usual.

Some additional (system-dependent) run-time kernel configuration is required, at least on Linux and FreeBSD.

I'm implementing the FreeBSD part only (however, I'll try to make it as portable as I can).

Least common denominator we can get:

* More flexible API permitting RR/MPLS
* BGP VPN extensions
* LDP implementation
* L3VPN implementation
* Protocol interaction
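To make the POP / PUSH / SWAP actions from the overview above concrete, here is a small sketch of a label stack with those three operations. It is only an illustration: `struct label_stack`, the `mpls_*` function names and the depth limit of 8 are assumptions made for this example and do not correspond to any of the platform definitions linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPLS_MAX_DEPTH 8          /* assumption: real depth is implementation-specific */
#define MPLS_LABEL_MAX 0xFFFFFu   /* labels are 20-bit numbers */

struct label_stack {
  size_t depth;
  uint32_t label[MPLS_MAX_DEPTH]; /* label[depth - 1] is the top of the stack */
};

/* PUSH: add one label on top. Returns 0 on success, -1 if the stack is
 * full or the value does not fit into 20 bits. */
int mpls_push(struct label_stack *s, uint32_t label)
{
  assert(s != NULL);
  if (s->depth == MPLS_MAX_DEPTH || label > MPLS_LABEL_MAX)
    return -1;
  s->label[s->depth++] = label;
  return 0;
}

/* POP: remove the top label and store it in *out. Returns -1 if empty. */
int mpls_pop(struct label_stack *s, uint32_t *out)
{
  assert(s != NULL && out != NULL);
  if (s->depth == 0)
    return -1;
  *out = s->label[--s->depth];
  return 0;
}

/* SWAP: replace the top label in place. Returns -1 if the stack is empty
 * or the new value does not fit into 20 bits. */
int mpls_swap(struct label_stack *s, uint32_t label)
{
  assert(s != NULL);
  if (s->depth == 0 || label > MPLS_LABEL_MAX)
    return -1;
  s->label[s->depth - 1] = label;
  return 0;
}
```

In the L3 VPN picture, an ingress PE would push the service label first and the transport label last, each P router would swap only the top label, and the egress side would pop until the stack is empty and hand the payload to ordinary IP routing.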
On Tue, Jul 12, 2011 at 04:47:06AM +0400, Alexander V. Chernikov wrote:
To show "overall" view we have to describe what we will add and what will be required from BIRD first.
Thanks for the great overview. Sorry for a late answer, it took a while for me to get into MPLS and think about it.
* UNDER THE HOOD *** KERNEL INTERACTION ***
So essentially, there are three kinds of routes:

* Standard IP routes with IP nexthops, which we already support.

* MPLS routes, keyed by MPLS label and with an MPLS action (NHLFE); these form an MPLS routing table (ILM). I will call these MPLS routes.

* IP routes with an MPLS action, used for encapsulation of incoming IP packets (FTN mapping); these share a routing table with standard IP routes (because depending on which route is chosen, a packet is either forwarded in the standard way or encapsulated into MPLS). I will call these encapsulating routes.

If I understand your mail correctly, you use some EAP_ADDITIONAL external attribute to represent encapsulating routes and use some new hook to attach these routes by a third-party protocol. I think this is not a good idea. To be semantically consistent, I think encapsulating routes should be represented by routes with a new destination type (RTD_MPLS in the dest field of struct rta), and the whole NHLFE should be stored in a new struct rta_mpls (or rta_nhlfe), which would be an extension of struct rta (containing struct rta as the first field and the NHLFE after that). Such a structure could easily be passed as struct rta, and the functions from rt-attr.c can work with it with some minor modifications (allocating, freeing and printing) dispatched on the dest field. Otherwise, they are very similar to standard IP routes and would probably need just some minor tweaks (and obviously kernel protocol support).

Therefore, such an encapsulating route should be generated in the standard way as a new route, by rte_update, with LDP (or some other protocol) as the true originator (in rta->proto and rta->source). I will comment on that later.

MPLS routes could use the same struct rta_mpls as encapsulating routes, but struct network (their fib_node) contains an MPLS label instead of an IP address. As the MPLS label is small (and the complex action is stored outside), I don't see any problem in reusing the ip_addr prefix. Most things would work without modifications.
There should be an AF field in struct rtable and struct rte to distinguish routes.

Therefore there would be two types of routing tables: IP and MPLS. I don't think it is a good idea to mix these. This may look inconsistent with the idea of embedding IPv4 in IPv6, but the IP protocols are much more similar, have a natural way to embed one in the other, and have similar roles and protocol structure. The MPLS routing table could be used for LDP - kernel interaction (routes imported from LDP and exported to the kernel). This solves your Case 2 without any hacks.
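The struct rta_mpls idea above (base attributes as the first field, NHLFE after them, behaviour dispatched on the dest field) could look roughly like this. Everything here is a stand-in for illustration: `struct attrs`, `struct attrs_mpls`, `DEST_MPLS` and the fixed four-label array are hypothetical and deliberately do not reuse the real struct rta layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the destination types kept in the dest field. */
enum dest_type { DEST_ROUTER, DEST_UNREACHABLE, DEST_MPLS };

/* Stand-in for the base route attributes (struct rta in the discussion). */
struct attrs {
  enum dest_type dest;
  uint32_t gw;                 /* next hop, host byte order */
};

enum nhlfe_op { NHLFE_POP, NHLFE_PUSH, NHLFE_SWAP };

/* Extended attributes: the base struct comes first, so a pointer to the
 * extension can be passed wherever a pointer to the base is expected. */
struct attrs_mpls {
  struct attrs a;              /* must stay the first member */
  enum nhlfe_op op;
  size_t nlabels;
  uint32_t labels[4];          /* assumption: small fixed label array */
};

_Static_assert(offsetof(struct attrs_mpls, a) == 0,
               "base attributes must be the first member");

/* Size to allocate or copy, dispatched on the destination type. */
size_t attrs_size(const struct attrs *a)
{
  assert(a != NULL);
  return (a->dest == DEST_MPLS) ? sizeof(struct attrs_mpls)
                                : sizeof(struct attrs);
}

/* Checked downcast: NULL unless the attributes really carry an NHLFE. */
const struct attrs_mpls *attrs_to_mpls(const struct attrs *a)
{
  assert(a != NULL);
  return (a->dest == DEST_MPLS) ? (const struct attrs_mpls *)a : NULL;
}
```

Generic code keeps handling `struct attrs *` and only the allocating, freeing and printing paths need to look at `dest`, which matches the "minor modifications" mentioned above.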
Case 1: Route update can happen differently: we can install updated route IFF * LDP label exists * IGP nexthop is one of advertised LDP neighbour nexthops.
I think it is possible to handle all these cases and protocol interactions in an elegant way. The LDP protocol, instead of just importing and exporting to one table, could be connected to more tables, with different meanings. There are four interactions of the LDP protocol: generating MPLS routes, generating encapsulating routes, importing label requests (which can be handled as routes) [*], and tracking the IGP table (to update nexthops of generated routes). These can all be handled as import or export of routes to the proper tables. The standard table connection (to an IP table) could be used for import (from LDP) of generated encapsulating routes and export (to LDP) of label requests. Another connection to an MPLS table would be used for import (from LDP) of generated MPLS routes, and the last one is used for tracking IGP changes:

protocol ldp {
    export all;        # label requests
    import all;        # encapsulating routes
    mpls import all;   # MPLS routes
    # it is probably pointless to have configurable filters for IGP tracking

    table t1;          # table for import label requests and export encapsulating routes
    mpls table t2;     # table for MPLS routes
    igp table t1;      # table for tracking IGP routes, usually (and by default) the same as main table.
}

[*] When I wrote that, I thought that labels are distributed just by LDP and that the purpose of a label request is to propagate the label through the LDP area. I didn't notice that BGP/MPLS also distributes labels, so they need to know the assigned labels. So the idea would need some modifications.

(I assume that LDP generates encapsulating routes as a true originator, as I wrote before, not just attaching some attribute to the existing route.)

So my idea of your Case 1 scenario is like this: in both subcases (an LDP LMAP arrives and the internal table with LMAPs changes; rt_notify() on the 'tracked IGP connection' is used to signal that the tracked table changed), the same procedure is executed: the internal LMAP table is examined, and the tracked IGP table is examined.
If both are ready (for the given prefix), the appropriate encapsulating and MPLS routes are generated and propagated using rte_update(); otherwise nothing is generated and the previously generated route is withdrawn (rte_update() with NULL is called) (or perhaps an unreachable route is generated if the LMAP is here but the IGP route is missing). Simple and elegant. If the encapsulating routes are saved, the Case 2 scenarios are trivial: just standard updates.

There are some tricky parts of IGP tracking. It is problematic to use the standard RA_OPTIMAL update for this purpose, because if the generated encapsulating routes are imported to the same table, they would probably become the optimal ones and the IGP routes would be shadowed. A solution would be to use RA_ANY and ignore notifications containing encapsulating routes; similarly, 'examining the tracked IGP table' means looking up the fib node and finding the best route, ignoring encapsulating ones.

To implement this behavior, two minor changes need to be made to the rt table code. First, accept_ra_types (RA_OPTIMAL/RA_ANY) is currently a property of a protocol; it needs to be a property of an announce hook (as LDP would have two hooks with RA_OPTIMAL and one hook with RA_ANY). Second, the rte_announce() calls for both types in rte_recalculate should be moved after the route list is updated/relinked.

BTW, this whole dependency 'IGP table -> LDP function' is a bit similar to the situation with recursive nexthops in IBGP, where an IGP change also leads to a change of the IBGP route nexthop. In the IBGP case it is handled automatically by the rtable code (see the rta_set_recursive_next_hop() discussion in route.h, hostcache and hostentry). The LDP situation is a bit different, but perhaps the same mechanism could be extended to call a protocol hook instead of just updating the nexthop. This mechanism is useful if a protocol waits for a change in the result of some recursive lookup in a tracked table.
But the LDP situation is much simpler, it just waits for an exact match change in tracked table.
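The "examine both tables, then generate or withdraw" procedure described above can be written as one pure decision function that both triggers (an LMAP change and an IGP notification) call. This is a sketch under assumptions: `struct ldp_prefix_state`, `ldp_reconcile()` and the three result values are hypothetical, and the check that the IGP next hop is the LDP neighbour which sent the mapping is simplified to a single address comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ldp_action {
  LDP_WITHDRAW,     /* withdraw whatever was generated before */
  LDP_ANNOUNCE,     /* generate the encapsulating and MPLS routes */
  LDP_UNREACHABLE   /* optional variant: LMAP known, IGP route missing */
};

/* Per-prefix view of the two inputs: the internal LMAP table and the
 * tracked IGP table (best route, encapsulating routes ignored). */
struct ldp_prefix_state {
  bool have_lmap;        /* a label mapping for the prefix was received */
  uint32_t lmap_peer;    /* address of the neighbour that advertised it */
  bool have_igp;         /* the tracked table has an IGP route */
  uint32_t igp_nexthop;  /* next hop of that IGP route */
};

/* Same procedure for both triggers: look at both inputs, then decide. */
enum ldp_action ldp_reconcile(const struct ldp_prefix_state *s)
{
  assert(s != NULL);
  if (s->have_lmap && s->have_igp && s->lmap_peer == s->igp_nexthop)
    return LDP_ANNOUNCE;
  if (s->have_lmap && !s->have_igp)
    return LDP_UNREACHABLE;
  return LDP_WITHDRAW;
}
```

Because the function has no side effects, the order in which the LMAP and the IGP route arrive does not matter; the caller would map LDP_ANNOUNCE to rte_update() with the new route and LDP_WITHDRAW to rte_update() with NULL.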
* There is no need to call all other protocols since they should not be interested in such update
Not true, other protocol may have filters that changes answer if you do some changes to route attributes. Ignoring that would lead to inconsistencies in route propagation.
* By upgrading FIB / rtable: If (from the point of user) config tables will be not AF_bound (e.g. IPv4+IPv6) we will have to do enhance FIB api.
My vision is the following: * make fib AF_ bound, specifying AF and sizeof(object) at fib_init (or fib2_init) * pass pointers to all fib_* related functions instead of addresses * do compare by memcmp() for searching (and use AF-dependent hash based on value passed in _init) * Pass AF in appropriate protocol hooks
As I wrote above, if we consider just IP (v4 and v6) and MPLS routes, I think that a fixed-size fib would be enough. But the problems are with the VPNvX AFs. Originally I thought that having a FIB / rtable with VPNvX routes is not a good idea: these AFs are just a wire representation of multiple independent IP spaces, and we already have a better representation of that, namely multiple routing tables. Having both representations seemed unnecessary and would require some conversion between the parts that require the first representation and the parts that require the second one. But not having VPNvX routes is also cumbersome: protocols that use them have to be bound to multiple routing tables through some multiplexer. So it is probably easier to have tables with VPNvX AFs.

Therefore, it is probably a good idea to extend FIBs in the way you suggested, with minor details changed. FIBs / rtables would be uniform (AF_ bound), but there are just three AFs (IP, MPLS, VPN): IPv4 and IPv6 could be handled as one AF, embedded, and the same for VPNv4 and VPNv6. To minimize code changes, struct fib_node would have an ip_addr prefix, but might be allocated larger.

Because each protocol and each of its announce_hooks has an appropriate role, it is IMHO unnecessary to have the AF in protocol hooks, but there should be a check whether the protocol/announce_hook is connected to an appropriate rtable.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Ondrej Zajicek wrote:
> On Tue, Jul 12, 2011 at 04:47:06AM +0400, Alexander V. Chernikov wrote:
>> To show "overall" view we have to describe what we will add and what
>> will be required from BIRD first.
>
>
> Thanks for the great overview. Sorry for a late answer, it took
> a while for me to get into MPLS and think about it.
>
Sorry for a late answer, too. ETIME issues :(
>> * UNDER THE HOOD
>> *** KERNEL INTERACTION ***
>
>
> So essentially, there are three kinds of routes:
>
> * Standard IP routes with IP nexthops, which we already support.
>
> * MPLS routes, keyed by MPLS label and with MPLS action (NHLFE), these
> form a MPLS routing table (ILM). I will call these MPLS routes.
>
> * IP routes with MPLS action, used for encapsulation of incoming
> IP packets (FTN mapping), these share a routing table with standard IP
> routes (because depending of which route is chosen packet is either
> forwarded in a standard way, or encapsulated to MPLS). I will call these
> encapsulating routes.
>
>
> If i understand correctly your mail, you use some EAP_ADDITIONAL
> external attribute to represent encapsulating routes and use some new
> hook to attach these routes by third party protocol. I think this is not
> a good idea - to be semantically consistent, i think encapsulation
> routes should be represented by routes with new destination type
> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
> in new struct rta_mpls (or rta_nhlfe), which would be extension of
> struct rta (containing struct rta in the first field and NHLFE after
> that). Such structure could be easily passed as struct rta and functions
> from rt-attr.c can work with that, with some minor modifications
> (allocating, freeing and printing) dispatched based on dest field.
> Otherwise, they are very similar to standard IP routes and probably
> would need just some minor tweaks (and obviously kernel protocol support).
>
> Therefore, such encapsulating route should be generated in a standard
> way as a new route - by rte_update, with LDP (or some other protocol)
> as true originator (in rta->proto and rta->source). I will comment
> that later.
Understood. This is much better than calling some protocol hooks directly.
>
>
> MPLS routes could use the same struct rta_mpls as encapsulating routes,
> but struct network (their fib_node) contains MPLS label instead of IP address.
> As MPLS label is small (and complex action is outside) i don't see any problems
> in reuse ip_addr prefix. Most things would work without modifications.
> There should be AF field in struct rtable and struct rte to distinguish
> routes.
Storing AF in rte makes much more sense if we use separate AFs for
inet/inet6.
>
>
> Therefore there would be two types of routing tables - IP and MPLS. I
> don't think it is a good idea to mix these. This may look inconsistent
> with idea of embedding IPv4 to IPv6, but IP protocols are much more
> similar, have a natural way to embed one in the other, have similar
> roles and protocol structure. MPLS routing table could be used to LDP -
> kernel interaction (routes imported from LDP and exported to kernel).
> This solves your Case 2 without any hacks.
So, from user point of view, I define
table xxx; for both ipv4 and IPv6 routes and
mpls table yyy; for MPLS routing table?
There should be a base MPLS rtable (mpls_default, for example) as in IP.
We can also add a hack to automatically subscribe protocols to the MPLS
routing table by type and other attributes. For example, every LDP
instance gets connected to an MPLS table (default or defined in config).
A kernel protocol instance gets connected to the MPLS table only if its IP
table is the default one (GRT) or the 'mpls table' keyword is supplied
explicitly. What about VPNv4/VPNv6? The same approach?
Btw, how will we distinguish inet/inet6 rtes? (I'm talking about MP-BGP
/ IPv4-mapped cases)
>
>> Case 1:
>> Route update can happen differently: we can install updated route IFF
>> * LDP label exists
>> * IGP nexthop is one of advertised LDP neighbour nexthops.
>
> I think it is possible to handle all these cases and protocol
> interaction in an elegant way. LDP protocol, instead of just import and
> export to one table, could be connected to more tables, with different
> meanings. There are four interactions of LDP protocol - generating MPLS
> routes, generating encapsulating routes, importing label requests (can
> be handled as routes) [*] and tracking IGP table (to update nexthops of
> generated routes). These all can be handled as import or export of
> routes to proper tables. Standard table connection (to IP table) could
> be used for import (from LDP) of generated encapsulating routes and
> export (to LDP) of label requests. Another connection to MPLS table
> would be used for import (from LDP) of generated MPLS routes, and the
> last one is used for tracking IGP changes:
>
> protocol ldp {
> export all; # label requests
> import all; # encapsulating routes
> mpls import all; # MPLS routes
> # it is probably pointless to have configurable filters for IGP tracking
>
> table t1; # table for import label requests and export encapsulating routes
> mpls table t2; # table for MPLS routes
> igp table t1; # table for tracking IGP routes, usually (and by default) the same as main table.
> }
>
> [*] when i wrote that i thought that labels are distributed just by LDP
> and the purpose of label request is to propagate the label through LDP
> area. i didn't notice that BGP/MPLS also distributes labels so they
> need to know assigned labels. So the idea would need some modifications.
Not sure this will work. Since t1 is an IP table, cases where we need to
request a specific label for:
* AToM
* RSVP-TE tunnels
will not work, since there are no prefixes that can be mapped to such a
request.
> (I assume that LDP generates encapsulating routes as a true originator,
> as i wrote before, not just attaching some attribute to the existing
> route.)
>
> So my idea of your Case 1 scenario is like this:
>
> In both subcases (LDP LMAP arrives and internal table with LMAPs changed;
> rt_notify() on 'tracked IGP connection' is used to signal that
> tracked table changed), the same procedure is executed:
>
> Internal LMAP table is examined, tracked IGP table is examined. If both
> are ready (for given prefix), appropriate encapsulating and MPLS routes
> are generated and propagated using rte_update(), otherwise nothing is
> generated and the previously generated route is withdrawn (rte_update()
> with NULL is called) (or perhaps an unreachable route is generated if
> LMAP is here but IGP route is missing). Simple and elegant.
.. and in case of label release we should remove label only and keep
original route
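The procedure quoted above can be sketched as one reconcile function that both triggers (an LMAP change and an IGP notification) would call. This is only an illustration: the state structure, field names and ldp_reconcile() are hypothetical, and the function returns an action instead of calling rte_update() directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-prefix state kept by an LDP instance. */
struct ldp_prefix_state {
  bool have_lmap;          /* a label mapping is known for the prefix */
  uint32_t label;          /* label taken from that mapping */
  bool have_igp;           /* the tracked IGP table has a route for it */
  uint32_t igp_nexthop;    /* next hop of that IGP route */
  bool announced;          /* a generated route is currently installed */
};

struct gen_route {         /* what would be handed to rte_update() */
  uint32_t nexthop;
  uint32_t label;
};

enum ldp_action { LDP_NOTHING, LDP_ANNOUNCE, LDP_WITHDRAW };

/* Called after either input changed; decides what to tell the table. */
static enum ldp_action
ldp_reconcile(struct ldp_prefix_state *s, struct gen_route *out)
{
  if (s->have_lmap && s->have_igp) {
    out->nexthop = s->igp_nexthop;
    out->label = s->label;
    s->announced = true;
    return LDP_ANNOUNCE;   /* caller would call rte_update() with a route */
  }
  if (s->announced) {
    s->announced = false;
    return LDP_WITHDRAW;   /* caller would call rte_update() with NULL */
  }
  return LDP_NOTHING;
}
```

In this model a label release only withdraws the generated route; the original IGP route was never touched, so it stays in the table.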
> If the encapsulated routes are saved
>
> Case 2 scenarios are trivial - just standard updates.
>
>
> There are some tricky parts of IGP tracking - it is problematic
> to use standard RA_OPTIMAL update for this purpose, because if
> generated encapsulating routes are imported to the same table,
> these probably became the optimal ones and IGP routes would be
> shaded. Solution would be to use RA_ANY, and ignore notifications
> containing encapsulating routes, similarly 'examining the tracked
> IGP table' means looking up the fib node and find the best route,
> ignoring encapsulating ones.
>
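A minimal sketch of 'examining the tracked IGP table' while ignoring encapsulating routes follows. The route entry is a reduced, hypothetical stand-in, and the integer pref comparison only stands in for whatever route comparison the table really uses.

```c
#include <stddef.h>
#include <stdint.h>

enum { RTD_ROUTER = 0, RTD_MPLS = 1 };   /* numeric values are assumptions */

struct rte_s {                           /* hypothetical reduced route entry */
  uint8_t dest;                          /* RTD_MPLS marks an encapsulating route */
  int pref;                              /* higher is better in this sketch */
  struct rte_s *next;                    /* other routes for the same network */
};

/* Best route for a network, skipping routes that LDP generated itself. */
static const struct rte_s *best_plain_route(const struct rte_s *routes)
{
  const struct rte_s *best = NULL;

  for (const struct rte_s *e = routes; e; e = e->next) {
    if (e->dest == RTD_MPLS)
      continue;                          /* ignore encapsulating routes */
    if (!best || e->pref > best->pref)
      best = e;
  }
  return best;
}
```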
> For implementation of this behavior, there are two minor changes that
> needs to be done to the rt table code: First, currently accept_ra_types
> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
> property of an announce hook (as LDP would have two hooks with
> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
> both in rte_recalculate should be moved after the route list
> is updated/relinked.
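The first change (making the accepted announcement type a property of each announce hook rather than of the protocol) could be modelled as below. This is a sketch under stated assumptions: the structure layout, the callback signature and table_announce() are invented for illustration and do not reproduce BIRD's rt_notify interface. All hooks are attached to a single table for brevity.

```c
#include <stddef.h>

enum ra_type { RA_OPTIMAL = 1, RA_ANY = 2 };   /* values assumed for the sketch */

struct announce_hook;
typedef void (*notify_fn)(struct announce_hook *h, int prefix_id);

struct announce_hook {
  enum ra_type accept;           /* per hook, no longer per protocol */
  notify_fn notify;              /* hypothetical callback */
  void *proto_data;              /* room for protocol-specific data */
  struct announce_hook *next;
};

struct table {
  struct announce_hook *hooks;
};

static void table_add_hook(struct table *t, struct announce_hook *h)
{
  h->next = t->hooks;
  t->hooks = h;
}

/* Deliver one announcement to every hook that asked for this type.
 * Returns the number of hooks notified. */
static int table_announce(struct table *t, enum ra_type type, int prefix_id)
{
  int delivered = 0;

  for (struct announce_hook *h = t->hooks; h; h = h->next)
    if (h->accept == type) {
      h->notify(h, prefix_id);
      delivered++;
    }
  return delivered;
}

/* Example callback: counts notifications in an int behind proto_data. */
static void count_notify(struct announce_hook *h, int prefix_id)
{
  (void) prefix_id;
  ++*(int *) h->proto_data;
}
```

With this shape one protocol can own several hooks of different types, which is what the LDP case (two RA_OPTIMAL hooks and one RA_ANY hook) requires.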
Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
trivial task and requires understanding the internals. Either announce type
should be passed to announce hook or new hook should be added for RA_ANY
event. The latter is more appropriate IMHO since RA_ANY is used by pipe
protocol only. Kernel protocol should track RA_ANY protocol hooks
looking for update source (LDP / RSVP) and re-install appropriate
routes. The only downside is situation when LDP signalling starts faster
than IGP. In that case we will get 3 updates instead of one (at least in
RTSOCK):
* RTM_ADD for original prefix
* RTM_DEL for this prefix (as part of krt_set_notify())
* RTM_ADD for modified prefix
RTM_CHANGE can be used in notify, but still: this gives 2 updates
instead of one.
>
>
> BTW, this whole dependency 'IGP table -> LDP function' is a bit similar
> to situation with recursive nexthops in IBGP, where IGP change also
> leads to change of IBGP route nexthop. In IBGP case it is handled
> automatically by rtable code (see rta_set_recursive_next_hop()
> discussion in route.h, hostcache and hostentry), LDP situation is a bit
> different, but perhaps the same mechanism could be extended to call
> protocol hook instead just update nexthop. This mechanism is useful if
> protocol waits for a change of a result of some recursive lookup in tracked
> table. But the LDP situation is much simpler, it just waits for an exact match
> change in tracked table.
>
>
>> * There is no need to call all other protocols since they should not be
>> interested in such update
>
> Not true, other protocol may have filters that changes answer if you
> do some changes to route attributes. Ignoring that would lead to
> inconsistencies in route propagation.
>
>> * By upgrading FIB / rtable:
>> If (from the point of user) config tables will be not AF_bound (e.g.
>> IPv4+IPv6) we will have to do enhance FIB api.
>>
>> My vision is the following:
>> * make fib AF_ bound, specifying AF and sizeof(object) at fib_init (or
>> fib2_init)
>> * pass pointers to all fib_* related functions instead of addresses
>> * do compare by memcpy() for searching (and use AF-dependent hash based
>> on value passed in _init)
>> * Pass AF in appropriate protocol hooks
>
> As i wrote above, if we consider just IP (v4 and v6) and MPLS routes,
> i think that fixed size fib would be enough. But problems are with
> VPNvX AFs.
>
> Originally i thought that having FIB / rtable with VPNvX routes is not a
> good idea - these AFs are just some wire representation of multiple
> independent IP spaces, and we already have better representation of that
> - just multiple routing tables. Having both these representations seemed
> unnecessary and would require some conversion between the parts that
> request the first representation and the parts that request the second
> one. But not having VPNvX routes is also cumbersome - protocols that
> use these have to be bound to multiple routing tables through some
> multiplexer. So it is probably easier to have tables with VPNvX
> AFs.
>
>
> Therefore, it is probably a good idea to extend FIBs in a way you
> suggested, with minor details changed. FIB / rtables would be uniform
> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
> could be handled as one AF, embedded (the same for VPNv4 and VPNv6). To
> minimize code changes, struct fib_node would have ip_addr prefix, but
> might be allocated larger.
Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
enough for holding IPv6 address? This can bump memory consumption for
setups with several full-views significantly.
>
> Because each protocol and each its announce_hook have appropriate role,
> it is IMHO unnecessary to have AF in protocol hooks, but there should be
> check whether protocol/announce_hook is connected to appropriate rtable.
>
To summarize required changes (please correct me):
1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
* rtable
* fib
* rte
3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
to struct fib to hold this value.
4) Move to memcmp() in fib_find / fib_get
5) Set up default rtable for every supported AF. Connect protocol
instances to such default AFs based on protocol types
...
Most of these are more or less trivial changes that are not MPLS-bound
(VPNv4/6 can be used when bird acts as an RR in an MPLS network, for example).
Should I supply patches for these? What are your plans for the commit
roadmap?
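Items 3) and 4) of the summary (a per-table key size given at init time and a memcmp()-based comparison) can be sketched as below. This is not the existing fib API: the fib2_* names, the node layout and the generic FNV-1a byte hash are assumptions for illustration, and a real version might use an AF-specific hash instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct fib2_node {
  struct fib2_node *next;       /* hash chain */
  uint8_t pxlen;                /* prefix length, 0 for label keys */
  uint8_t key[];                /* key_size bytes allocated with the node */
};

struct fib2 {
  size_t key_size;              /* fixed per table, given to fib2_init() */
  size_t nbuckets;              /* must be non-zero */
  struct fib2_node **buckets;
};

static int fib2_init(struct fib2 *f, size_t key_size, size_t nbuckets)
{
  f->key_size = key_size;
  f->nbuckets = nbuckets;
  f->buckets = calloc(nbuckets, sizeof(*f->buckets));
  return f->buckets ? 0 : -1;
}

/* FNV-1a over the key bytes and prefix length; any byte hash would do. */
static size_t fib2_bucket(const struct fib2 *f, const void *key, uint8_t pxlen)
{
  const uint8_t *p = key;
  uint32_t h = 2166136261u;

  for (size_t i = 0; i < f->key_size; i++)
    h = (h ^ p[i]) * 16777619u;
  h = (h ^ pxlen) * 16777619u;
  return h % f->nbuckets;
}

static inline int key_equal(const struct fib2 *f, const struct fib2_node *n,
                            const void *key, uint8_t pxlen)
{
  return n->pxlen == pxlen && memcmp(n->key, key, f->key_size) == 0;
}

static struct fib2_node *fib2_find(const struct fib2 *f, const void *key,
                                   uint8_t pxlen)
{
  struct fib2_node *n = f->buckets[fib2_bucket(f, key, pxlen)];

  for (; n; n = n->next)
    if (key_equal(f, n, key, pxlen))
      return n;
  return NULL;
}

/* Find the node for a key, creating it when it does not exist yet. */
static struct fib2_node *fib2_get(struct fib2 *f, const void *key, uint8_t pxlen)
{
  struct fib2_node *n = fib2_find(f, key, pxlen);
  if (n)
    return n;

  n = malloc(sizeof(*n) + f->key_size);
  if (!n)
    return NULL;
  n->pxlen = pxlen;
  memcpy(n->key, key, f->key_size);

  struct fib2_node **b = &f->buckets[fib2_bucket(f, key, pxlen)];
  n->next = *b;
  *b = n;
  return n;
}

static void fib2_free(struct fib2 *f)
{
  for (size_t i = 0; i < f->nbuckets; i++)
    for (struct fib2_node *n = f->buckets[i], *next; n; n = next) {
      next = n->next;
      free(n);
    }
  free(f->buckets);
  f->buckets = NULL;
}
```

Callers pass pointers to keys, so the same code serves a 16-byte IP key and a 4-byte label key; only the size given at init differs.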
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk4onmIACgkQwcJ4iSZ1q2lOcwCfT3CcT/bsxIlg1UiiArLWPq4k
w9EAnAzx7YifSgszTpHBcdwAvf01KI7S
=6LzQ
-----END PGP SIGNATURE-----
On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
> > Therefore there would be two types of routing tables - IP and MPLS. I
> > don't think it is a good idea to mix these. This may look inconsistent
> > with idea of embedding IPv4 to IPv6, but IP protocols are much more
> > similar, have a natural way to embed one in the other, have similar
> > roles and protocol structure. MPLS routing table could be used to LDP -
> > kernel interaction (routes imported from LDP and exported to kernel).
> > This solves your Case 2 without any hacks.
> So, from user point of view, I define
> table xxx; for both ipv4 and IPv6 routes and
> mpls table yyy; for MPLS routing table?

Yes.

> There should be a base MPLS rtable (mpls_default, for example) as in IP.
> We can also add a hack to automatically subscribe protocols to the MPLS
> routing table by type and other attributes. For example, every LDP
> instance gets connected to an MPLS table (default or defined in config).
> A kernel protocol instance gets connected to the MPLS table only if its IP
> table is the default one (GRT) or the 'mpls table' keyword is supplied
> explicitly. What about VPNv4/VPNv6? The same approach?

Perhaps even default MPLS table should be explicitly configured [*] (as i guess
not many BIRD users would use MPLS). Protocols requiring MPLS table would
fail if it is not configured, protocol with optional MPLS support (kernel,
static?) just do not connect to MPLS in that case. The same approach
for VPNvX table.

[*] probably like: mpls table XXX default;

> Btw, how will we distinguish inet/inet6 rtes? (I'm talking about MP-BGP
> / IPv4-mapped cases)

I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
similar purposes in IP stack. But this should not be checked directly
in protocols, there should be some macros in lib/ipv6.h for that.

> > [*] when i wrote that i thought that labels are distributed just by LDP
> > and the purpose of label request is to propagate the label through LDP
> > area. i didn't notice that BGP/MPLS also distributes labels so they
> > need to know assigned labels. So the idea would need some modifications.
> Not sure this will work. Since t1 is an IP table, cases where we need to
> request a specific label for:
> * AToM
> * RSVP-TE tunnels
> will not work, since there are no prefixes that can be mapped to such a
> request.

You are probably right. I originally thought about some specific
'request table' (where requests coded as routes with specific AF),
but perhaps there should be used some other mechanism / other protocol
hook. But it should be generic enough (some bus, allows at least more
'producers' and perhaps more 'consumers').

> > Internal LMAP table is examined, tracked IGP table is examined. If both
> > are ready (for given prefix), appropriate encapsulating and MPLS routes
> > are generated and propagated using rte_update(), otherwise nothing is
> > generated and the previously generated route is withdrawn (rte_update()
> > with NULL is called) (or perhaps an unreachable route is generated if
> > LMAP is here but IGP route is missing). Simple and elegant.
> .. and in case of label release we should remove label only and keep
> original route

Yes.

> > There are some tricky parts of IGP tracking - it is problematic
> > to use standard RA_OPTIMAL update for this purpose, because if
> > generated encapsulating routes are imported to the same table,
> > these probably became the optimal ones and IGP routes would be
> > shaded. Solution would be to use RA_ANY, and ignore notifications
> > containing encapsulating routes, similarly 'examining the tracked
> > IGP table' means looking up the fib node and find the best route,
> > ignoring encapsulating ones.
> >
> > For implementation of this behavior, there are two minor changes that
> > needs to be done to the rt table code: First, currently accept_ra_types
> > (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
> > property of an announce hook (as LDP would have two hooks with
> > RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
> > both in rte_recalculate should be moved after the route list
> > is updated/relinked.
> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
> trivial task and requires understanding the internals. Either announce type
> should be passed to announce hook or new hook should be added for RA_ANY
> event. The latter is more appropriate IMHO since RA_ANY is used by pipe
> protocol only.

I thought about that when i created RA_ANY and have chosen this approach.
Probably best way is just to change rt_notify to have appropriate
struct announce_hook as a second argument instead of struct rtable.
struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
some protocol-specific data. As (probably) all protocols are in-tree,
doing some wide but trivial changes is not a problem.

> Kernel protocol should track RA_ANY protocol hooks
> looking for update source (LDP / RSVP) and re-install appropriate
> routes.

I think kernel protocol should use RA_OPTIMAL as usual. This kind
of RA_ANY usage is for protocols that export routes to the same
table they listen (so 'source' routes would be shaded by their
routes). These routes (LDP / RSVP) should have just highest
priority.

> The only downside is situation when LDP signalling starts faster
> than IGP. In that case we will get 3 updates instead of one (at least in
> RTSOCK):
> * RTM_ADD for original prefix
> * RTM_DEL for this prefix (as part of krt_set_notify())
> * RTM_ADD for modified prefix
>
> RTM_CHANGE can be used in notify, but still: this gives 2 updates
> instead of one.

No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
are propagated synchronously depth-first:

OSPF --RA_ANY--> LDP
LDP --RA_OPTIMAL--> kernel
OSPF --RA_OPTIMAL--> kernel

But it is true that this is much dependent on internal implementation
of route propagation. The first idea i had was to use separate
tables for original and labeled routes (when just RA_OPTIMAL hooks),
but that looks too cumbersome for users and ability to push a better
route to the same (input) table has other possible usages.

> > Therefore, it is probably a good idea to extend FIBs in a way you
> > suggested, with minor details changed. FIB / rtables would be uniform
> > (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
> > could be handled as one AF, embedded (the same for VPNv4 and VPNv6). To
> > minimize code changes, struct fib_node would have ip_addr prefix, but
> > might be allocated larger.
> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
> enough for holding IPv6 address? This can bump memory consumption for
> setups with several full-views significantly.

It increases memory consumption, but not so much in a relative view - for
each struct network there is at least one struct rte and in both of them
there is just one ip_addr and both structures are nontrivial. So this
relative increase would be about 1.15-1.2. Really big users would
probably keep the current split setting.

> > Because each protocol and each its announce_hook have appropriate role,
> > it is IMHO unnecessary to have AF in protocol hooks, but there should be
> > check whether protocol/announce_hook is connected to appropriate rtable.
> >
>
> To summarize required changes (please correct me):
> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
> * rtable
> * fib
> * rte
> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
> to struct fib to hold this value.
> 4) Move to memcmp() in fib_find / fib_get
> 5) Set up default rtable for every supported AF. Connect protocol
> instances to such default AFs based on protocol types

1a) other changes in rte_recalculate() related to propagation
(clean up the table before calling RA_ANY hook).

1) and 1a) i will do myself and send you the patch, and also make
some trivial example for exporting to the same table.

2) i am not sure if there is a reason to put explicit AF info
to struct fib, AF compatibility could be handled on higher level
(struct rtable in general, other direct users probably use just
one AF).

3) and hashing callback (and perhaps fib_route, but not sure if this is
needed).

4) probably encapsulate that to some static inline key_equal() function.

5) see my related note above. Protocol binding to tables should check AFs.

more:

6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:

> > i think encapsulation
> > routes should be represented by routes with new destination type
> > (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
> > in new struct rta_mpls (or rta_nhlfe), which would be extension of
> > struct rta (containing struct rta in the first field and NHLFE after
> > that). Such structure could be easily passed as struct rta and functions
> > from rt-attr.c can work with that, with some minor modifications
> > (allocating, freeing and printing) dispatched based on dest field.

> > This rta could be used without changes also for MPLS routes.

> Most of these are more or less trivial changes that are not MPLS-bound
> (VPNv4/6 can be used when bird acts as an RR in an MPLS network, for example).
> Should I supply patches for these? What are your plans for the commit
> roadmap?

I create GIT branch 'mpls' and would merge these patches to that branch
soon. When we will have some major release, we could merge 'mpls' branch
to master if there is some sufficient usage (i think that even just
static and kernel protocol support for MPLS would be a good example
usage). Other protocols (LDP, ...) probably should be merged when they
are reasonably ready.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 22.07.2011 14:52, Ondrej Zajicek wrote:
> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>> don't think it is a good idea to mix these. This may look inconsistent
>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>> similar, have a natural way to embed one in the other, have similar
>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>> kernel interaction (routes imported from LDP and exported to kernel).
>>> This solves your Case 2 without any hacks.
>> So, from user point of view, I define
>> table xxx; for both ipv4 and IPv6 routes and
>> mpls table yyy; for MPLS routing table?
>
> Yes.
>
>> There should be a base MPLS rtable (mpls_default, for example) as in IP.
>> We can also add a hack to automatically subscribe protocols to the MPLS
>> routing table by type and other attributes. For example, every LDP
>> instance gets connected to an MPLS table (default or defined in config).
>> A kernel protocol instance gets connected to the MPLS table only if its IP
>> table is the default one (GRT) or the 'mpls table' keyword is supplied
>> explicitly. What about VPNv4/VPNv6? The same approach?
>
> Perhaps even default MPLS table should be explicitly configured [*] (as i guess
> not many BIRD users would use MPLS). Protocols requiring MPLS table would
> fail if it is not configured, protocol with optional MPLS support (kernel,
> static?) just do not connect to MPLS in that case. The same approach
> for VPNvX table.
>
> [*] probably like: mpls table XXX default;
Maybe it's better to turn on "general" MPLS support,
e.g. 'mpls support;' or just 'mpls;', instead of promoting some table
to be the default?
>
>> Btw, how will we distinguish inet/inet6 rtes? (I'm talking about MP-BGP
>> / IPv4-mapped cases)
>
> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
> similar purposes in IP stack. But this should not be checked directly
> in protocols, there should be some macros in lib/ipv6.h for that.
>
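The IPv4-mapped embedding mentioned above (::ffff:0:0/96) could be wrapped in helpers like the ones sketched here. The byte-array address type and the function names are hypothetical; they are not the macros of lib/ipv6.h, only an illustration of the check and of the prefix-length shift by 96 bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct ip6 {
  uint8_t b[16];                          /* address in network byte order */
};

/* The first 96 bits of ::ffff:0:0/96. */
static const uint8_t v4mapped_prefix[12] = {
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
};

static bool ip6_is_v4mapped(const struct ip6 *a)
{
  return memcmp(a->b, v4mapped_prefix, sizeof(v4mapped_prefix)) == 0;
}

/* Embed an IPv4 address (host byte order) into the mapped range. */
static struct ip6 ip6_from_v4(uint32_t v4)
{
  struct ip6 a;

  memcpy(a.b, v4mapped_prefix, sizeof(v4mapped_prefix));
  a.b[12] = (uint8_t) (v4 >> 24);
  a.b[13] = (uint8_t) (v4 >> 16);
  a.b[14] = (uint8_t) (v4 >> 8);
  a.b[15] = (uint8_t) v4;
  return a;
}

/* Extract the embedded IPv4 address; only meaningful for mapped addresses. */
static uint32_t ip6_to_v4(const struct ip6 *a)
{
  return ((uint32_t) a->b[12] << 24) | ((uint32_t) a->b[13] << 16) |
         ((uint32_t) a->b[14] << 8) | (uint32_t) a->b[15];
}

/* An IPv4 prefix of length n becomes a mapped prefix of length 96 + n. */
static unsigned pxlen_from_v4(unsigned v4len)
{
  return 96 + v4len;
}
```

Keeping the check in one place means protocols never compare against the prefix bytes themselves.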
>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>> and the purpose of label request is to propagate the label through LDP
>>> area. i didn't notice that BGP/MPLS also distributes labels so they
>>> need to know assigned labels. So the idea would need some modifications.
>> Not sure this will work. Since t1 is an IP table, cases where we need to
>> request a specific label for:
>> * AToM
>> * RSVP-TE tunnels
>> will not work, since there are no prefixes that can be mapped to such a
>> request.
>
> You are probably right. I originally thought about some specific
> 'request table' (where requests coded as routes with specific AF),
> but perhaps there should be used some other mechanism / other protocol
> hook. But it should be generic enough (some bus, allows at least more
> 'producers' and perhaps more 'consumers').
Okay, I see this as follows:
A new rtable hook, service_hook, with a uint32_t bitmask specifying the
request classes it is responsible for:
/* Defined classes */
#define RCLASS_LABEL 0x01 /* MPLS label request */
A request function:
int
request_data(rtable *t, struct service_request *req, void **buf,
             size_t *bufsize);
struct service_request {
  uint32_t request; /* Single request class set */
  uint32_t subclass; /* Subclass specific for request */
  struct proto *p; /* caller protocol */
  char data[0]; /* request-specific data follows */
};
The function loops through all hooks registered for the given _class_,
checking each reply until SR_OK or SR_FAIL is returned. It is up to the
protocol hook to check the subclass.
#define SR_OK 0x01 /* Request successful */
#define SR_FAIL 0x02 /* Request failed */
#define SR_NEXT 0x03 /* Request skipped */
#define SR_UNAVAIL 0x04 /* No providers for this request */
As a result, the caller gets SR_UNAVAIL if no provider was able to
serve the request, or SR_OK / SR_FAIL otherwise.
The caller can either set up the buffer itself and pass a pointer to the
buffer pointer and a pointer to the buffer size, or ask the provider to
allocate the data by setting *buf to NULL and *bufsize to 0.
struct service_reply { /* is returned in reply buffer */
  uint32_t request;
  uint32_t subclass;
  struct proto *p; /* protocol, providing data */
  char data[0]; /* request-specific data */
};
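A compilable sketch of the dispatch loop described above is given here. It keeps the RCLASS_* and SR_* values from the proposal, but the hook structure, the reduced request structure, struct rtable_s and the two example providers are assumptions made for illustration; only the caller-supplied buffer variant is shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RCLASS_LABEL 0x01   /* MPLS label request */

#define SR_OK      0x01     /* Request successful */
#define SR_FAIL    0x02     /* Request failed */
#define SR_NEXT    0x03     /* Request skipped */
#define SR_UNAVAIL 0x04     /* No providers for this request */

struct service_request {
  uint32_t request;         /* single request class */
  uint32_t subclass;        /* subclass specific to the class */
  void *caller;             /* stand-in for the calling protocol */
};

struct service_hook {       /* hypothetical layout */
  uint32_t classes;         /* bitmask of classes this hook serves */
  int (*serve)(const struct service_request *req, void **buf, size_t *bufsize);
  struct service_hook *next;
};

struct rtable_s {           /* reduced stand-in for the real rtable */
  struct service_hook *service_hooks;
};

/* Ask each hook registered for the class until one gives a final answer. */
static int request_data(struct rtable_s *t, const struct service_request *req,
                        void **buf, size_t *bufsize)
{
  for (struct service_hook *h = t->service_hooks; h; h = h->next) {
    if (!(h->classes & req->request))
      continue;
    int rc = h->serve(req, buf, bufsize);
    if (rc == SR_OK || rc == SR_FAIL)
      return rc;
  }
  return SR_UNAVAIL;
}

/* Example provider that never handles anything. */
static int skipping_provider(const struct service_request *req, void **buf,
                             size_t *bufsize)
{
  (void) req; (void) buf; (void) bufsize;
  return SR_NEXT;
}

/* Example provider: serves subclass 7 by writing a fixed label (25). */
static int label_provider(const struct service_request *req, void **buf,
                          size_t *bufsize)
{
  uint32_t label = 25;

  if (req->subclass != 7)
    return SR_NEXT;
  if (!*buf || *bufsize < sizeof(label))
    return SR_FAIL;         /* provider-side allocation is not sketched */
  memcpy(*buf, &label, sizeof(label));
  *bufsize = sizeof(label);
  return SR_OK;
}
```

Because the loop stops at the first SR_OK or SR_FAIL, the order in which hooks are registered decides which provider answers when several could.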
>
>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>> are generated and propagated using rte_update(), otherwise nothing is
>>> generated and the previously generated route is withdrawn (rte_update()
>>> with NULL is called) (or perhaps an unreachable route is generated if
>>> LMAP is here but IGP route is missing). Simple and elegant.
>> .. and in case of label release we should remove label only and keep
>> original route
>
> Yes.
>
>>> There are some tricky parts of IGP tracking - it is problematic
>>> to use standard RA_OPTIMAL update for this purpose, because if
>>> generated encapsulating routes are imported to the same table,
>>> these probably became the optimal ones and IGP routes would be
>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>> containing encapsulating routes, similarly 'examining the tracked
>>> IGP table' means looking up the fib node and find the best route,
>>> ignoring encapsulating ones.
>>>
>>> For implementation of this behavior, there are two minor changes that
>>> needs to be done to the rt table code: First, currently accept_ra_types
>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>> property of an announce hook (as LDP would have two hooks with
>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>> both in rte_recalculate should be moved after the route list
>>> is updated/relinked.
>
>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
>> trivial task and requires understanding the internals. Either announce type
>> should be passed to announce hook or new hook should be added for RA_ANY
>> event. The latter is more appropriate IMHO since RA_ANY is used by pipe
>> protocol only.
>
> I thought about that when i created RA_ANY and have chosen this approach.
> Probably best way is just to change rt_notify to have appropriate
> struct announce_hook as a second argument instead of struct rtable.
> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
> some protocol-specific data. As (probably) all protocols are in-tree,
> doing some wide but trivial changes is not a problem.
>
>> Kernel protocol should track RA_ANY protocol hooks
>> looking for update source (LDP / RSVP) and re-install appropriate
>> routes.
>
> I think kernel protocol should use RA_OPTIMAL as usual. This kind
> of RA_ANY usage is for protocols that export routes to the same
> table they listen (so 'source' routes would be shaded by their
> routes). These routes (LDP / RSVP) should have just highest
> priority.
>
>> The only downside is situation when LDP signalling starts faster
>> than IGP. In that case we will get 3 updates instead of one (at least in
>> RTSOCK):
>> * RTM_ADD for original prefix
>> * RTM_DEL for this prefix (as part of krt_set_notify())
>> * RTM_ADD for modified prefix
>>
>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>> instead of one.
>
> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
> are propagated synchronously depth-first:
>
> OSPF --RA_ANY--> LDP
> LDP --RA_OPTIMAL--> kernel
> OSPF --RA_OPTIMAL--> kernel
>
I still can't understand how exactly I can modify an announced IP route.
(From the FreeBSD kernel's point of view, an encapsulated route is a usual
route with an attribute attached. From Linux's point of view this should
be more or less the same, since an IP route lookup has to be done for an
incoming packet anyway, and doing several different lookups is not the best
idea.) I've got the RA_ANY hook called for a new route (and I should know
that it is actually RA_OPTIMAL without some complex logic!); what should I
do next?
> But it is true that this is much dependent on internal implementation
> of route propagation. The first idea i had was to use separate
> tables for original and labeled routes (when just RA_OPTIMAL hooks),
> but that looks too cumbersome for users and ability to push a better
> route to the same (input) table has other possible usages.
>
>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>> suggested, with minor details changed. FIB / rtables would be uniform
>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
>>> could be handled as one AF, embedded (the same for VPNv4 and VPNv6). To
>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>> might be allocated larger.
>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>> enough for holding IPv6 address? This can bump memory consumption for
>> setups with several full-views significantly.
>
> It increases memory consumption, but not so much in a relative view - for
> each struct network there is at least one struct rte and in both of them
> there is just one ip_addr and both structures are nontrivial. So this
> relative increase would be about 1.15-1.2. Really big users would
> probably keep the current split setting.
Okay, it's much easier from developer point of view. If you're not
afraid of your users :)
>
>>> Because each protocol and each its announce_hook have appropriate role,
>>> it is IMHO unnecessary to have AF in protocol hooks, but there should be
>>> check whether protocol/announce_hook is connected to appropriate rtable.
>>>
>>
>> To summarize required changes (please correct me):
>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>> * rtable
>> * fib
>> * rte
>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>> to struct fib to hold this value.
>> 4) Move to memcmp() in fib_find / fib_get
>> 5) Set up default rtable for every supported AF. Connect protocol
>> instances to such default AFs based on protocol types
>
> 1a) other changes in rte_recalculate() related to propagation
> (clean up the table before calling RA_ANY hook).
>
> 1) and 1a) i will do myself and send you the patch, and also make
> some trivial example for exporting to the same table.
>
> 2) i am not sure if there is a reason to put explicit AF info
> to struct fib, AF compatibility could be handled on higher level
> (struct rtable in general, other direct users probably use just
> one AF).
No problem, I misinterpreted "FIB / rtables would be uniform (AF_
bound)" as "FIB / rtable needs AF info in structure fields"
>
> 3) and hashing callback (and perhaps fib_route, but not sure if this is
> needed).
>
> 4) probably encapsulate that to some static inline key_equal() function.
>
> 5) see my related note above. Protocol binding to tables should check AFs.
>
> more:
>
> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:
>
>>> i think encapsulation
>>> routes should be represented by routes with new destination type
>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>> struct rta (containing struct rta in the first field and NHLFE after
>>> that). Such structure could be easily passed as struct rta and functions
>>> from rt-attr.c can work with that, with some minor modifications
>>> (allocating, freeing and printing) dispatched based on dest field.
>
>>> This rta could be used without changes also for MPLS routes.
I'll try to send you patches for all these as I see it in several days.
>
>
>> Most of these are more or less trivial changes that are not MPLS-bound
>> (VPNv4/6 can be used when bird acts as an RR in an MPLS network, for example).
>> Should I supply patches for these? What are your plans for the commit
>> roadmap?
>
> I create GIT branch 'mpls' and would merge these patches to that branch
> soon. When we will have some major release, we could merge 'mpls' branch
> to master if there is some sufficient usage (i think that even just
> static and kernel protocol support for MPLS would be a good example
> usage). Other protocols (LDP, ...) probably should be merged when they
> are reasonably ready.
Will this branch be available from the official git repo? It is not
accessible (from its web interface at least).
Btw, some bird/LDP "status" report:
bird> show ldp neighbour
    Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
        TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
        State: Operational; Msgs sent/rcvd: 21/61; Downstream
        Up time: 00:02:27
        LDP discovery sources:
          em1, Src IP addr: 10.1.5.4
    Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
        TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
        State: Operational; Msgs sent/rcvd: 29/60; Downstream
        Up time: 00:02:20
        LDP discovery sources:
          em2, Src IP addr: 10.1.6.3
bird> show ldp bindings
  lib entry: 10.2.0.0/30
        local binding: label: 25
        remote binding: lsr: 10.2.33.4:0, label: ImpNULL
        remote binding: lsr: 10.2.33.3:0, label: 23
  lib entry: 10.1.6.0/24
        remote binding: lsr: 10.2.33.3:0, label: ImpNULL
        remote binding: lsr: 10.2.33.4:0, label: 25
  lib entry: 10.0.0.0/24
        remote binding: lsr: 10.2.33.3:0, label: 19
        remote binding: lsr: 10.2.33.4:0, label: 23
  lib entry: 10.2.0.2/32
        local binding: label: 26
        remote binding: lsr: 10.2.33.4:0, label: 16
        remote binding: lsr: 10.2.33.3:0, label: 24
  lib entry: 10.1.4.0/24
        local binding: label: 29
        remote binding: lsr: 10.2.33.4:0, label: ImpNULL
        remote binding: lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 10.1.5.0/24
        remote binding: lsr: 10.2.33.4:0, label: ImpNULL
        remote binding: lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 1.2.3.5/32
        remote binding: lsr: 10.2.33.3:0, label: 20
        remote binding: lsr: 10.2.33.4:0, label: 21
  lib entry: 10.1.33.0/24
        local binding: label: 28
        remote binding: lsr: 10.2.33.4:0, label: ImpNULL
        remote binding: lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 10.2.33.3/32
        local binding: label: 31
        remote binding: lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 10.2.33.4/32
        local binding: label: 27
        remote binding: lsr: 10.2.33.4:0, label: ImpNULL
        remote binding: lsr: 10.2.33.3:0, label: 25
  lib entry: 10.1.6.88/32
        remote binding: lsr: 10.2.33.3:0, label: 18
        remote binding: lsr: 10.2.33.4:0, label: 19
  lib entry: 10.0.0.88/32
        remote binding: lsr: 10.2.33.4:0, label: 17
        remote binding: lsr: 10.2.33.3:0, label: 16
  lib entry: 10.1.5.88/32
        remote binding: lsr: 10.2.33.3:0, label: 21
        remote binding: lsr: 10.2.33.4:0, label: 18
bird> show ldp forwardingtable
Local  Outgoing    Prefix            Bytes Label  Outgoing   Next Hop
Label  Label or VC or Tunnel Id      Switched     interface
20     SWAP        10.2.0.0/30       0            ?          10.1.5.4
21     SWAP        10.2.0.2/32       0            ?          10.1.5.4
22     SWAP        10.2.33.4/32      0            ?          10.1.5.4
23     SWAP        10.1.33.0/24      0            ?          10.1.5.4
24     SWAP        10.1.4.0/24       0            ?          10.1.5.4
25     SWAP        10.2.0.0/30       0            ?          10.1.5.4
26     SWAP        10.2.0.2/32       0            ?          10.1.5.4
27     SWAP        10.2.33.4/32      0            ?          10.1.5.4
28     SWAP        10.1.33.0/24      0            ?          10.1.5.4
29     SWAP        10.1.4.0/24       0            ?          10.1.5.4
30     SWAP        10.2.33.3/32      0            ?          10.1.6.3
31     SWAP        10.2.33.3/32      0            ?          10.1.6.3
>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alexander V. Chernikov wrote:
> On 22.07.2011 14:52, Ondrej Zajicek wrote:
>> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>>> don't think it is a good idea to mix these. This may look inconsistent
>>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>>> similar, have a natural way to embed one in the other, have similar
>>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>>> kernel interaction (routes imported from LDP and exported to kernel).
>>>> This solves your Case 2 without any hacks.
>>> So, from user point of view, I define
>>> table xxx; for both ipv4 and IPv6 routes and
>>> mpls table yyy; for MPLS routing table?
>>
>> Yes.
A patch that permits fibs to be used for any address family is attached.
It should be considered a PoC patch for review. It works for my setup,
but I haven't tested it in production. netlink is not tested at all.
Some notes:
* fib has to have an address type field (because fib_get and other functions
take a pointer to fib, not rtable)
* Because the address has variable length, we store it inside the fib node this way:
 |--------------------|
 | struct fib_node    |
 | *addr -------------+--\
 |--------------------|  |
 | some user data     |  |
 |                    |  |
 |--------------------|  |
 | address data  <----+--/
 |                    |
 |--------------------|
* Since we now have a pointer to the address data instead of the data (ip_addr)
itself, all 9000 places that use "%I/%d" need to be changed, so the more
general fib_print and fib2_print functions are implemented
* Several net_* calls were converted to fib_*
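To illustrate the notes above, a minimal standalone sketch of a node that keeps its variable-length address behind the user data and reaches it through a pointer. This is not the code from the attached patch: the struct is cut down to two fields, and fib_node_alloc(), fib_key_equal(), fib_print4() and FPREFIX_IP4() are hypothetical names (only FPREFIX is borrowed from the patch, with an assumed definition). Addresses are IPv4 in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct fib_node {
  void *addr;               /* points at the address bytes stored after the user data */
  uint8_t pxlen;
};

/* Accessors in the spirit of the FPREFIX macros used in the patch (assumed). */
#define FPREFIX(n)     ((n)->addr)
#define FPREFIX_IP4(n) ((uint32_t *) (n)->addr)

/* One allocation: node header + user data (node_size), then addr_size address bytes. */
static struct fib_node *
fib_node_alloc(size_t node_size, const void *addr, size_t addr_size, uint8_t pxlen)
{
  assert(node_size >= sizeof(struct fib_node));
  struct fib_node *n = calloc(1, node_size + addr_size);
  if (!n)
    return NULL;
  n->addr = (char *) n + node_size;
  memcpy(n->addr, addr, addr_size);
  n->pxlen = pxlen;
  return n;
}

/* AF-independent key comparison: prefix length plus raw address bytes. */
static int
fib_key_equal(const struct fib_node *n, const void *addr, size_t addr_size, uint8_t pxlen)
{
  return n->pxlen == pxlen && memcmp(FPREFIX(n), addr, addr_size) == 0;
}

/* Formats an IPv4 node as "a.b.c.d/len" into buf and returns buf. */
static const char *
fib_print4(const struct fib_node *n, char *buf, size_t len)
{
  uint32_t a = *FPREFIX_IP4(n);
  snprintf(buf, len, "%u.%u.%u.%u/%u",
           (unsigned) ((a >> 24) & 0xff), (unsigned) ((a >> 16) & 0xff),
           (unsigned) ((a >> 8) & 0xff), (unsigned) (a & 0xff),
           (unsigned) n->pxlen);
  return buf;
}
```

A caller embeds struct fib_node as the first member of its own node type and passes sizeof of that type as node_size, so the address always lands right after the user data.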
Btw, some IPv4/IPv6 merging questions/thoughts:
* show route will show a complete mess for a table with both v4 and v6
routes. Some sorting or an 'afi ipv4|ipv6' option has to be implemented.
* fill_in_sockaddr and get_sockaddr from io.c are somewhat inconsistent:
fill_* uses the OS-dependent set_inaddr to fill in the actual address data, but
get_* uses direct calls to memcpy and ipa_ntoh instead of the existing
OS-dependent get_inaddr. Moreover, the set_ and get_ implementations are the
same for linux and bsd (and they should be the same for other UNIX-like
systems AFAIR, at least for IPv4/IPv6)
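What I mean by a consistent pair, as a rough IPv4-only sketch on plain POSIX sockets. The names carry a "4" suffix on purpose because these are not the io.c functions, only hypothetical stand-ins; the address is kept in host byte order on the caller's side.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical low-level helper pair; all byte-order handling lives here. */
static void set_inaddr4(struct in_addr *ia, uint32_t a)
{
  ia->s_addr = htonl(a);
}

static uint32_t get_inaddr4(const struct in_addr *ia)
{
  return ntohl(ia->s_addr);
}

/* Fill a sockaddr from a host-order address and port. */
static void fill_in_sockaddr4(struct sockaddr_in *sa, uint32_t a, uint16_t port)
{
  memset(sa, 0, sizeof(*sa));
  sa->sin_family = AF_INET;
  sa->sin_port = htons(port);
  set_inaddr4(&sa->sin_addr, a);
}

/* Exact mirror of fill_in_sockaddr4(): goes through get_inaddr4(), no raw memcpy. */
static void get_sockaddr4(const struct sockaddr_in *sa, uint32_t *a, uint16_t *port)
{
  assert(sa->sin_family == AF_INET);
  if (port)
    *port = ntohs(sa->sin_port);
  *a = get_inaddr4(&sa->sin_addr);
}
```

With both directions going through the same helper pair, an OS-specific quirk would have to be handled in one place only.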
>>
>>> There should be base MPLS rtable (mpls_default, for example) as in IP.
>>> We can also add a hack for automatically subscribe protocols for MPLS
>>> routing table by type and other attributes. For example, every LDP
>>> instance gets connected to an MPLS table (default or defined in config).
>>> Kernel protocol instance gets connected to MPLS table only if its IP
>>> table is the default one (GRT) or 'mpls table' keyword is supplied
>>> explicitly. What about VPNv4/VPNv6? The same approach?
>>
>> Perhaps even default MPLS table should be explicitly configured [*]
>> (as i guess
>> not many BIRD users would use MPLS). Protocols requiring MPLS table would
>> fail if it is not configured, protocol with optional MPLS support
>> (kernel,
>> static?) just do not connect to MPLS in that case. The same approach
>> for VPNvX table.
>>
>> [*] probably like: mpls table XXX default;
> Maybe it's better to turn on "general" mpls support?
> e.g. 'mpls support;' or just 'mpls;' instead of propagating some table
> to be default?
>>
>>> Btw, how we will distinguish inet/inet6 rtes? (I'm talking about
>>> MP-BGP
>>> / IPv4-mapped cases)
>>
>> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
>> similar purposes in IP stack. But this should not be checked directly
>> in protocols, there should be some macros in lib/ipv6.h for that.
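Just to check that I understand the idea, a small sketch of such helpers for the ::ffff:0:0/96 embedding. The type ip6_t and all function names here are made up for illustration and are not what lib/ipv6.h currently provides; a real version would more likely be macros over ip_addr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } ip6_t;   /* hypothetical 128-bit address, network order */

/* First 96 bits of ::ffff:0:0/96 */
static const uint8_t v4mapped_prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

static int ip6_is_v4mapped(const ip6_t *a)
{
  return memcmp(a->b, v4mapped_prefix, sizeof(v4mapped_prefix)) == 0;
}

/* Embed a host-order IPv4 address into the mapped range. */
static ip6_t ip6_from_v4(uint32_t v4)
{
  ip6_t a;
  memcpy(a.b, v4mapped_prefix, sizeof(v4mapped_prefix));
  a.b[12] = (uint8_t) (v4 >> 24);
  a.b[13] = (uint8_t) (v4 >> 16);
  a.b[14] = (uint8_t) (v4 >> 8);
  a.b[15] = (uint8_t) v4;
  return a;
}

/* Extract the host-order IPv4 address; only valid for mapped addresses. */
static uint32_t ip6_to_v4(const ip6_t *a)
{
  assert(ip6_is_v4mapped(a));
  return ((uint32_t) a->b[12] << 24) | ((uint32_t) a->b[13] << 16) |
         ((uint32_t) a->b[14] << 8) | (uint32_t) a->b[15];
}

/* An IPv4 prefix length n corresponds to 96 + n in the embedded form. */
static int pxlen_from_v4(int n)
{
  return 96 + n;
}
```

Protocols would then call only these helpers and never compare against the prefix bytes directly.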
>>
>>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>>> and the purpose of label request is to propagate the label through LDP
>>>> area. i didn't notice that BGP/MPLS also distributes labels so they
>>>> need to know assigned labels. So the idea would need some
>>>> modifications.
>>> Not sure this will work. Since t1 is an IP table cases when we need to
>>> request specific label for:
>>> * AToM
>>> * RSVP-TE tunnels
>>> will not work since there are no prefixes that can be mapped to such
>>> request.
>>
>> You are probably right. I originally thought about some specific
>> 'request table' (where requests coded as routes with specific AF),
>> but perhaps there should be used some other mechanism / other protocol
>> hook. But it should be generic enough (some bus, allows at least more
>> 'producers' and perhaps more 'consumers').
> Okay, i see this as follows:
> New rtable hook, service_hook, with uint32_t bitmask specifying request
> classes we are responsible for:
> /* Defined classes */
> #define RCLASS_LABEL 0x01 /* MPLS label request */
>
> Some request function:
> int
> request_data(rtable *t, struct service_request *req, void **buf, size_t
> *bufsize)
>
> struct service_request {
> uint32_t request; /* Single request class set */
> uint32_t subclass; /* Subclass specific for request */
> proto *p; /* caller protocol */
> char data[0]; /* request-specific data follows */
> }
>
> function loops thru all registered hooks for given _class_ checking for
> reply until SR_OK or SR_FAIL is returned. It is up to protocol hook to
> check subclass.
> #define SR_OK 0x01 /* Request successful */
> #define SR_FAIL 0x02 /* Request failed */
> #define SR_NEXT 0x03 /* Request skipped */
> #define SR_UNAVAIL 0x04 /* No providers for this request */
>
> As a result, the caller gets SR_UNAVAIL if no provider was able to
> serve the request, or SR_OK|SR_FAIL otherwise.
>
> The caller can set up the buffer itself and pass a pointer to the buffer
> pointer and a pointer to the buffer size to the function, or ask the provider
> to allocate the data for it by setting *buf to NULL and *bufsize to 0
>
> struct service_reply { /* is returned in reply buffer */
> uint32_t request;
> uint32_t subclass;
> proto *p; /* protocol, providing data */
> char data[0]; /* request-specific data */
> }
>
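The dispatch loop from the proposal above, as a simplified standalone sketch. It deliberately differs from the quoted signature: the hooks live in a plain array instead of an rtable, the reply is a single uint32_t instead of a buffer, and struct hook_entry and the two example providers are hypothetical. The RCLASS_* and SR_* values are the ones from the mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RCLASS_LABEL 0x01   /* MPLS label request */

#define SR_OK      0x01     /* Request successful */
#define SR_FAIL    0x02     /* Request failed */
#define SR_NEXT    0x03     /* Request skipped */
#define SR_UNAVAIL 0x04     /* No providers for this request */

struct service_request {
  uint32_t request;         /* single request class */
  uint32_t subclass;        /* subclass, checked by the provider itself */
};

typedef int (*service_hook)(const struct service_request *req, uint32_t *reply);

struct hook_entry {
  uint32_t classes;         /* bitmask of request classes this hook serves */
  service_hook hook;
};

/* Ask each provider registered for the class until one answers SR_OK or SR_FAIL. */
static int
request_data(const struct hook_entry *hooks, size_t n,
             const struct service_request *req, uint32_t *reply)
{
  for (size_t i = 0; i < n; i++)
  {
    if (!(hooks[i].classes & req->request))
      continue;
    int rv = hooks[i].hook(req, reply);
    if (rv == SR_OK || rv == SR_FAIL)
      return rv;
  }
  return SR_UNAVAIL;
}

/* Example provider that declines every request. */
static int skip_hook(const struct service_request *req, uint32_t *reply)
{
  (void) req;
  (void) reply;
  return SR_NEXT;
}

/* Example provider that hands out a made-up label derived from the subclass. */
static int label_hook(const struct service_request *req, uint32_t *reply)
{
  *reply = 100 + req->subclass;
  return SR_OK;
}
```

A provider that returns SR_NEXT is simply skipped, so the caller sees SR_UNAVAIL both when nobody is registered for the class and when every registered provider declines.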
>
>
>>
>>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>>> are generated and propagated using rte_update(), otherwise nothing is
>>>> generated and the previously generated route is withdrawn (rte_update()
>>>> with NULL is called) (or perhaps an unreachable route is generated if
>>>> LMAP is here but IGP route is missing). Simple and elegant.
>>> .. and in case of label release we should remove label only and keep
>>> original route
>>
>> Yes.
>>
>>>> There are some tricky parts of IGP tracking - it is problematic
>>>> to use standard RA_OPTIMAL update for this purpose, because if
>>>> generated encapsulating routes are imported to the same table,
>>>> these probably became the optimal ones and IGP routes would be
>>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>>> containing encapsulating routes, similarly 'examining the tracked
>>>> IGP table' means looking up the fib node and find the best route,
>>>> ignoring encapsulating ones.
>>>>
>>>> For implementation of this behavior, there are two minor changes that
>>>> needs to be done to the rt table code: First, currently accept_ra_types
>>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>>> property of an announce hook (as LDP would have two hooks with
>>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>>> both in rte_recalculate should be moved after the route list
>>>> is updated/relinked.
>>
>>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a
>>> trivial task and requires internals understanding. Either announce type
>>> should be passed to announce hook or new hook should be added for RA_ANY
>>> event. The latter is more appropriate IMHO since RA_ANY is used by
>>> pipe
>>> protocol only.
>>
>> I thought about that when i created RA_ANY and have chosen this approach.
>> Probably best way is just to change rt_notify to have appropriate
>> struct announce_hook as a second argument instead of struct rtable.
>> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
>> some protocol-specific data. As (probably) all protocols are in-tree,
>> doing some wide but trivial changes is not a problem.
>>
>>> Kernel protocol should track RA_ANY protocol hooks
>>> looking for update source (LDP / RSVP) and re-install appropriate
>>> routes.
>>
>> I think kernel protocol should use RA_OPTIMAL as usual. This kind
>> of RA_ANY usage is for protocols that export routes to the same
>> table they listen (so 'source' routes would be shaded by their
>> routes). These routes (LDP / RSVP) should have just highest
>> priority.
>>
>>> The only downside is situation when LDP signalling starts faster
>>> than IGP. In that case we will get 3 updates instead of one (at least in
>>> RTSOCK):
>>> * RTM_ADD for original prefix
>>> * RTM_DEL for this prefix (as part of krt_set_notify())
>>> * RTM_ADD for modified prefix
>>>
>>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>>> instead of one.
>>
>> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
>> are propagated synchronously depth-first:
>>
>> OSPF --RA_ANY--> LDP
>> LDP --RA_OPTIMAL--> kernel
>> OSPF --RA_OPTIMAL--> kernel
>>
> Still I can't understand how exactly I can modify an announced IP route
> (still, from FreeBSD kernel point of view encapsulated route is a usual
> route with an attribute attached. From Linux point of view this should
> be more or less the same since an IP route lookup have to be done for
> incoming packet anyway and doing several different lookups is not a best
> idea). I've got RA_ANY hook called for a new route (and I should know
> that it is actually RA_OPTIMAL without some complex logic!), what I
> should do next ?
>
>> But it is true that this is much dependent on internal implementation
>> of route propagation. The first idea i had was to use separate
>> tables for original and labeled routes (when just RA_OPTIMAL hooks),
>> but that looks too cumbersome for users and ability to push a better
>> route to the same (input) table has other possible usages.
>>
>>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>>> suggested, with minor details changed. FIB / rtables would be uniform
>>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and
>>>> IPv6
>>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>>> might be allocated larger.
>>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>>> enough for holding IPv6 address? This can bump memory consumption for
>>> setups with several full-views significantly.
>>
>> It increases memory consumption, but not so much in a relative view - for
>> each struct network there is at least one struct rte and in both of them
>> there is just one ip_addr and both structures are nontrivial. So this
>> relative increase would be about 1.15-1.2. Really big users would
>> probably keep the current split setting.
> Okay, it's much easier from developer point of view. If you're not
> afraid of your users :)
>>
>>>> Because each protocol and each its announce_hook have appropriate role,
>>>> it is IMHO unnecessary to have AF in protocol hooks, but there
>>>> should be
>>>> check whether protocol/announce_hook is connected to appropriate
>>>> rtable.
>>>>
>>>
>>> To summarize required changes (please correct me):
>>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>>> * rtable
>>> * fib
>>> * rte
>>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>>> to struct fib to hold this value.
>>> 4) Move to memcmp() in fib_find / fib_get
>>> 5) Set up default rtable for every supported AF. Connect protocol
>>> instances to such default AFs based on protocol types
>>
>> 1a) other changes in rte_recalculate() related to propagation
>> (clean up the table before calling RA_ANY hook).
>>
>> 1) and 1a) i will do myself and send you the patch, and also make
>> some trivial example for exporting to the same table.
>>
>> 2) i am not sure if there is a reason to put explicit AF info
>> to struct fib, AF compatibility could be handled on higher level
>> (struct rtable in general, other direct users probably use just
>> one AF).
> No problem, I misinterpreted "FIB / rtables would be uniform (AF_
> bound)" as "FIB / rtable needs AF info in structure fields"
>>
>> 3) and hashing callback (and perhaps fib_route, but not sure if this is
>> needed).
>>
>> 4) probably encapsulate that to some static inline key_equal() function.
>>
>> 5) see my related note above. Protocol binding to tables should check
>> AFs.
>>
>> more:
>>
>> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous
>> mail:
>>
>>>> i think encapsulation
>>>> routes should be represented by routes with new destination type
>>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>>> struct rta (containing struct rta in the first field and NHLFE after
>>>> that). Such structure could be easily passed as struct rta and
>>>> functions
>>>> from rt-attr.c can work with that, with some minor modifications
>>>> (allocating, freeing and printing) dispatched based on dest field.
>>
>>>> This rta could be used without changes also for MPLS routes.
>
> I'll try to send you patches for all these as I see it in several days.
>>
>>
>>> Most of these are more or less trivial changes that are not MPLS-bound (VPNv4/6
>>> can be used when bird acts as an RR in an MPLS network, for example).
>>> Should I supply patches for these? What are your plans for the commit
>>> roadmap?
>>
>> I create GIT branch 'mpls' and would merge these patches to that branch
>> soon. When we have a major release, we could merge the 'mpls' branch
>> into master if there is sufficient usage (i think that even just
>> static and kernel protocol support for MPLS would be a good example
>> of usage). Other protocols (LDP, ...) should probably be merged when they
>> are reasonably ready.
> Will this branch be available from the official git repo? It is not accessible
> (at least from its web interface).
>
>
>
>
>>
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk41FJEACgkQwcJ4iSZ1q2kZNwCfZHk19PuXn2esNZ/KrvXOir5v
zTMAoKe78CsexI0pPJ4li50e8teBCcpa
=yqPo
-----END PGP SIGNATURE-----
Index: filter/filter.c
===================================================================
--- filter/filter.c (revision 4962)
+++ filter/filter.c (working copy)
@@ -679,9 +679,9 @@ interpret(struct f_inst *what)
case T_STRING: /* Warning: this is a special case for proto attribute */
res.val.s = rta->proto->name;
break;
- case T_PREFIX: /* Warning: this works only for prefix of network */
+ case T_PREFIX: /* Warning: this works only for RT_IP prefix of network */
{
- res.val.px.ip = (*f_rte)->net->n.prefix;
+ res.val.px.ip = *FPREFIX_IP(&(*f_rte)->net->n);
res.val.px.len = (*f_rte)->net->n.pxlen;
break;
}
Index: proto/ospf/ospf.c
===================================================================
--- proto/ospf/ospf.c (revision 4962)
+++ proto/ospf/ospf.c (working copy)
@@ -812,7 +812,7 @@ ospf_sh(struct proto *p)
cli_msg(-1014, "\t\tArea networks:");
firstfib = 0;
}
- cli_msg(-1014, "\t\t\t%1I/%u\t%s\t%s", anet->fn.prefix, anet->fn.pxlen,
+ cli_msg(-1014, "\t\t\t%1I/%u\t%s\t%s", *FPREFIX_IP(&anet->fn), anet->fn.pxlen,
anet->hidden ? "Hidden" : "Advertise", anet->active ? "Active" : "");
}
FIB_WALK_END;
Index: proto/ospf/topology.c
===================================================================
--- proto/ospf/topology.c (revision 4962)
+++ proto/ospf/topology.c (working copy)
@@ -65,7 +65,7 @@ fibnode_to_lsaid(struct proto_ospf *po, struct fib
LSA ID for a network because different network appeared, we
choose a different way. */
- u32 id = _I(fn->prefix);
+ u32 id = _I(*FPREFIX_IP(fn));
if ((po->rfc1583) || (fn->pxlen == 0) || (fn->pxlen == 32))
return id;
@@ -764,8 +764,8 @@ originate_sum_net_lsa(struct ospf_area *oa, struct
struct ospf_lsa_header lsa;
void *body;
- OSPF_TRACE(D_EVENTS, "Originating net-summary-LSA for %I/%d (metric %d)",
- fn->prefix, fn->pxlen, metric);
+ OSPF_TRACE(D_EVENTS, "Originating net-summary-LSA for %s (metric %d)",
+ fib_print(fn), metric);
/* options argument is used in ORT_NET and OSPFv3 only */
lsa.age = 0;
@@ -780,8 +780,7 @@ originate_sum_net_lsa(struct ospf_area *oa, struct
{
if (check_sum_net_lsaid_collision(fn, en))
{
- log(L_ERR, "%s: LSAID collision for %I/%d",
- p->name, fn->prefix, fn->pxlen);
+ log(L_ERR, "%s: LSAID collision for %s", p->name, fib_print(fn));
return;
}
@@ -803,7 +802,7 @@ originate_sum_rt_lsa(struct ospf_area *oa, struct
struct proto *p = &po->proto;
struct top_hash_entry *en;
u32 dom = oa->areaid;
- u32 rid = ipa_to_rid(fn->prefix);
+ u32 rid = ipa_to_rid(*FPREFIX_IP(fn));
struct ospf_lsa_header lsa;
void *body;
@@ -850,7 +849,7 @@ flush_sum_lsa(struct ospf_area *oa, struct fib_nod
else
{
/* In OSPFv3, LSA ID is meaningless, but we still use Router ID of ASBR */
- lsa.id = ipa_to_rid(fn->prefix);
+ lsa.id = ipa_to_rid(*FPREFIX_IP(fn));
lsa.type = LSA_T_SUM_RT;
}
@@ -859,7 +858,7 @@ flush_sum_lsa(struct ospf_area *oa, struct fib_nod
if ((type == ORT_NET) && check_sum_net_lsaid_collision(fn, en))
{
- log(L_ERR, "%s: LSAID collision for %I/%d",
- p->name, fn->prefix, fn->pxlen);
+ log(L_ERR, "%s: LSAID collision for %s",
+ p->name, fib_print(fn));
return;
}
@@ -1011,8 +1010,7 @@ originate_ext_lsa(net * n, rte * e, struct proto_o
void *body;
struct ospf_area *oa;
- OSPF_TRACE(D_EVENTS, "Originating AS-external-LSA for %I/%d",
- fn->prefix, fn->pxlen);
+ OSPF_TRACE(D_EVENTS, "Originating AS-external-LSA for %s", fib_print(fn));
lsa.age = 0;
#ifdef OSPFv2
@@ -1040,8 +1038,7 @@ originate_ext_lsa(net * n, rte * e, struct proto_o
int rv = check_ext_lsa(en, fn, metric, gw, tag);
if (rv < 0)
{
- log(L_ERR, "%s: LSAID collision for %I/%d",
- p->name, fn->prefix, fn->pxlen);
+ log(L_ERR, "%s: LSAID collision for %s", p->name, fib_print(fn));
return;
}
@@ -1073,8 +1070,7 @@ flush_ext_lsa(net *n, struct proto_ospf *po)
struct fib_node *fn = &n->n;
struct top_hash_entry *en;
- OSPF_TRACE(D_EVENTS, "Flushing AS-external-LSA for %I/%d",
- fn->prefix, fn->pxlen);
+ OSPF_TRACE(D_EVENTS, "Flushing AS-external-LSA for %s", fib_print(fn));
u32 lsaid = fibnode_to_lsaid(po, fn);
@@ -1082,8 +1078,7 @@ flush_ext_lsa(net *n, struct proto_ospf *po)
{
if (check_ext_lsa(en, fn, 0, IPA_NONE, 0) < 0)
{
- log(L_ERR, "%s: LSAID collision for %I/%d",
- p->name, fn->prefix, fn->pxlen);
+ log(L_ERR, "%s: LSAID collision for %s", p->name, fib_print(fn));
return;
}
Index: proto/ospf/rt.c
===================================================================
--- proto/ospf/rt.c (revision 4962)
+++ proto/ospf/rt.c (working copy)
@@ -889,7 +889,7 @@ decide_sum_lsa(struct ospf_area *oa, ort *nf, int
return 1;
struct area_net *anet = (struct area_net *)
- fib_route(&nf->n.oa->net_fib, nf->fn.prefix, nf->fn.pxlen);
+ fib_route(&nf->n.oa->net_fib, FPREFIX_IP(&nf->fn), nf->fn.pxlen);
/* Condensed area network found */
if (anet)
@@ -915,7 +915,7 @@ check_sum_net_lsa(struct proto_ospf *po, ort *nf)
/* Find that area network */
WALK_LIST(anet_oa, po->area_list)
{
- anet = (struct area_net *) fib_find(&anet_oa->net_fib, &nf->fn.prefix, nf->fn.pxlen);
+ anet = (struct area_net *) fib_find(&anet_oa->net_fib, FPREFIX(&nf->fn), nf->fn.pxlen);
if (anet)
break;
}
@@ -1017,7 +1017,7 @@ ospf_rt_abr(struct proto_ospf *po)
/* Compute condensed area networks */
if (nf->n.type == RTS_OSPF)
{
- anet = (struct area_net *) fib_route(&nf->n.oa->net_fib, nf->fn.prefix, nf->fn.pxlen);
+ anet = (struct area_net *) fib_route(&nf->n.oa->net_fib, FPREFIX_IP(&nf->fn), nf->fn.pxlen);
if (anet)
{
if (!anet->active)
@@ -1025,7 +1025,7 @@ ospf_rt_abr(struct proto_ospf *po)
anet->active = 1;
/* Get a RT entry and mark it to know that it is an area network */
- ort *nfi = (ort *) fib_get(&po->rtf, &anet->fn.prefix, anet->fn.pxlen);
+ ort *nfi = (ort *) fib_get(&po->rtf, FPREFIX(&anet->fn), anet->fn.pxlen);
nfi->fn.x0 = 1; /* mark and keep persistent, to have stable UID */
/* 16.2. (3) */
@@ -1060,7 +1060,7 @@ ospf_rt_abr(struct proto_ospf *po)
{
nf = (ort *) nftmp;
if (nf->n.options & ORTA_ASBR)
- ri_install_asbr(po, &nf->fn.prefix, &nf->n);
+ ri_install_asbr(po, FPREFIX_IP(&nf->fn), &nf->n);
}
FIB_WALK_END;
}
@@ -1714,7 +1714,7 @@ again1:
if (reload || ort_changed(nf, &a0))
{
- net *ne = net_get(p->table, nf->fn.prefix, nf->fn.pxlen);
+ net *ne = fib_get(&p->table->fib, FPREFIX(&nf->fn), nf->fn.pxlen);
rta *a = rta_lookup(&a0);
rte *e = rte_get_temp(a);
@@ -1739,7 +1739,7 @@ again1:
rta_free(nf->old_rta);
nf->old_rta = NULL;
- net *ne = net_get(p->table, nf->fn.prefix, nf->fn.pxlen);
+ net *ne = fib_get(&p->table->fib, FPREFIX(&nf->fn), nf->fn.pxlen);
rte_update(p->table, ne, p, p, NULL);
}
Index: proto/bgp/packets.c
===================================================================
--- proto/bgp/packets.c (revision 4962)
+++ proto/bgp/packets.c (working copy)
@@ -222,10 +222,10 @@ bgp_encode_prefixes(struct bgp_proto *p, byte *w,
while (!EMPTY_LIST(buck->prefixes) && remains >= (1+sizeof(ip_addr)))
{
struct bgp_prefix *px = SKIP_BACK(struct bgp_prefix, bucket_node, HEAD(buck->prefixes));
- DBG("\tDequeued route %I/%d\n", px->n.prefix, px->n.pxlen);
+ DBG("\tDequeued route %s\n", fib_print(&px->n));
*w++ = px->n.pxlen;
bytes = (px->n.pxlen + 7) / 8;
- a = px->n.prefix;
+ a = *FPREFIX_IP(&px->n);
ipa_hton(a);
memcpy(w, &a, bytes);
w += bytes;
@@ -242,7 +242,7 @@ bgp_flush_prefixes(struct bgp_proto *p, struct bgp
while (!EMPTY_LIST(buck->prefixes))
{
struct bgp_prefix *px = SKIP_BACK(struct bgp_prefix, bucket_node, HEAD(buck->prefixes));
- log(L_ERR "%s: - route %I/%d skipped", p->p.name, px->n.prefix, px->n.pxlen);
+ log(L_ERR "%s: - route %s skipped", p->p.name, fib_print(&px->n));
rem_node(&px->bucket_node);
fib_delete(&p->prefix_fib, px);
}
Index: proto/bgp/attrs.c
===================================================================
--- proto/bgp/attrs.c (revision 4962)
+++ proto/bgp/attrs.c (working copy)
@@ -786,7 +786,8 @@ bgp_get_bucket(struct bgp_proto *p, net *n, ea_lis
for(i=0; i<ARRAY_SIZE(bgp_mandatory_attrs); i++)
if (!(seen & (1 << bgp_mandatory_attrs[i])))
{
- log(L_ERR "%s: Mandatory attribute %s missing in route %I/%d", p->p.name, bgp_attr_table[bgp_mandatory_attrs[i]].name, n->n.prefix, n->n.pxlen);
+ log(L_ERR "%s: Mandatory attribute %s missing in route %s", p->p.name,
+ bgp_attr_table[bgp_mandatory_attrs[i]].name, fib2_print(PROTO_FIB(&p->p), &n->n));
return NULL;
}
@@ -794,7 +795,7 @@ bgp_get_bucket(struct bgp_proto *p, net *n, ea_lis
a = ea_find(new, EA_CODE(EAP_BGP, BA_NEXT_HOP));
if (!a || ipa_equal(p->cf->remote_ip, *(ip_addr *)a->u.ptr->data))
{
- log(L_ERR "%s: Invalid NEXT_HOP attribute in route %I/%d", p->p.name, n->n.prefix, n->n.pxlen);
+ log(L_ERR "%s: Invalid NEXT_HOP attribute in route %s", p->p.name, fib2_print(PROTO_FIB(&p->p), &n->n));
return NULL;
}
@@ -838,7 +839,7 @@ bgp_rt_notify(struct proto *P, rtable *tbl UNUSED,
init_list(&buck->prefixes);
}
}
- px = fib_get(&p->prefix_fib, &n->n.prefix, n->n.pxlen);
+ px = fib_get(&p->prefix_fib, FPREFIX(&n->n), n->n.pxlen);
if (px->bucket_node.next)
{
DBG("\tRemoving old entry.\n");
Index: proto/rip/rip.c
===================================================================
--- proto/rip/rip.c (revision 4962)
+++ proto/rip/rip.c (working copy)
@@ -97,7 +97,7 @@ rip_tx_prepare(struct proto *p, struct rip_block *
int metric;
DBG( "." );
b->tag = htons( e->tag );
- b->network = e->n.prefix;
+ b->network = *FPREFIX_IP(&e->n);
metric = e->metric;
if (neigh_connected_to(p, &e->whotoldme, rif->iface)) {
DBG( "(split horizon)" );
@@ -498,8 +498,8 @@ rip_rx(sock *s, int size)
static void
rip_dump_entry( struct rip_entry *e )
{
- debug( "%I told me %d/%d ago: to %I/%d go via %I, metric %d ",
- e->whotoldme, e->updated-now, e->changed-now, e->n.prefix, e->n.pxlen, e->nexthop, e->metric );
+ debug( "%I told me %d/%d ago: to %s go via %I, metric %d ",
+ e->whotoldme, e->updated-now, e->changed-now, fib_print(&e->n), e->nexthop, e->metric );
debug( "\n" );
}
@@ -535,7 +535,7 @@ rip_timer(timer *t)
#endif
if (now - rte->lastmod > P_CF->timeout_time) {
- TRACE(D_EVENTS, "entry is too old: %I", rte->net->n.prefix );
+ TRACE(D_EVENTS, "entry is too old: %I", *FPREFIX_IP(&rte->net->n) );
if (rte->u.rip.entry) {
rte->u.rip.entry->metric = P_CF->infinity;
rte->u.rip.metric = P_CF->infinity;
@@ -543,7 +543,7 @@ rip_timer(timer *t)
}
if (now - rte->lastmod > P_CF->garbage_time) {
- TRACE(D_EVENTS, "entry is much too old: %I", rte->net->n.prefix );
+ TRACE(D_EVENTS, "entry is much too old: %I", *FPREFIX_IP(&rte->net->n) );
rte_discard(p->table, rte);
}
}
@@ -873,12 +873,12 @@ rip_rt_notify(struct proto *p, struct rtable *tabl
CHK_MAGIC;
struct rip_entry *e;
- e = fib_find( &P->rtable, &net->n.prefix, net->n.pxlen );
+ e = fib_find( &P->rtable, FPREFIX(&net->n), net->n.pxlen );
if (e)
fib_delete( &P->rtable, e );
if (new) {
- e = fib_get( &P->rtable, &net->n.prefix, net->n.pxlen );
+ e = fib_get( &P->rtable, FPREFIX(&net->n), net->n.pxlen );
e->nexthop = new->attrs->gw;
e->metric = 0;
Index: proto/pipe/pipe.c
===================================================================
--- proto/pipe/pipe.c (revision 4962)
+++ proto/pipe/pipe.c (working copy)
@@ -46,11 +46,11 @@ pipe_rt_notify(struct proto *P, rtable *src_table,
if (dest->pipe_busy)
{
- log(L_ERR "Pipe loop detected when sending %I/%d to table %s",
- n->n.prefix, n->n.pxlen, dest->name);
+ log(L_ERR "Pipe loop detected when sending %s to table %s",
+ fib_print(&n->n), dest->name);
return;
}
- nn = net_get(dest, n->n.prefix, n->n.pxlen);
+ nn = fib_get(&dest->fib, FPREFIX(&n->n), n->n.pxlen);
if (new)
{
memcpy(&a, new->attrs, sizeof(rta));
Index: sysdep/linux/krt-scan.c
===================================================================
--- sysdep/linux/krt-scan.c (revision 4962)
+++ sysdep/linux/krt-scan.c (working copy)
@@ -101,7 +101,7 @@ krt_parse_entry(byte *ent, struct krt_proto *p)
a.iface = ng->iface;
else
{
- log(L_WARN "Kernel told us to use non-neighbor %I for %I/%d", gw, net->n.prefix, net->n.pxlen);
+ log(L_WARN "Kernel told us to use non-neighbor %I for %s", gw, fib_print(&net->n));
return;
}
a.dest = RTD_ROUTER;
@@ -120,7 +120,7 @@ krt_parse_entry(byte *ent, struct krt_proto *p)
}
else
{
- log(L_WARN "Kernel reporting unknown route type to %I/%d", net->n.prefix, net->n.pxlen);
+ log(L_WARN "Kernel reporting unknown route type to %s", fib_print(&net->n));
return;
}
Index: sysdep/linux/netlink/netlink.c
===================================================================
--- sysdep/linux/netlink/netlink.c (revision 4962)
+++ sysdep/linux/netlink/netlink.c (working copy)
@@ -628,7 +628,7 @@ nl_send_route(struct krt_proto *p, rte *e, int new
char buf[64 + nh_bufsize(a->nexthops)];
} r;
- DBG("nl_send_route(%I/%d,new=%d)\n", net->n.prefix, net->n.pxlen, new);
+ DBG("nl_send_route(%s,new=%d)\n", fib2_print(e->rtype, &net->n), new);
bzero(&r.h, sizeof(r.h));
bzero(&r.r, sizeof(r.r));
@@ -642,7 +642,7 @@ nl_send_route(struct krt_proto *p, rte *e, int new
r.r.rtm_table = KRT_CF->scan.table_id;
r.r.rtm_protocol = RTPROT_BIRD;
r.r.rtm_scope = RT_SCOPE_UNIVERSE;
- nl_add_attr_ipa(&r.h, sizeof(r), RTA_DST, net->n.prefix);
+ nl_add_attr_ipa(&r.h, sizeof(r), RTA_DST, *FPREFIX_IP(&net->n));
if (ea = ea_find(a->eattrs, EA_KRT_PREFSRC))
nl_add_attr_ipa(&r.h, sizeof(r), RTA_PREFSRC, *(ip_addr *)ea->u.ptr->data);
@@ -807,8 +807,7 @@ nl_parse_route(struct nlmsghdr *h, int scan)
ra.nexthops = nl_parse_multipath(p, a[RTA_MULTIPATH]);
if (!ra.nexthops)
{
- log(L_ERR "KRT: Received strange multipath route %I/%d",
- net->n.prefix, net->n.pxlen);
+ log(L_ERR "KRT: Received strange multipath route %s", fib_print(&net->n));
return;
}
@@ -818,8 +817,8 @@ nl_parse_route(struct nlmsghdr *h, int scan)
ra.iface = if_find_by_index(oif);
if (!ra.iface)
{
- log(L_ERR "KRT: Received route %I/%d with unknown ifindex %u",
- net->n.prefix, net->n.pxlen, oif);
+ log(L_ERR "KRT: Received route %s with unknown ifindex %u",
+ fib_print(&net->n), oif);
return;
}
@@ -838,8 +837,8 @@ nl_parse_route(struct nlmsghdr *h, int scan)
(i->rtm_flags & RTNH_F_ONLINK) ? NEF_ONLINK : 0);
if (!ng || (ng->scope == SCOPE_HOST))
{
- log(L_ERR "KRT: Received route %I/%d with strange next-hop %I",
- net->n.prefix, net->n.pxlen, ra.gw);
+ log(L_ERR "KRT: Received route %s with strange next-hop %I",
+ fib_print(&net->n), ra.gw);
return;
}
}
Index: sysdep/unix/krt.c
===================================================================
--- sysdep/unix/krt.c (revision 4962)
+++ sysdep/unix/krt.c (working copy)
@@ -234,14 +234,14 @@ static inline void
krt_trace_in(struct krt_proto *p, rte *e, char *msg)
{
if (p->p.debug & D_PACKETS)
- log(L_TRACE "%s: %I/%d: %s", p->p.name, e->net->n.prefix, e->net->n.pxlen, msg);
+ log(L_TRACE "%s: %s: %s", p->p.name, fib2_print(PROTO_FIB(&p->p), &e->net->n), msg);
}
static inline void
krt_trace_in_rl(struct rate_limit *rl, struct krt_proto *p, rte *e, char *msg)
{
if (p->p.debug & D_PACKETS)
- log_rl(rl, L_TRACE "%s: %I/%d: %s", p->p.name, e->net->n.prefix, e->net->n.pxlen, msg);
+ log_rl(rl, L_TRACE "%s: %s: %s", p->p.name, fib2_print(PROTO_FIB(&p->p), &e->net->n), msg);
}
/*
@@ -266,7 +266,7 @@ krt_learn_announce_update(struct krt_proto *p, rte
net *n = e->net;
rta *aa = rta_clone(e->attrs);
rte *ee = rte_get_temp(aa);
- net *nn = net_get(p->p.table, n->n.prefix, n->n.pxlen);
+ net *nn = fib_get(&p->p.table->fib, FPREFIX(&n->n), n->n.pxlen);
ee->net = nn;
ee->pflags = 0;
ee->pref = p->p.preference;
@@ -277,7 +277,7 @@ krt_learn_announce_update(struct krt_proto *p, rte
static void
krt_learn_announce_delete(struct krt_proto *p, net *n)
{
- n = net_find(p->p.table, n->n.prefix, n->n.pxlen);
+ n = fib_find(&p->p.table->fib, FPREFIX(&n->n), n->n.pxlen);
if (n)
rte_update(p->p.table, n, &p->p, &p->p, NULL);
}
@@ -286,7 +286,7 @@ static void
krt_learn_scan(struct krt_proto *p, rte *e)
{
net *n0 = e->net;
- net *n = net_get(&p->krt_table, n0->n.prefix, n0->n.pxlen);
+ net *n = fib_get(&p->krt_table.fib, FPREFIX(&n0->n), n0->n.pxlen);
rte *m, **mm;
e->attrs->source = RTS_INHERIT;
@@ -358,7 +358,7 @@ again:
}
if (!n->routes)
{
- DBG("%I/%d: deleting\n", n->n.prefix, n->n.pxlen);
+ DBG("%s: deleting\n", fib2_print(fib->addr_type, &n->n));
if (old_best)
{
krt_learn_announce_delete(p, n);
@@ -387,8 +387,8 @@ static void
krt_learn_async(struct krt_proto *p, rte *e, int new)
{
net *n0 = e->net;
- net *n = net_get(&p->krt_table, n0->n.prefix, n0->n.pxlen);
rte *g, **gg, *best, **bestp, *old_best;
+ net *n = fib_get(&p->krt_table.fib, FPREFIX(&n0->n), n0->n.pxlen);
e->attrs->source = RTS_INHERIT;
Index: sysdep/bsd/krt-sock.c
===================================================================
--- sysdep/bsd/krt-sock.c (revision 4962)
+++ sysdep/bsd/krt-sock.c (working copy)
@@ -81,7 +81,7 @@ krt_sock_send(int cmd, rte *e)
sockaddr gate, mask, dst;
ip_addr gw;
- DBG("krt-sock: send %I/%d via %I\n", net->n.prefix, net->n.pxlen, a->gw);
+ DBG("krt-sock: send %s via %I\n", fib2_print(e->rtype, &net->n), a->gw);
bzero(&msg,sizeof (struct rt_msghdr));
msg.rtm.rtm_version = RTM_VERSION;
@@ -134,7 +134,8 @@ krt_sock_send(int cmd, rte *e)
_I0(gw) = 0xfe800000 | (i->index & 0x0000ffff);
#endif
- fill_in_sockaddr(&dst, net->n.prefix, 0);
+ /* XXX: more general approach should be used here */
+ fill_in_sockaddr(&dst, *FPREFIX_IP(&net->n), 0);
fill_in_sockaddr(&mask, ipa_mkmask(net->n.pxlen), 0);
fill_in_sockaddr(&gate, gw, 0);
@@ -181,7 +182,7 @@ krt_sock_send(int cmd, rte *e)
msg.rtm.rtm_msglen = l;
if ((l = write(rt_sock, (char *)&msg, l)) < 0) {
- log(L_ERR "KRT: Error sending route %I/%d to kernel", net->n.prefix, net->n.pxlen);
+ log(L_ERR "KRT: Error sending route %s to kernel", fib2_print(e->rtype, &net->n));
}
}
@@ -190,12 +191,12 @@ krt_set_notify(struct krt_proto *p UNUSED, net *ne
{
if (old)
{
- DBG("krt_remove_route(%I/%d)\n", net->n.prefix, net->n.pxlen);
+ DBG("krt_remove_route(%s)\n", fib2_print(PROTO_FIB(&p->p), &net->n));
krt_sock_send(RTM_DELETE, old);
}
if (new)
{
- DBG("krt_add_route(%I/%d)\n", net->n.prefix, net->n.pxlen);
+ DBG("krt_add_route(%s)\n", fib2_print(PROTO_FIB(&p->p), &net->n));
krt_sock_send(RTM_ADD, new);
}
}
@@ -355,8 +356,8 @@ krt_read_rt(struct ks_msg *msg, struct krt_proto *
a.iface = if_find_by_index(msg->rtm.rtm_index);
if (!a.iface)
{
- log(L_ERR "KRT: Received route %I/%d with unknown ifindex %u",
- net->n.prefix, net->n.pxlen, msg->rtm.rtm_index);
+ log(L_ERR "KRT: Received route %s with unknown ifindex %u",
+ fib2_print(PROTO_FIB(&p->p), &net->n), msg->rtm.rtm_index);
return;
}
@@ -380,8 +381,8 @@ krt_read_rt(struct ks_msg *msg, struct krt_proto *
if (ipa_classify(a.gw) == (IADDR_HOST | SCOPE_HOST))
return;
- log(L_ERR "KRT: Received route %I/%d with strange next-hop %I",
- net->n.prefix, net->n.pxlen, a.gw);
+ log(L_ERR "KRT: Received route %s with strange next-hop %I",
+ fib2_print(PROTO_FIB(&p->p), &net->n), a.gw);
return;
}
}
Index: nest/route.h
===================================================================
--- nest/route.h (revision 4962)
+++ nest/route.h (working copy)
@@ -38,7 +38,7 @@ struct fib_node {
byte flags; /* User-defined */
byte x0, x1; /* User-defined */
u32 uid; /* Unique ID based on hash */
- ip_addr prefix; /* In host order */
+ void *addr; /* Pointer to (already allocated) address data. Host order required */
};
struct fib_iterator { /* See lib/slists.h for an explanation */
@@ -50,6 +50,7 @@ struct fib_iterator { /* See lib/slists.h for an
};
typedef void (*fib_init_func)(struct fib_node *);
+typedef int (*fib_hash_func)(void *);
struct fib {
pool *fib_pool; /* Pool holding all our data */
@@ -58,15 +59,24 @@ struct fib {
unsigned int hash_size; /* Number of hash table entries (a power of two) */
unsigned int hash_order; /* Binary logarithm of hash_size */
unsigned int hash_shift; /* 16 - hash_log */
+ unsigned int addr_type; /* Type of addresses stored in fib */
+ unsigned int addr_size; /* size of address specified in entry */
+ unsigned int node_size; /* size of node to allocate */
unsigned int entries; /* Number of entries */
unsigned int entries_min, entries_max;/* Entry count limits (else start rehashing) */
fib_init_func init; /* Constructor */
+ fib_hash_func hash_f; /* Optional hash function */
};
void fib_init(struct fib *, pool *, unsigned node_size, unsigned hash_order, fib_init_func init);
-void *fib_find(struct fib *, ip_addr *, int); /* Find or return NULL if doesn't exist */
-void *fib_get(struct fib *, ip_addr *, int); /* Find or create new if nonexistent */
-void *fib_route(struct fib *, ip_addr, int); /* Longest-match routing lookup */
+//#define fib_init(f, p, node_size, hash_order, init) fib2_init(f, p, node_size, RT_IP, sizeof(ip_addr), hash_order, init, NULL)
+void fib2_init(struct fib *, pool *, unsigned node_size, unsigned int addr_type, unsigned int addr_size, \
+ unsigned hash_order, fib_init_func init, fib_hash_func hash_f);
+void *fib_find(struct fib *, void *, int); /* Find or return NULL if doesn't exist */
+void *fib_get(struct fib *, void *, int); /* Find or create new if nonexistent */
+void *fib_route(struct fib *, ip_addr *, int); /* Longest-match routing lookup */
+char *fib_print(struct fib_node *); /* Prints human-readable fib_node prefix */
+char *fib2_print(int rtype, struct fib_node *); /* Prints human-readable fib_node prefix */
void fib_delete(struct fib *, void *); /* Remove fib entry */
void fib_free(struct fib *); /* Destroy the fib */
void fib_check(struct fib *); /* Consistency check for debugging */
@@ -75,6 +85,10 @@ void fit_init(struct fib_iterator *, struct fib *)
struct fib_node *fit_get(struct fib *, struct fib_iterator *);
void fit_put(struct fib_iterator *, struct fib_node *);
+#define FPREFIX_IP(n) ((ip_addr *)((n))->addr)
+#define FPREFIX(n) ((void *)((n))->addr)
+#define PROTO_FIB(x) ((x)->table->fib.addr_type)
+
#define FIB_WALK(fib, z) do { \
struct fib_node *z, **ff = (fib)->hash_table; \
unsigned int count = (fib)->hash_size; \
@@ -116,6 +130,7 @@ void fit_put(struct fib_iterator *, struct fib_nod
struct rtable_config {
node n;
char *name;
+ int rtype; /* table type (RT_IP, RT_VPN, ...) */
struct rtable *table;
struct proto_config *krt_attached; /* Kernel syncer attached to this table */
int gc_max_ops; /* Maximum number of operations before GC is run */
@@ -126,6 +141,7 @@ typedef struct rtable {
node n; /* Node in list of all tables */
struct fib fib;
char *name; /* Name of this table */
+ int rtype; /* Type of the table (IPv46, VPNv46, MPLS, etc..)*/
list hooks; /* List of announcement hooks */
int pipe_busy; /* Pipe loop detection */
int use_count; /* Number of protocols using this table */
@@ -179,6 +195,7 @@ struct hostentry {
typedef struct rte {
struct rte *next;
net *net; /* Network this RTE belongs to */
+ int rtype; /* RTE type: IP, MPLS, VPN, .. */
struct proto *sender; /* Protocol instance that sent the route to the routing table */
struct rta *attrs; /* Attributes of this route */
byte flags; /* Flags (REF_...) */
@@ -213,6 +230,11 @@ typedef struct rte {
#define REF_COW 1 /* Copy this rte on write */
+/* Types of routing tables/entries */
+#define RT_IP 1
+#define RT_VPN 2
+#define RT_MPLS 3
+
/* Types of route announcement, also used as flags */
#define RA_OPTIMAL 1 /* Announcement of optimal route change */
#define RA_ANY 2 /* Announcement of any route change */
@@ -240,7 +262,7 @@ void rt_dump_all(void);
int rt_feed_baby(struct proto *p);
void rt_feed_baby_abort(struct proto *p);
void rt_prune_all(void);
-struct rtable_config *rt_new_table(struct symbol *s);
+struct rtable_config *rt_new_table(struct symbol *s, int rtype);
struct rt_show_data {
ip_addr prefix;
Index: nest/rt-table.c
===================================================================
--- nest/rt-table.c (revision 4962)
+++ nest/rt-table.c (working copy)
@@ -66,6 +66,9 @@ net_route(rtable *tab, ip_addr a, int len)
ip_addr a0;
net *n;
+ if (tab->fib.addr_type != RT_IP)
+ return NULL;
+
while (len >= 0)
{
a0 = ipa_and(a, ipa_mkmask(len));
@@ -111,7 +114,7 @@ rte_find(net *net, struct proto *p)
*
* Create a temporary &rte and bind it with the attributes @a.
* Also set route preference to the default preference set for
- * the protocol.
+ * the protocol. RT_IP route type is assumed by default
*/
rte *
rte_get_temp(rta *a)
@@ -121,6 +124,7 @@ rte_get_temp(rta *a)
e->attrs = a;
e->flags = 0;
e->pref = a->proto->preference;
+ e->rtype = RT_IP;
return e;
}
@@ -166,7 +170,7 @@ rte_trace(struct proto *p, rte *e, int dir, char *
byte via[STD_ADDRESS_P_LENGTH+32];
rt_format_via(e, via);
- log(L_TRACE "%s %c %s %I/%d %s", p->name, dir, msg, e->net->n.prefix, e->net->n.pxlen, via);
+ log(L_TRACE "%s %c %s %s %s", p->name, dir, msg, fib2_print(e->rtype, &e->net->n), via);
}
static inline void
@@ -367,23 +371,27 @@ rte_announce(rtable *tab, unsigned type, net *net,
static inline int
-rte_validate(rte *e)
+rte_validate(struct fib *f, rte *e)
{
int c;
net *n = e->net;
- if ((n->n.pxlen > BITS_PER_IP_ADDRESS) || !ip_is_prefix(n->n.prefix,n->n.pxlen))
+ /* Do not bother checking non-IP routes at the moment */
+ if (f->addr_type != RT_IP)
+ return 1;
+
+ if ((n->n.pxlen > BITS_PER_IP_ADDRESS) || !ip_is_prefix(*FPREFIX_IP(&n->n),n->n.pxlen))
{
- log(L_WARN "Ignoring bogus prefix %I/%d received via %s",
- n->n.prefix, n->n.pxlen, e->sender->name);
+ log(L_WARN "Ignoring bogus prefix %s received via %s",
+ fib2_print(e->rtype, &n->n), e->sender->name);
return 0;
}
- c = ipa_classify_net(n->n.prefix);
+ c = ipa_classify_net(*FPREFIX_IP(&n->n));
if ((c < 0) || !(c & IADDR_HOST) || ((c & IADDR_SCOPE_MASK) <= SCOPE_LINK))
{
- log(L_WARN "Ignoring bogus route %I/%d received via %s",
- n->n.prefix, n->n.pxlen, e->sender->name);
+ log(L_WARN "Ignoring bogus route %s received via %s",
+ fib2_print(e->rtype, &n->n), e->sender->name);
return 0;
}
@@ -453,8 +461,8 @@ rte_recalculate(rtable *table, net *net, struct pr
{
if (new)
{
- log(L_ERR "Pipe collision detected when sending %I/%d to table %s",
- net->n.prefix, net->n.pxlen, table->name);
+ log(L_ERR "Pipe collision detected when sending %s to table %s",
+ fib2_print(old->rtype, &net->n), table->name);
rte_free_quick(new);
}
return;
@@ -672,7 +680,7 @@ rte_update(rtable *table, net *net, struct proto *
#endif
stats->imp_updates_received++;
- if (!rte_validate(new))
+ if (!rte_validate(&table->fib, new))
{
rte_trace_in(D_FILTERS, p, new, "invalid");
stats->imp_updates_invalid++;
@@ -750,7 +758,7 @@ rte_dump(rte *e)
{
net *n = e->net;
if (n)
- debug("%-1I/%2d ", n->n.prefix, n->n.pxlen);
+ debug("%-1I/%2d ", *FPREFIX_IP(&n->n), n->n.pxlen);
else
debug("??? ");
debug("KF=%02x PF=%02x pref=%d lm=%d ", n->n.flags, e->pflags, e->pref, now-e->lastmod);
@@ -773,7 +781,7 @@ rt_dump(rtable *t)
net *n;
struct announce_hook *a;
- debug("Dump of routing table <%s>\n", t->name);
+ debug("Dump of routing table <%s>:%d\n", t->name, t->fib.addr_type);
#ifdef DEBUGGING
fib_check(&t->fib);
#endif
@@ -848,11 +856,17 @@ rt_event(void *ptr)
rt_prune(tab);
}
+
+/**
+ * rt_setup - initialize RT_IP routing table
+ *
+ * This function is called to set up rtable (hooks, lists, fib, ..)
+ */
void
rt_setup(pool *p, rtable *t, char *name, struct rtable_config *cf)
{
bzero(t, sizeof(*t));
- fib_init(&t->fib, p, sizeof(net), 0, rte_init);
+ fib2_init(&t->fib, p, sizeof(net), RT_IP, sizeof(ip_addr), 0, rte_init, NULL);
t->name = name;
t->config = cf;
init_list(&t->hooks);
@@ -953,7 +967,7 @@ rt_preconfig(struct config *c)
struct symbol *s = cf_find_symbol("master");
init_list(&c->tables);
- c->master_rtc = rt_new_table(s);
+ c->master_rtc = rt_new_table(s, RT_IP);
}
@@ -1098,12 +1112,13 @@ rt_next_hop_update(rtable *tab)
struct rtable_config *
-rt_new_table(struct symbol *s)
+rt_new_table(struct symbol *s, int rtype)
{
struct rtable_config *c = cfg_allocz(sizeof(struct rtable_config));
cf_define_symbol(s, SYM_TABLE, c);
c->name = s->name;
+ c->rtype = rtype;
add_tail(&new_config->tables, &c->n);
c->gc_max_ops = 1000;
c->gc_min_time = 5;
@@ -1461,7 +1476,7 @@ rt_notify_hostcache(rtable *tab, net *net)
if (tab->hcu_scheduled)
return;
- if (trie_match_prefix(hc->trie, net->n.prefix, net->n.pxlen))
+ if (trie_match_prefix(hc->trie, *FPREFIX_IP(&net->n), net->n.pxlen))
rt_schedule_hcu(tab);
}
@@ -1512,6 +1527,8 @@ rt_update_hostentry(rtable *tab, struct hostentry
rta *old_src = he->src;
int pxlen = 0;
+ /* XXX: check for non-IP address families ? */
+
/* Reset the hostentry */
he->src = NULL;
he->gw = IPA_NONE;
@@ -1527,8 +1544,8 @@ rt_update_hostentry(rtable *tab, struct hostentry
if (a->hostentry)
{
/* Recursive route should not depend on another recursive route */
- log(L_WARN "Next hop address %I resolvable through recursive route for %I/%d",
- he->addr, n->n.prefix, pxlen);
+ log(L_WARN "Next hop address %I resolvable through recursive route for %s",
+ he->addr, fib2_print(tab->fib.addr_type, &n->n));
goto done;
}
@@ -1675,13 +1692,11 @@ rt_show_rte(struct cli *c, byte *ia, rte *e, struc
}
static void
-rt_show_net(struct cli *c, net *n, struct rt_show_data *d)
+rt_show_net(struct cli *c, struct fib *f, net *n, struct rt_show_data *d)
{
rte *e, *ee;
- byte ia[STD_ADDRESS_P_LENGTH+8];
int ok;
- bsprintf(ia, "%I/%d", n->n.prefix, n->n.pxlen);
if (n->routes)
d->net_counter++;
for(e=n->routes; e; e=e->next)
@@ -1717,8 +1732,7 @@ static void
{
d->show_counter++;
if (d->stats < 2)
- rt_show_rte(c, ia, e, d, tmpa);
- ia[0] = 0;
+ rt_show_rte(c, fib2_print(f->addr_type, &n->n), e, d, tmpa);
}
if (e != ee)
{
@@ -1763,7 +1777,7 @@ rt_show_cont(struct cli *c)
FIB_ITERATE_PUT(it, f);
return;
}
- rt_show_net(c, n, d);
+ rt_show_net(c, fib, n, d);
}
FIB_ITERATE_END(f);
if (d->stats)
@@ -1803,7 +1817,7 @@ rt_show(struct rt_show_data *d)
n = net_find(d->table, d->prefix, d->pxlen);
if (n)
{
- rt_show_net(this_cli, n, d);
+ rt_show_net(this_cli, &d->table->fib, n, d);
cli_msg(0, "");
}
else
Index: nest/rt-fib.c
===================================================================
--- nest/rt-fib.c (revision 4962)
+++ nest/rt-fib.c (working copy)
@@ -73,9 +73,15 @@ fib_ht_free(struct fib_node **h)
}
static inline unsigned
-fib_hash(struct fib *f, ip_addr *a)
+fib_hash(struct fib *f, void *a)
{
- return ipa_hash(*a) >> f->hash_shift;
+ if (f->hash_f)
+ return f->hash_f(a);
+
+ if (f->addr_type == RT_IP)
+ return ipa_hash(*((ip_addr *)a)) >> f->hash_shift;
+
+ return 0;
}
static void
@@ -98,16 +104,42 @@ fib_dummy_init(struct fib_node *dummy UNUSED)
void
fib_init(struct fib *f, pool *p, unsigned node_size, unsigned hash_order, fib_init_func init)
{
+ fib2_init(f, p, node_size, RT_IP, sizeof(ip_addr), hash_order, init, NULL);
+}
+
+/**
+ * fib2_init - initialize a new FIB
+ * @f: the FIB to be initialized (the structure itself being allocated by the caller)
+ * @p: pool to allocate the nodes in
+ * @node_size: total node size to be used (each node consists of a standard header &fib_node
+ * followed by user data)
+ * @addr_type: type of addresses stored in fib (RT_*)
+ * @addr_size: size of address data
+ * @hash_order: initial hash order (a binary logarithm of hash table size), 0 to use default order
+ * (recommended)
+ * @init: pointer to a function to be called to initialize a newly created node
+ * @hash_f: optional pointer to a function to be called to hash a node
+ *
+ * This function initializes a newly allocated FIB and prepares it for use.
+ */
+void
+fib2_init(struct fib *f, pool *p, unsigned node_size, unsigned int addr_type, unsigned int addr_size, \
+ unsigned hash_order, fib_init_func init, fib_hash_func hash_f)
+{
if (!hash_order)
hash_order = HASH_DEF_ORDER;
f->fib_pool = p;
- f->fib_slab = sl_new(p, node_size);
+ f->fib_slab = sl_new(p, node_size + addr_size);
f->hash_order = hash_order;
fib_ht_alloc(f);
bzero(f->hash_table, f->hash_size * sizeof(struct fib_node *));
+ f->addr_type = addr_type;
+ f->addr_size = addr_size;
+ f->node_size = node_size;
f->entries = 0;
f->entries_min = 0;
f->init = init ? : fib_dummy_init;
+ f->hash_f = hash_f;
}
static void
@@ -133,7 +165,7 @@ fib_rehash(struct fib *f, int step)
while (e = x)
{
x = e->next;
- nh = fib_hash(f, &e->prefix);
+ nh = fib_hash(f, FPREFIX(e));
while (nh > ni)
{
*t = NULL;
@@ -163,11 +195,11 @@ fib_rehash(struct fib *f, int step)
* a pointer to it or %NULL if no such node exists.
*/
void *
-fib_find(struct fib *f, ip_addr *a, int len)
+fib_find(struct fib *f, void *a, int len)
{
struct fib_node *e = f->hash_table[fib_hash(f, a)];
- while (e && (e->pxlen != len || !ipa_equal(*a, e->prefix)))
+ while (e && (e->pxlen != len || memcmp(a, FPREFIX(e), f->addr_size)))
e = e->next;
return e;
}
@@ -197,26 +229,26 @@ fib_histogram(struct fib *f)
/**
* fib_get - find or create a FIB node
* @f: FIB to work with
- * @a: pointer to IP address of the prefix
- * @len: prefix length
+ * @a: pointer to IP (or other family) address of the prefix
+ * @len: prefix length (if address family requires)
*
* Search for a FIB node corresponding to the given prefix and
* return a pointer to it. If no such node exists, create it.
*/
void *
-fib_get(struct fib *f, ip_addr *a, int len)
+fib_get(struct fib *f, void *a, int len)
{
- unsigned int h = ipa_hash(*a);
- struct fib_node **ee = f->hash_table + (h >> f->hash_shift);
+ unsigned int h = fib_hash(f, a);
+ struct fib_node **ee = f->hash_table + h;
struct fib_node *g, *e = *ee;
- u32 uid = h << 16;
+ u32 uid = h << (16 + f->hash_shift);
- while (e && (e->pxlen != len || !ipa_equal(*a, e->prefix)))
+ while (e && (e->pxlen != len || memcmp(a, FPREFIX(e), f->addr_size)))
e = e->next;
if (e)
return e;
#ifdef DEBUGGING
- if (len < 0 || len > BITS_PER_IP_ADDRESS || !ip_is_prefix(*a,len))
+ if ((f->addr_type == RT_IP) && (len < 0 || len > BITS_PER_IP_ADDRESS || !ip_is_prefix(*((ip_addr *)a),len)))
bug("fib_get() called for invalid address");
#endif
@@ -228,13 +260,14 @@ void *
uid++;
}
- if ((uid >> 16) != h)
+ if ((uid >> (16 + f->hash_shift)) != h)
log(L_ERR "FIB hash table chains are too long");
// log (L_WARN "FIB_GET %I %x %x", *a, h, uid);
e = sl_alloc(f->fib_slab);
- e->prefix = *a;
+ e->addr = (char *)e + f->node_size;
+ memcpy(e->addr, a, f->addr_size);
e->pxlen = len;
e->next = *ee;
e->uid = uid;
@@ -250,22 +283,25 @@ void *
/**
* fib_route - CIDR routing lookup
* @f: FIB to search in
- * @a: pointer to IP address of the prefix
- * @len: prefix length
+ * @a: pointer to IP (or other family) address of the prefix
+ * @len: prefix length (if address family requires)
*
* Search for a FIB node with longest prefix matching the given
* network, that is a node which a CIDR router would use for routing
- * that network.
+ * that network. Function should be called for IPv4/IPv6 routes only
*/
void *
-fib_route(struct fib *f, ip_addr a, int len)
+fib_route(struct fib *f, ip_addr *a, int len)
{
ip_addr a0;
void *t;
+ if (f->addr_type != RT_IP)
+ return NULL;
+
while (len >= 0)
{
- a0 = ipa_and(a, ipa_mkmask(len));
+ a0 = ipa_and(*a, ipa_mkmask(len));
t = fib_find(f, &a0, len);
if (t)
return t;
@@ -321,7 +357,7 @@ void
fib_delete(struct fib *f, void *E)
{
struct fib_node *e = E;
- unsigned int h = fib_hash(f, &e->prefix);
+ unsigned int h = fib_hash(f, FPREFIX(e));
struct fib_node **ee = f->hash_table + h;
struct fib_iterator *it;
@@ -413,7 +449,7 @@ fit_get(struct fib *f, struct fib_iterator *i)
if (k = i->next)
k->prev = j;
j->next = k;
- i->hash = fib_hash(f, &n->prefix);
+ i->hash = fib_hash(f, FPREFIX(n));
return n;
}
@@ -430,6 +466,54 @@ fit_put(struct fib_iterator *i, struct fib_node *n
i->prev = (struct fib_iterator *) n;
}
+/**
+ * fib_print - prints a FIB node
+ * @n: pointer to fib_node structure
+ *
+ * This function prints fib node address to static buffer and
+ * returns it to the caller. Up to PBUFS(4) different buffers are
+ * available. RT_IP address type is assumed
+ */
+char *
+fib_print(struct fib_node *n)
+{
+ return fib2_print(0, n);
+}
+#define PBUFS 4
+#define PSIZE 50
+/**
+ * fib2_print - prints a FIB node
+ * @rtype: address type
+ * @n: pointer to fib_node structure
+ *
+ * This function prints fib node address to static buffer and
+ * returns it to the caller. Up to PBUFS(4) different buffers are
+ * available.
+ */
+char *
+fib2_print(int rtype, struct fib_node *n)
+{
+ static int cntr;
+ static char buf[PBUFS][PSIZE];
+ char *x;
+
+ x = buf[cntr++ % PBUFS];
+ if (rtype == 0)
+ rtype = RT_IP;
+
+ switch (rtype)
+ {
+ case RT_IP:
+ bsnprintf(x, PSIZE, "%I/%d", *FPREFIX_IP(n), n->pxlen);
+ break;
+
+ default:
+ bsnprintf(x, PSIZE, "RT:%d", rtype);
+ }
+
+ return x;
+}
+
#ifdef DEBUGGING
/**
@@ -452,7 +536,7 @@ fib_check(struct fib *f)
for(n=f->hash_table[i]; n; n=n->next)
{
struct fib_iterator *j, *j0;
- unsigned int h0 = ipa_hash(n->prefix);
+ unsigned int h0 = fib_hash(f, FPREFIX(n));
if (h0 < lo)
bug("fib_check: discord in hash chains");
lo = h0;
@@ -491,14 +575,15 @@ void dump(char *m)
{
unsigned int i;
- debug("%s ... order=%d, size=%d, entries=%d\n", m, f.hash_order, f.hash_size, f.hash_size);
+ debug("%s ... type=%d order=%d, size=%d, entries=%d\n", m, f.addr_type, f.hash_order, f.hash_size, f.hash_size);
for(i=0; i<f.hash_size; i++)
{
struct fib_node *n;
struct fib_iterator *j;
for(n=f.hash_table[i]; n; n=n->next)
{
- debug("%04x %04x %p %I/%2d", i, ipa_hash(n->prefix), n, n->prefix, n->pxlen);
+ debug("%04x %04x %p %s", i, fib_hash(&f, FPREFIX(n)) << f.hash_shift, n, fib_print(n));
+
for(j=n->readers; j; j=j->next)
debug(" %p[%p]", j, j->node);
debug("\n");
Index: nest/config.Y
===================================================================
--- nest/config.Y (revision 4962)
+++ nest/config.Y (working copy)
@@ -107,7 +107,7 @@ listen_opt:
CF_ADDTO(conf, newtab)
newtab: TABLE SYM {
- rt_new_table($2);
+ rt_new_table($2, RT_IP);
}
;
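The fib_print/fib2_print helpers in the patch return a pointer into a small ring of static buffers, so that a single log() call can format more than one prefix. A minimal sketch of that pattern in plain C follows. The names prefix_print, NBUFS and BUFSZ are hypothetical, and snprintf stands in for BIRD's bsnprintf; this is an illustration of the idea, not the patch's code.

```c
#include <assert.h>
#include <stdio.h>

#define NBUFS 4   /* hypothetical: number of rotating buffers */
#define BUFSZ 50  /* hypothetical: size of each buffer */

/* Format "value/len" into one of NBUFS static buffers and return it.
 * The returned pointer is overwritten after NBUFS further calls. */
const char *
prefix_print(unsigned value, int len)
{
  static unsigned cntr;
  static char buf[NBUFS][BUFSZ];
  char *x = buf[cntr++ % NBUFS];

  assert(len >= 0);
  snprintf(x, BUFSZ, "%u/%d", value, len);
  return x;
}
```

The trade-off is that the result must be consumed right away: a caller that keeps the pointer across more than NBUFS later calls will see it silently overwritten, and the static state is not safe to share between threads.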
It is sad to see how much effort goes into causing new problems and
bugs by trying to merge a chicken and a dog together. Our system has over
50 routers, and over 500 routers via OSPF. There is still a bunch of bugs
that cause out-of-sync problems and domino effects and need to be fixed. Two
separate engines are purely a good thing when we speak of production
networks. Actually, on many large production networks IPv4 and IPv6
routers are on different machines to limit problems.
On 31.7.2011 11:38, Alexander V. Chernikov wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Alexander V. Chernikov wrote:
>> On 22.07.2011 14:52, Ondrej Zajicek wrote:
>>> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>>>> don't think it is a good idea to mix these. This may look inconsistent
>>>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>>>> similar, have a natural way to embed one in the other, have similar
>>>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>>>> kernel interaction (routes imported from LDP and exported to kernel).
>>>>> This solves your Case 2 without any hacks.
>>>> So, from user point of view, I define
>>>> table xxx; for both ipv4 and IPv6 routes and
>>>> mpls table yyy; for MPLS routing table?
>>> Yes.
> Patch permitting fibs to be used for any address family attached.
> It should be considered as PoC patch for review. It works for my setup,
> but I haven't tested it in production. netlink is not tested at all.
>
> Some notes:
> * fib has to have address type field (due to fib_get and other functions
> using pointer to fib, not rtable)
> * Due to address variable length we store it inside fib node this way:
>
> |--------------------|
> | struct fib_node    |
> |   *addr -----------+--\
> |--------------------|  |
> | some user data     |  |
> |                    |  |
> |--------------------|  |
> | address data  <----+--/
> |                    |
> |--------------------|
> * Since we've got a pointer to the address data instead of the data (ip_addr)
> itself, all 9000 places with "%I/%d" need to be changed, so more
> general fib_print and fib2_print functions are implemented
>
> * Several net_* calls were converted to fib_*
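The node layout described in the notes above can be sketched in standalone C as follows. The struct node_hdr type and the node_new/node_matches helpers are hypothetical simplifications (calloc replaces BIRD's slab allocator); the point is only to show the address bytes living behind the user data, with the comparison done by memcmp so that it does not depend on the address family.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct fib_node. */
struct node_hdr {
  struct node_hdr *next;
  int pxlen;
  void *addr;   /* points at the address bytes stored behind the node */
};

/* Allocate node_size bytes (header plus user data) followed by
 * addr_size bytes of address data, and copy the address in. */
struct node_hdr *
node_new(size_t node_size, const void *addr, size_t addr_size, int pxlen)
{
  assert(node_size >= sizeof(struct node_hdr));
  struct node_hdr *e = calloc(1, node_size + addr_size);
  if (!e)
    return NULL;
  e->addr = (char *) e + node_size;
  memcpy(e->addr, addr, addr_size);
  e->pxlen = pxlen;
  return e;
}

/* Family-agnostic match: same prefix length and identical address bytes. */
int
node_matches(const struct node_hdr *e, const void *addr, size_t addr_size, int pxlen)
{
  return e->pxlen == pxlen && memcmp(e->addr, addr, addr_size) == 0;
}
```

One caveat with this layout: the address starts at offset node_size, so code that casts addr to a wider type (as FPREFIX_IP does with ip_addr) relies on node_size being suitably aligned for that type. Byte-wise memcpy/memcmp access has no such requirement.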
>
>
>
> Btw, some IPv4/IPv6 merging questions/thoughts:
> * show route will show complete mess for table with both v4 and v6
> routes. Some sorting or 'afi ipv4|ipv6' has to be implemented.
> * fill_in_sockaddr|get_sockaddr from io.c are somewhat inconsistent:
> fill_* uses OS-dependent set_inaddr to fill actual address data but
> get_* uses direct calls to memcpy and ipa_ntoh instead of the existing
> OS-dependent get_inaddr. Moreover, set_ and get_ implementations are the
> same for Linux and BSD (and they should be the same for other UNIX-like
> systems AFAIR, at least for IPv4/IPv6)
>
>
>
>>>> There should be base MPLS rtable (mpls_default, for example) as in IP.
>>>> We can also add a hack to automatically subscribe protocols to the MPLS
>>>> routing table by type and other attributes. For example, every LDP
>>>> instance gets connected to an MPLS table (default or defined in config).
>>>> Kernel protocol instance gets connected to MPLS table only if its IP
>>>> table is the default one (GRT) or 'mpls table' keyword is supplied
>>>> explicitly. What about VPNv4/VPNv6? The same approach?
>>> Perhaps even default MPLS table should be explicitly configured [*]
>>> (as i guess
>>> not many BIRD users would use MPLS). Protocols requiring MPLS table would
>>> fail if it is not configured, protocol with optional MPLS support
>>> (kernel,
>>> static?) just do not connect to MPLS in that case. The same approach
>>> for VPNvX table.
>>>
>>> [*] probably like: mpls table XXX default;
>> Maybe it's better to turn on "general" mpls support?
>> e.g. 'mpls support;' or just 'mpls;' instead of propagating some table
>> to be default?
>>>> Btw, how we will distinguish inet/inet6 rtes? (I'm talking about
>>>> MP-BGP
>>>> / IPv4-mapped cases)
>>> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
>>> similar purposes in IP stack. But this should not be checked directly
>>> in protocols, there should be some macros in lib/ipv6.h for that.
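A minimal sketch of what such helpers might look like, assuming a 128-bit address held as four 32-bit words in host order with the most significant word first. The ip6_t type and the function names are hypothetical and are not taken from lib/ipv6.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit address: four 32-bit words, host order,
 * most significant word first. */
typedef struct { uint32_t w[4]; } ip6_t;

/* True if a/len lies inside ::ffff:0:0/96, i.e. it encodes an IPv4 prefix. */
int
ip6_is_v4mapped(ip6_t a, int len)
{
  assert(len >= 0 && len <= 128);
  return len >= 96 && a.w[0] == 0 && a.w[1] == 0 && a.w[2] == 0x0000ffff;
}

/* Embed an IPv4 address (host order) as ::ffff:a.b.c.d. */
ip6_t
ip6_from_v4(uint32_t v4)
{
  ip6_t a = { { 0, 0, 0x0000ffff, v4 } };
  return a;
}

/* An IPv4 /n prefix corresponds to a mapped /(96+n) prefix. */
int
v4_pxlen_to_mapped(int len4)
{
  assert(len4 >= 0 && len4 <= 32);
  return 96 + len4;
}
```

With helpers like these, a protocol would ask "is this route IPv4?" through one call instead of comparing against the mapped prefix by hand.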
>>>
>>>>> [*] When I wrote that, I thought that labels are distributed just by LDP
>>>>> and the purpose of a label request is to propagate the label through the LDP
>>>>> area. I didn't notice that BGP/MPLS also distributes labels, so they
>>>>> need to know the assigned labels. So the idea would need some
>>>>> modifications.
>>>> Not sure this will work. Since t1 is an IP table cases when we need to
>>>> request specific label for:
>>>> * AToM
>>>> * RSVP-TE tunnels
>>>> will not work since there are no prefixes that can be mapped to such
>>>> request.
>>> You are probably right. I originally thought about some specific
>>> 'request table' (where requests coded as routes with specific AF),
>>> but perhaps there should be used some other mechanism / other protocol
>>> hook. But it should be generic enough (some bus, allows at least more
>>> 'producers' and perhaps more 'consumers').
>> Okay, I see this as follows:
>> New rtable hook, service_hook, with a uint32_t bitmask specifying the request
>> classes we are responsible for:
>> /* Defined classes */
>> #define RCLASS_LABEL 0x01 /* MPLS label request */
>>
>> Some request function:
>> int
>> request_data(rtable *t, struct service_request *req, void **buf, size_t
>> *bufsize)
>>
>> struct service_request {
>> uint32_t request; /* Single request class set */
>> uint32_t subclass; /* Subclass specific for request */
>> proto *p; /* caller protocol */
>> char data[0]; /* request-specific data follows */
>> };
>>
>> The function loops through all registered hooks for the given _class_, checking for
>> a reply until SR_OK or SR_FAIL is returned. It is up to the protocol hook to
>> check the subclass.
>> #define SR_OK 0x01 /* Request successful */
>> #define SR_FAIL 0x02 /* Request failed */
>> #define SR_NEXT 0x03 /* Request skipped */
>> #define SR_UNAVAIL 0x04 /* No providers for this request */
>>
>> As a result, the caller gets SR_UNAVAIL if no provider was able to
>> serve the request, or SR_OK|SR_FAIL otherwise.
>>
>> The caller can set up the buffer itself and pass a pointer to the buffer pointer and
>> a pointer to the buffer size to the function, or ask the provider to allocate the data
>> for it by setting *buf to NULL and *bufsize to 0.
>>
>> struct service_reply { /* is returned in reply buffer */
>> uint32_t request;
>> uint32_t subclass;
>> proto *p; /* protocol, providing data */
>> char data[0]; /* request-specific data */
>> };
>>
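The dispatch rule proposed above (try each hook registered for the class, stop at the first SR_OK or SR_FAIL, otherwise report SR_UNAVAIL) could be sketched like this. The constants are the ones from the proposal; everything else is an assumption made for illustration: the hooks are passed as a plain array instead of being looked up through an rtable, the request struct is trimmed to two fields, and skip_provider/label_provider are made-up example providers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RCLASS_LABEL 0x01  /* MPLS label request */

#define SR_OK      0x01    /* Request successful */
#define SR_FAIL    0x02    /* Request failed */
#define SR_NEXT    0x03    /* Request skipped */
#define SR_UNAVAIL 0x04    /* No providers for this request */

struct service_request {
  uint32_t request;        /* single request class */
  uint32_t subclass;       /* checked by the provider, not the dispatcher */
};

typedef int (*service_fn)(const struct service_request *req,
                          void **buf, size_t *bufsize);

/* Hypothetical hook record: class bitmask plus provider callback. */
struct service_hook {
  uint32_t classes;
  service_fn fn;
};

/* Ask each matching provider in turn until one gives a final answer. */
int
request_data(const struct service_hook *hooks, size_t nhooks,
             const struct service_request *req, void **buf, size_t *bufsize)
{
  assert(req && buf && bufsize);
  for (size_t i = 0; i < nhooks; i++)
    {
      if (!(hooks[i].classes & req->request))
        continue;
      int rv = hooks[i].fn(req, buf, bufsize);
      if (rv == SR_OK || rv == SR_FAIL)
        return rv;
    }
  return SR_UNAVAIL;
}

/* Example provider that never serves anything. */
int
skip_provider(const struct service_request *req, void **buf, size_t *bufsize)
{
  (void) req; (void) buf; (void) bufsize;
  return SR_NEXT;
}

/* Example provider handing out labels, starting above the reserved 0..15. */
int
label_provider(const struct service_request *req, void **buf, size_t *bufsize)
{
  static uint32_t next_label = 16;
  static uint32_t out;
  (void) req;
  out = next_label++;
  *buf = &out;
  *bufsize = sizeof(out);
  return SR_OK;
}
```

Note that in this sketch a provider returning SR_NEXT and the absence of any provider both end up as SR_UNAVAIL for the caller, which matches the description above.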
>>
>>
>>>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>>>> are generated and propagated using rte_update(), otherwise nothing is
>>>>> generated and the previously generated route is withdrawn (rte_update()
>>>>> with NULL is called) (or perhaps an unreachable route is generated if
>>>>> LMAP is here but IGP route is missing). Simple and elegant.
>>>> .. and in case of label release we should remove label only and keep
>>>> original route
>>> Yes.
>>>
>>>>> There are some tricky parts of IGP tracking - it is problematic
>>>>> to use standard RA_OPTIMAL update for this purpose, because if
>>>>> generated encapsulating routes are imported to the same table,
>>>>> these probably became the optimal ones and IGP routes would be
>>>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>>>> containing encapsulating routes, similarly 'examining the tracked
>>>>> IGP table' means looking up the fib node and find the best route,
>>>>> ignoring encapsulating ones.
>>>>>
>>>>> For implementation of this behavior, there are two minor changes that
>>>>> needs to be done to the rt table code: First, currently accept_ra_types
>>>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>>>> property of an announce hook (as LDP would have two hooks with
>>>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>>>> both in rte_recalculate should be moved after the route list
>>>>> is updated/relinked.
>>>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a
>>>> trivial task and requires internals understanding. Either announce type
>>>> should be passed to announce hook or new hook should be added for RA_ANY
>>>> event. The latter is more appropriate IMHO since RA_ANY is used by
>>>> pipe
>>>> protocol only.
>>> I thought about that when i created RA_ANY and have chosen this approach.
>>> Probably best way is just to change rt_notify to have appropriate
>>> struct announce_hook as a second argument instead of struct rtable.
>>> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
>>> some protocol-specific data. As (probably) all protocols are in-tree,
>>> doing some wide but trivial changes is not a problem.
>>>
>>>> Kernel protocol should track RA_ANY protocol hooks
>>>> looking for update source (LDP / RSVP) and re-install appropriate
>>>> routes.
>>> I think kernel protocol should use RA_OPTIMAL as usual. This kind
>>> of RA_ANY usage is for protocols that export routes to the same
>>> table they listen (so 'source' routes would be shaded by their
>>> routes). These routes (LDP / RSVP) should have just highest
>>> priority.
>>>
>>>> The only downside is situation when LDP signalling starts faster
>>>> than IGP. In that case we will get 3 updates instead of one (at least in
>>>> RTSOCK):
>>>> * RTM_ADD for original prefix
>>>> * RTM_DEL for this prefix (as part of krt_set_notify())
>>>> * RTM_ADD for modified prefix
>>>>
>>>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>>>> instead of one.
>>> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
>>> are propagated synchronously depth-first:
>>>
>>> OSPF --RA_ANY--> LDP
>>> LDP --RA_OPTIMAL--> kernel
>>> OSPF --RA_OPTIMAL--> kernel
>>>
>> Still I can't understand how exactly I can modify an announced IP route
>> (still, from FreeBSD kernel point of view encapsulated route is a usual
>> route with an attribute attached. From Linux point of view this should
>> be more or less the same since an IP route lookup have to be done for
>> incoming packet anyway and doing several different lookups is not a best
>> idea). I've got RA_ANY hook called for a new route (and I should know
>> that it is actually RA_OPTIMAL without some complex logic!), what I
>> should do next ?
>>
>>> But it is true that this is much dependent on internal implementation
>>> of route propagation. The first idea i had was to use separate
>>> tables for original and labeled routes (when just RA_OPTIMAL hooks),
>>> but that looks too cumbersome for users and ability to push a better
>>> route to the same (input) table has other possible usages.
>>>
>>>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>>>> suggested, with minor details changed. FIB / rtables would be uniform
>>>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and
>>>>> IPv6
>>>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>>>> might be allocated larger.
>>>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>>>> enough for holding IPv6 address? This can bump memory consumption for
>>>> setups with several full-views significantly.
>>> It increases memory consumption, but not so much in a relative view - for
>>> each struct network there is at least one struct rte and in both of them
>>> there is just one ip_addr and both structures are nontrivial. So this
>>> relative increase would be about 1.15-1.2. Really big users would
>>> probably keep current split setting.
>> Okay, it's much easier from developer point of view. If you're not
>> afraid of your users :)
>>>>> Because each protocol and each its announce_hook have appropriate role,
>>>>> it is IMHO unnecessary to have AF in protocol hooks, but there
>>>>> should be
>>>>> check whether protocol/announce_hook is connected to appropriate
>>>>> rtable.
>>>>>
>>>> To summarize required changes (please correct me):
>>>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>>>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>>>> * rtable
>>>> * fib
>>>> * rte
>>>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>>>> to struct fib to hold this value.
>>>> 4) Move to memcmp() in fib_find / fib_get
>>>> 5) Set up default rtable for every supported AF. Connect protocol
>>>> instances to such default AFs based on protocol types
>>> 1a) other changes in rte_recalculate() related to propagation
>>> (clean up the table before calling RA_ANY hook).
>>>
>>> 1) and 1a) i will do myself and send you the patch, and also make
>>> some trivial example for exporting to the same table.
>>>
>>> 2) i am not sure if there is a reason to put explicit AF info
>>> to struct fib, AF compatibility could be handled on higher level
>>> (struct rtable in general, other direct users probably use just
>>> one AF).
>> No problem, I misinterpreted "FIB / rtables would be uniform (AF_
>> bound)" as "FIB / rtable needs AF info in structure fields"
>>> 3) and hashing callback (and perhaps fib_route, but not sure if this is
>>> needed).
>>>
>>> 4) probably encapsulate that to some static inline key_equal() function.
>>>
>>> 5) see my related note above. Protocol binding to tables should check
>>> AFs.
>>>
>>> more:
>>>
>>> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous
>>> mail:
>>>
>>>>> i think encapsulation
>>>>> routes should be represented by routes with new destination type
>>>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>>>> struct rta (containing struct rta in the first field and NHLFE after
>>>>> that). Such structure could be easily passed as struct rta and
>>>>> functions
>>>>> from rt-attr.c can work with that, with some minor modifications
>>>>> (allocating, freeing and printing) dispatched based on dest field.
>>>>> This rta could be used without changes also for MPLS routes.
>> I'll try to send you patches for all these as I see it in several days.
>>>
>>>> Most of this are more or less trivial changes not MPLS-bound (VPNv4/6
>>>> can be used in case of bird used as RR in MPLS network, for example).
>>>> Should I supply patches for these? What are your plans about commit
>>>> routemap ?
>>> I create GIT branch 'mpls' and would merge these patches to that branch
>>> soon. When we will have some major release, we could merge 'mpls' branch
>>> to master if there is some sufficient usage (i think that even just
>>> static and kernel protocol support for MPLS would be a good example
>>> usage). Other protocols (LDP, ...) probably should be merged when they
>>> are reasonable ready.
>> Will this branch available from official git repo ? It is not accessible
>> (from its web interface at least).
>>
>>
>> Btw, some bird/LDP "status" report:
>>
>> bird> show ldp neighbour
>> Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
>> TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
>> State: Operational; Msgs sent/rcvd: 21/61; Downstream
>> Up time: 00:02:27
>> LDP discovery sources:
>> em1, Src IP addr: 10.1.5.4
>> Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
>> TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
>> State: Operational; Msgs sent/rcvd: 29/60; Downstream
>> Up time: 00:02:20
>> LDP discovery sources:
>> em2, Src IP addr: 10.1.6.3
>> bird> show ldp bindings
>> lib entry: 10.2.0.0/30
>> local binding: label: 25
>> remote binding: lsr: 10.2.33.4:0, label: ImpNULL
>> remote binding: lsr: 10.2.33.3:0, label: 23
>> lib entry: 10.1.6.0/24
>> remote binding: lsr: 10.2.33.3:0, label: ImpNULL
>> remote binding: lsr: 10.2.33.4:0, label: 25
>> lib entry: 10.0.0.0/24
>> remote binding: lsr: 10.2.33.3:0, label: 19
>> remote binding: lsr: 10.2.33.4:0, label: 23
>> lib entry: 10.2.0.2/32
>> local binding: label: 26
>> remote binding: lsr: 10.2.33.4:0, label: 16
>> remote binding: lsr: 10.2.33.3:0, label: 24
>> lib entry: 10.1.4.0/24
>> local binding: label: 29
>> remote binding: lsr: 10.2.33.4:0, label: ImpNULL
>> remote binding: lsr: 10.2.33.3:0, label: ImpNULL
>> lib entry: 10.1.5.0/24
>> remote binding: lsr: 10.2.33.4:0, label: ImpNULL
>> remote binding: lsr: 10.2.33.3:0, label: ImpNULL
>> lib entry: 1.2.3.5/32
>> remote binding: lsr: 10.2.33.3:0, label: 20
>> remote binding: lsr: 10.2.33.4:0, label: 21
>> lib entry: 10.1.33.0/24
>> local binding: label: 28
>> remote binding: lsr: 10.2.33.4:0, label: ImpNULL
>> remote binding: lsr: 10.2.33.3:0, label: ImpNULL
>> lib entry: 10.2.33.3/32
>> local binding: label: 31
>> remote binding: lsr: 10.2.33.3:0, label: ImpNULL
>> lib entry: 10.2.33.4/32
>> local binding: label: 27
>> remote binding: lsr: 10.2.33.4:0, label: ImpNULL
>> remote binding: lsr: 10.2.33.3:0, label: 25
>> lib entry: 10.1.6.88/32
>> remote binding: lsr: 10.2.33.3:0, label: 18
>> remote binding: lsr: 10.2.33.4:0, label: 19
>> lib entry: 10.0.0.88/32
>> remote binding: lsr: 10.2.33.4:0, label: 17
>> remote binding: lsr: 10.2.33.3:0, label: 16
>> lib entry: 10.1.5.88/32
>> remote binding: lsr: 10.2.33.3:0, label: 21
>> remote binding: lsr: 10.2.33.4:0, label: 18
>> bird> show ldp forwardingtable
>> Local Outgoing Prefix Bytes Label Outgoing Next Hop
>> Label Label or VC or Tunnel Id Switched interface
>> 20 SWAP 10.2.0.0/30 0 ? 10.1.5.4
>> 21 SWAP 10.2.0.2/32 0 ? 10.1.5.4
>> 22 SWAP 10.2.33.4/32 0 ? 10.1.5.4
>> 23 SWAP 10.1.33.0/24 0 ? 10.1.5.4
>> 24 SWAP 10.1.4.0/24 0 ? 10.1.5.4
>> 25 SWAP 10.2.0.0/30 0 ? 10.1.5.4
>> 26 SWAP 10.2.0.2/32 0 ? 10.1.5.4
>> 27 SWAP 10.2.33.4/32 0 ? 10.1.5.4
>> 28 SWAP 10.1.33.0/24 0 ? 10.1.5.4
>> 29 SWAP 10.1.4.0/24 0 ? 10.1.5.4
>> 30 SWAP 10.2.33.3/32 0 ? 10.1.6.3
>> 31 SWAP 10.2.33.3/32 0 ? 10.1.6.3
>>
>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.14 (FreeBSD)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk41FJEACgkQwcJ4iSZ1q2kZNwCfZHk19PuXn2esNZ/KrvXOir5v
> zTMAoKe78CsexI0pPJ4li50e8teBCcpa
> =yqPo
> -----END PGP SIGNATURE-----
--
All amounts stated in this message exclude VAT unless otherwise stated alongside the amount in question.
--
F-Solutions Oy
Tapio Haapala
PL 7, 90571 Oulu
GSM 040-0998371
Skype burner-
IRC Burner@ircnet
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tapio Haapala wrote:
> It is sad to see how much effort goes into causing new problems and bugs by trying to merge chicken and dog together. Our system has over 50 routers, and over 500 routers via OSPF. There is still a bunch of bugs that cause desync problems and domino effects and need to be fixed. Two separate engines are purely a good thing when we speak of production networks. Actually, on many large production networks IPv4 and IPv6 routers are on different machines to limit problems.
Actually this discussion (despite its topic) is mostly about providing prerequisites to implement some advanced SP features based on MPLS.
Even if bird and bird6 become a single daemon, no one prevents you from using different bird instances (exactly as you do now) with different configs if you want to.
On 31.7.2011 11:38, Alexander V. Chernikov wrote:
On 22.07.2011 14:52, Ondrej Zajicek wrote:
On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
> Therefore there would be two types of routing tables - IP and MPLS. I
> don't think it is a good idea to mix these. This may look inconsistent
> with idea of embedding IPv4 to IPv6, but IP protocols are much more
> similar, have a natural way to embed one in the other, have similar
> roles and protocol structure. MPLS routing table could be used to LDP -
> kernel interaction (routes imported from LDP and exported to kernel).
> This solves your Case 2 without any hacks.
So, from user point of view, I define
table xxx; for both ipv4 and IPv6 routes and
mpls table yyy; for MPLS routing table?
Yes.
Patch permitting fibs to be used for any address family attached. It should be considered as PoC patch for review. It works for my setup, but I haven't tested it in production. netlink is not tested at all.
Some notes:
* fib has to have address type field (due to fib_get and other functions using pointer to fib, not rtable)
* Due to address variable length we store it inside fib node this way:

  |--------------------|
  | struct fib_node    |
  | *addr      --------\
  |--------------------|  |
  | some user data     |  |
  |                    |  |
  |--------------------|  |
  | address data<-------/
  |                    |
  |--------------------|

* Since we've got pointer to address data instead of data (ip_addr) itself, all 9000 places with "%I/%d" need to be changed, so more general fib_print and fib2_print functions are implemented
* Several net_* calls were converted to fib_*
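The node layout in the notes above could look roughly like the following. This is a minimal sketch, not the attached patch: the names xfib, xfib_node, xfib_node_new() and xfib_key_equal() are invented for illustration, and the per-table addr_size field is an assumption about how the address length would be stored. The comparison helper uses memcmp() on the stored bytes, so it does not care which address family the table holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the BIRD fib API. */
struct xfib_node {
  struct xfib_node *next;   /* hash chain link */
  void *addr;               /* points at the address bytes stored after the user data */
  uint8_t pxlen;            /* prefix length in bits */
};

struct xfib {
  size_t node_size;         /* sizeof(struct xfib_node) plus user data */
  size_t addr_size;         /* bytes per address for this table's family */
};

/* Allocate one block: node header, user data, then the address bytes. */
static struct xfib_node *
xfib_node_new(const struct xfib *f, const void *addr, uint8_t pxlen)
{
  assert(f->node_size >= sizeof(struct xfib_node));
  struct xfib_node *n = calloc(1, f->node_size + f->addr_size);
  if (!n)
    return NULL;
  n->addr = (char *) n + f->node_size;   /* address data sits right behind the user data */
  memcpy(n->addr, addr, f->addr_size);
  n->pxlen = pxlen;
  return n;
}

/* Family-independent key comparison on raw address bytes. */
static inline int
xfib_key_equal(const struct xfib *f, const struct xfib_node *n,
               const void *addr, uint8_t pxlen)
{
  return n->pxlen == pxlen && memcmp(n->addr, addr, f->addr_size) == 0;
}
```

A table for IPv4 would set addr_size to 4 and one for IPv6 to 16; the lookup code stays the same in both cases.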
Btw, some IPv4/IPv6 merging questions/thoughts:
* show route will show complete mess for table with both v4 and v6 routes. Some sorting or 'afi ipv4|ipv6' has to be implemented.
* fill_in_sockaddr|get_sockaddr from io.c are somewhat inconsistent: fill_* uses OS-dependent set_inaddr to fill actual address data but get_* uses direct calls to memcpy and ipa_ntoh instead of existing OS-dependent get_inaddr. Moreover, set_ and get_ implementations are the same for linux, bsd (and they should be the same for other UNIX-like systems AFAIR, at least for IPv4/IPv6)
There should be base MPLS rtable (mpls_default, for example) as in IP. We can also add a hack for automatically subscribe protocols for MPLS routing table by type and other attributes. For example, every LDP instance gets connected to an MPLS table (default or defined in config). Kernel protocol instance gets connected to MPLS table only if its IP table is the default one (GRT) or 'mpls table' keyword is supplied explicitly. What about VPNv4/VPNv6 ? The same approach?
Perhaps even default MPLS table should be explicitly configured [*] (as i guess not many BIRD users would use MPLS). Protocols requiring MPLS table would fail if it is not configured, protocol with optional MPLS support (kernel, static?) just do not connect to MPLS in that case. The same approach for VPNvX table.
[*] probably like: mpls table XXX default;
Maybe it's better to turn on "general" mpls support? e.g. 'mpls support;' or just 'mpls;' instead of propagating some table to be default?
Btw, how we will distinguish inet/inet6 rtes? (I'm talking about MP-BGP / IPv4-mapped cases)
I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for similar purposes in IP stack. But this should not be checked directly in protocols, there should be some macros in lib/ipv6.h for that.
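Helpers for the IPv4-mapped check might look something like this. It is only a sketch under assumptions: xip6_addr is a made-up byte-array type rather than BIRD's ip_addr, and xipa_is_mapped4(), xipa_from_ip4() and xipa_to_ip4() are hypothetical names, not the actual contents of lib/ipv6.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address type; not BIRD's ip_addr. */
typedef struct { uint8_t b[16]; } xip6_addr;

/* First 96 bits of ::ffff:0:0/96 */
static const uint8_t xip6_mapped_prefix[12] =
  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

/* True if the address lies inside ::ffff:0:0/96. */
static inline int
xipa_is_mapped4(const xip6_addr *a)
{
  return memcmp(a->b, xip6_mapped_prefix, sizeof(xip6_mapped_prefix)) == 0;
}

/* Embed an IPv4 address (host byte order) as ::ffff:a.b.c.d. */
static inline xip6_addr
xipa_from_ip4(uint32_t v4)
{
  xip6_addr a;
  memcpy(a.b, xip6_mapped_prefix, sizeof(xip6_mapped_prefix));
  a.b[12] = (uint8_t) (v4 >> 24);
  a.b[13] = (uint8_t) (v4 >> 16);
  a.b[14] = (uint8_t) (v4 >> 8);
  a.b[15] = (uint8_t) v4;
  return a;
}

/* Extract the embedded IPv4 address; the caller checks xipa_is_mapped4() first. */
static inline uint32_t
xipa_to_ip4(const xip6_addr *a)
{
  assert(xipa_is_mapped4(a));
  return ((uint32_t) a->b[12] << 24) | ((uint32_t) a->b[13] << 16) |
         ((uint32_t) a->b[14] << 8) | (uint32_t) a->b[15];
}
```

With this embedding an IPv4 prefix of length n corresponds to a prefix of length 96 + n, so protocols that only see the helpers never need to know the /96 constant.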
> [*] when i wrote that i thought that labels are distributed just by LDP
> and the purpose of label request is to propagate the label through LDP
> area. i didn't noticed that BGP/MPLS also distributes labels so they
> need to know assigned labels. So the idea would need some modifications.
Not sure this will work. Since t1 is an IP table cases when we need to request specific label for:
* AToM
* RSVP-TE tunnels
will not work since there are no prefixes that can be mapped to such request.
You are probably right. I originally thought about some specific 'request table' (where requests coded as routes with specific AF), but perhaps there should be used some other mechanism / other protocol hook. But it should be generic enough (some bus, allows at least more 'producers' and perhaps more 'consumers').
Okay, i see this as follows:
New rtable hook, service_hook, with uint32_t bitmask specifying request classes we are responsible to:
  /* Defined classes */
  #define RCLASS_LABEL 0x01 /* MPLS label request */
Some request function:
  int
  request_data(rtable *t, struct service_request *req, void **buf, size_t *bufsize)

  struct service_request {
    uint32_t request;  /* Single request class set */
    uint32_t subclass; /* Subclass specific for request */
    proto *p;          /* caller protocol */
    char data[0];      /* request-specific data follows */
  };
function loops thru all registered hooks for given _class_ checking for reply until SR_OK or SR_FAIL is returned. It is up to protocol hook to check subclass.
  #define SR_OK      0x01 /* Request successful */
  #define SR_FAIL    0x02 /* Request failed */
  #define SR_NEXT    0x03 /* Request skipped */
  #define SR_UNAVAIL 0x04 /* No providers for this request */
As a result, caller get SR_UNAVAIL in case of no providers were able to serve request or SR_OK|SR_FAIL.
caller can setup buffer itself and pass pointer to pointer to buffer and pointer to buffer size to function, or request provider to allocate data for him setting *buf to NULL and bufsize to 0
  struct service_reply { /* is returned in reply buffer */
    uint32_t request;
    uint32_t subclass;
    proto *p;          /* protocol, providing data */
    char data[0];      /* request-specific data */
  };
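The dispatch loop described above might be sketched as follows. The RCLASS_ and SR_ constants and the request_data() argument order come from the proposal; everything else is an assumption made so the example compiles on its own: struct service_table stands in for the part of rtable that would hold the hooks, struct service_hook and its linked list are invented, the trailing data[] payload is left out, and skip_provider() and label_provider() are toy providers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RCLASS_LABEL 0x01   /* MPLS label request (from the proposal) */

#define SR_OK      0x01     /* Request successful */
#define SR_FAIL    0x02     /* Request failed */
#define SR_NEXT    0x03     /* Request skipped */
#define SR_UNAVAIL 0x04     /* No providers for this request */

struct proto;               /* opaque stand-in for a protocol instance */

struct service_request {
  uint32_t request;         /* single request class */
  uint32_t subclass;        /* subclass specific for the request */
  struct proto *p;          /* caller protocol */
};                          /* request-specific payload omitted in this sketch */

/* Hypothetical hook record; the proposal does not say how hooks are stored. */
struct service_hook {
  struct service_hook *next;
  uint32_t classes;         /* bitmask of request classes served */
  int (*serve)(struct service_request *req, void **buf, size_t *bufsize);
};

/* Stand-in for the part of rtable that would hold the hook list. */
struct service_table {
  struct service_hook *hooks;
};

/* Ask each hook registered for the class until one answers SR_OK or SR_FAIL. */
static int
request_data(struct service_table *t, struct service_request *req,
             void **buf, size_t *bufsize)
{
  for (struct service_hook *h = t->hooks; h; h = h->next)
  {
    if (!(h->classes & req->request))
      continue;             /* hook does not serve this class */
    int rv = h->serve(req, buf, bufsize);
    if (rv == SR_OK || rv == SR_FAIL)
      return rv;            /* SR_NEXT falls through to the next hook */
  }
  return SR_UNAVAIL;
}

/* Toy provider: declines every request, so the loop moves on. */
static int
skip_provider(struct service_request *req, void **buf, size_t *bufsize)
{
  (void) req; (void) buf; (void) bufsize;
  return SR_NEXT;
}

/* Toy provider: hands out labels from a counter into a caller-supplied buffer. */
static int
label_provider(struct service_request *req, void **buf, size_t *bufsize)
{
  static uint32_t next_label = 16;   /* MPLS labels 0-15 are reserved */
  (void) req;
  if (!*buf || *bufsize < sizeof(uint32_t))
    return SR_FAIL;
  *(uint32_t *) *buf = next_label++;
  *bufsize = sizeof(uint32_t);
  return SR_OK;
}
```

A caller such as BGP would fill a service_request with RCLASS_LABEL, pass its own buffer, and treat SR_UNAVAIL as "no label provider is configured". Provider-side allocation (the *buf == NULL case) is not shown here.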
> Internal LMAP table is examined, tracked IGP table is examined. If both
> are ready (for given prefix), appropriate encapsulating and MPLS routes
> are generated and propagated using rte_update(), otherwise nothing is
> generated and the previously generated route is withdrawn (rte_update()
> with NULL is called) (or perhaps an unreachable route is generated if
> LMAP is here but IGP route is missing). Simple and elegant.
.. and in case of label release we should remove label only and keep original route
Yes.
> There are some tricky parts of IGP tracking - it is problematic
> to use standard RA_OPTIMAL update for this purpose, because if
> generated encapsulating routes are imported to the same table,
> these probably became the optimal ones and IGP routes would be
> shaded. Solution would be to use RA_ANY, and ignore notifications
> containing encapsulating routes, similarly 'examining the tracked
> IGP table' means looking up the fib node and find the best route,
> ignoring encapsulating ones.
>
> For implementation of this behavior, there are two minor changes that
> needs to be done to the rt table code: First, currently accept_ra_types
> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
> property of an announce hook (as LDP would have two hooks with
> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
> both in rte_recalculate should be moved after the route list
> is updated/relinked.
Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a trivial task and requires internals understanding. Either announce type should be passed to announce hook or new hook should be added for RA_ANY event. The latter is more appropriate IMHO since RA_ANY is used by pipe protocol only.
I thought about that when i created RA_ANY and have chosen this approach. Probably best way is just to change rt_notify to have appropriate struct announce_hook as a second argument instead of struct rtable. struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly some protocol-specific data. As (probably) all protocols are in-tree, doing some wide but trivial changes is not a problem.
Kernel protocol should track RA_ANY protocol hooks looking for update source (LDP / RSVP) and re-install appropriate routes.
I think kernel protocol should use RA_OPTIMAL as usual. This kind of RA_ANY usage is for protocols that export routes to the same table they listen (so 'source' routes would be shaded by their routes). These routes (LDP / RSVP) should have just highest priority.
The only downside is situation when LDP signalling starts faster than IGP. In that case we will get 3 updates instead of one (at least in RTSOCK):
* RTM_ADD for original prefix
* RTM_DEL for this prefix (as part of krt_set_notify())
* RTM_ADD for modified prefix
RTM_CHANGE can be used in notify, but still: this gives 2 updates instead of one.
No, because RA_ANY is handled strictly before RA_OPTIMAL and routes are propagated synchronously depth-first:

  OSPF --RA_ANY--> LDP
  LDP --RA_OPTIMAL--> kernel
  OSPF --RA_OPTIMAL--> kernel
Still I can't understand how exactly I can modify an announced IP route (still, from FreeBSD kernel point of view encapsulated route is a usual route with an attribute attached. From Linux point of view this should be more or less the same since an IP route lookup have to be done for incoming packet anyway and doing several different lookups is not a best idea). I've got RA_ANY hook called for a new route (and I should know that it is actually RA_OPTIMAL without some complex logic!), what I should do next ?
But it is true that this is much dependent on internal implementation of route propagation. The first idea i had was to use separate tables for original and labeled routes (when just RA_OPTIMAL hooks), but that looks too cumbersome for users and ability to push a better route to the same (input) table has other possible usages.
> Therefore, it is probably a good idea to extend FIBs in a way you
> suggested, with minor details changed. FIB / rtables would be uniform
> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
> minimize code changes, struct fib_node would have ip_addr prefix, but
> might be allocated larger.
Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large enough for holding IPv6 address? This can bump memory consumption for setups with several full-views significantly.
It increases memory consumption, but not so much in a relative view - for each struct network there is at least one struct rte and in both of them there is just one ip_addr and both structures are nontrivial. So this relative increase would be about 1.15-1.2. Really big users would probably keep current split setting.
Okay, it's much easier from developer point of view. If you're not afraid of your users :)
> Because each protocol and each its announce_hook have appropriate role,
> it is IMHO unnecessary to have AF in protocol hooks, but there should be
> check whether protocol/announce_hook is connected to appropriate rtable.
To summarize required changes (please correct me):
1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
   * rtable
   * fib
   * rte
3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field to struct fib to hold this value.
4) Move to memcmp() in fib_find / fib_get
5) Set up default rtable for every supported AF. Connect protocol instances to such default AFs based on protocol types
1a) other changes in rte_recalculate() related to propagation (clean up the table before calling RA_ANY hook).
1) and 1a) i will do myself and send you the patch, and also make some trivial example for exporting to the same table.
2) i am not sure if there is a reason to put explicit AF info to struct fib, AF compatibility could be handled on higher level (struct rtable in general, other direct users probably use just one AF).
No problem, I misinterpreted "FIB / rtables would be uniform (AF_ bound)" as "FIB / rtable needs AF info in structure fields"
3) and hashing callback (and perhaps fib_route, but not sure if this is needed).
4) probably encapsulate that to some static inline key_equal() function.
5) see my related note above. Protocol binding to tables should check AFs.
more:
6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:
> i think encapsulation
> routes should be represented by routes with new destination type
> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
> in new struct rta_mpls (or rta_nhlfe), which would be extension of
> struct rta (containing struct rta in the first field and NHLFE after
> that). Such structure could be easily passed as struct rta and functions
> from rt-attr.c can work with that, with some minor modifications
> (allocating, freeing and printing) dispatched based on dest field.
> This rta could be used without changes also for MPLS routes.
I'll try to send you patches for all these as I see it in several days.
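The rta_mpls idea quoted above can be sketched roughly like this. It is a minimal illustration only: the struct rta here is a two-field stand-in for the real one, RTD_MPLS is given an invented value, and NHLFE_MAX_LABELS, rta_size(), rta_mpls_new() and rta_to_mpls() are hypothetical names, not existing BIRD code. The point is just that a struct with struct rta as its first member can travel as a plain struct rta pointer, with the dest field telling the attribute code which layout it really has.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins; the real struct rta has many more fields. */
#define RTD_ROUTER 0        /* ordinary next-hop route */
#define RTD_MPLS   100      /* invented value for the proposed new type */

struct rta {
  uint8_t dest;             /* destination type, selects the concrete layout */
  uint32_t gw;              /* next hop (IPv4, host byte order, for brevity) */
};

#define NHLFE_MAX_LABELS 4  /* arbitrary limit for this sketch */

struct rta_mpls {
  struct rta a;             /* must be first, so the pointer is also a valid struct rta * */
  uint8_t n_labels;
  uint32_t labels[NHLFE_MAX_LABELS];  /* outgoing label stack */
};

/* Size to allocate or copy, dispatched on the dest field. */
static size_t
rta_size(const struct rta *a)
{
  return (a->dest == RTD_MPLS) ? sizeof(struct rta_mpls) : sizeof(struct rta);
}

/* Allocate an encapsulating attribute set and return it as a plain struct rta. */
static struct rta *
rta_mpls_new(uint32_t gw, const uint32_t *labels, uint8_t n)
{
  assert(n <= NHLFE_MAX_LABELS);
  struct rta_mpls *m = calloc(1, sizeof(*m));
  if (!m)
    return NULL;
  m->a.dest = RTD_MPLS;
  m->a.gw = gw;
  m->n_labels = n;
  for (uint8_t i = 0; i < n; i++)
    m->labels[i] = labels[i];
  return &m->a;
}

/* Downcast, valid only when the dest field says so. */
static struct rta_mpls *
rta_to_mpls(struct rta *a)
{
  return (a->dest == RTD_MPLS) ? (struct rta_mpls *) a : NULL;
}
```

Code that only copies, frees or prints attributes would call something like rta_size() and otherwise treat both kinds the same; only MPLS-aware code needs rta_to_mpls().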
Most of these are more or less trivial changes not MPLS-bound (VPNv4/6 can be used in case of bird used as RR in MPLS network, for example). Should I supply patches for these? What are your plans about commit routemap ?
I create GIT branch 'mpls' and would merge these patches to that branch soon. When we will have some major release, we could merge 'mpls' branch to master if there is some sufficient usage (i think that even just static and kernel protocol support for MPLS would be a good example usage). Other protocols (LDP, ...) probably should be merged when they are reasonably ready.
Will this branch be available from official git repo ? It is not accessible (from its web interface at least).
Btw, some bird/LDP "status" report:
bird> show ldp neighbour
Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
  TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
  State: Operational; Msgs sent/rcvd: 21/61; Downstream
  Up time: 00:02:27
  LDP discovery sources:
    em1, Src IP addr: 10.1.5.4
Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
  TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
  State: Operational; Msgs sent/rcvd: 29/60; Downstream
  Up time: 00:02:20
  LDP discovery sources:
    em2, Src IP addr: 10.1.6.3
bird> show ldp bindings
lib entry: 10.2.0.0/30
  local binding: label: 25
  remote binding: lsr: 10.2.33.4:0, label: ImpNULL
  remote binding: lsr: 10.2.33.3:0, label: 23
lib entry: 10.1.6.0/24
  remote binding: lsr: 10.2.33.3:0, label: ImpNULL
  remote binding: lsr: 10.2.33.4:0, label: 25
lib entry: 10.0.0.0/24
  remote binding: lsr: 10.2.33.3:0, label: 19
  remote binding: lsr: 10.2.33.4:0, label: 23
lib entry: 10.2.0.2/32
  local binding: label: 26
  remote binding: lsr: 10.2.33.4:0, label: 16
  remote binding: lsr: 10.2.33.3:0, label: 24
lib entry: 10.1.4.0/24
  local binding: label: 29
  remote binding: lsr: 10.2.33.4:0, label: ImpNULL
  remote binding: lsr: 10.2.33.3:0, label: ImpNULL
lib entry: 10.1.5.0/24
  remote binding: lsr: 10.2.33.4:0, label: ImpNULL
  remote binding: lsr: 10.2.33.3:0, label: ImpNULL
lib entry: 1.2.3.5/32
  remote binding: lsr: 10.2.33.3:0, label: 20
  remote binding: lsr: 10.2.33.4:0, label: 21
lib entry: 10.1.33.0/24
  local binding: label: 28
  remote binding: lsr: 10.2.33.4:0, label: ImpNULL
  remote binding: lsr: 10.2.33.3:0, label: ImpNULL
lib entry: 10.2.33.3/32
  local binding: label: 31
  remote binding: lsr: 10.2.33.3:0, label: ImpNULL
lib entry: 10.2.33.4/32
  local binding: label: 27
  remote binding: lsr: 10.2.33.4:0, label: ImpNULL
  remote binding: lsr: 10.2.33.3:0, label: 25
lib entry: 10.1.6.88/32
  remote binding: lsr: 10.2.33.3:0, label: 18
  remote binding: lsr: 10.2.33.4:0, label: 19
lib entry: 10.0.0.88/32
  remote binding: lsr: 10.2.33.4:0, label: 17
  remote binding: lsr: 10.2.33.3:0, label: 16
lib entry: 10.1.5.88/32
  remote binding: lsr: 10.2.33.3:0, label: 21
  remote binding: lsr: 10.2.33.4:0, label: 18
bird> show ldp forwardingtable
Local  Outgoing     Prefix         Bytes Label  Outgoing   Next Hop
Label  Label or VC  or Tunnel Id   Switched     interface
20     SWAP         10.2.0.0/30    0            ?          10.1.5.4
21     SWAP         10.2.0.2/32    0            ?          10.1.5.4
22     SWAP         10.2.33.4/32   0            ?          10.1.5.4
23     SWAP         10.1.33.0/24   0            ?          10.1.5.4
24     SWAP         10.1.4.0/24    0            ?          10.1.5.4
25     SWAP         10.2.0.0/30    0            ?          10.1.5.4
26     SWAP         10.2.0.2/32    0            ?          10.1.5.4
27     SWAP         10.2.33.4/32   0            ?          10.1.5.4
28     SWAP         10.1.33.0/24   0            ?          10.1.5.4
29     SWAP         10.1.4.0/24    0            ?          10.1.5.4
30     SWAP         10.2.33.3/32   0            ?          10.1.6.3
31     SWAP         10.2.33.3/32   0            ?          10.1.6.3
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk41PJYACgkQwcJ4iSZ1q2l2ogCgnQKQ7yj+bqyZso3sKg+qy8Ob I3YAmQE239nJhkuGdNoHSeh3TZ/wnBxR =7hsl -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ondrej Zajicek wrote:
> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>> don't think it is a good idea to mix these. This may look inconsistent
>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>> similar, have a natural way to embed one in the other, have similar
>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>> kernel interaction (routes imported from LDP and exported to kernel).
>>> This solves your Case 2 without any hacks.
I've tried to use this approach to add VPNv4|VPNv6 MP-BGP support. Unfortunately I can't see any benefits in this idea. I work with an IPv4 version (because using IPv6 version for IPv4 protocols will require me to make patches for all appropriate protocols based on non-reviewed fib patch which is another not even discussed task). As a result I'm stuck with sizeof(ip_addr) == 4 and supporting IPv6 (at least not breaking existing support) within the same code in MP-BGP drives me crazy.
Some of the following arguments are not valuable from an ipv6-only daemon slowly starting to support IPv4 approach, but still: merging IPv4 and IPv6 in single table is wrong IMHO.
Most of bird network protocols support single family by design: rip, ospfv2, ospfv3. Those protocols don't require unified addresses/tables at all.
For BGP there are no benefits, too:
* ip_addr is not unified enough to support all MP-BGP families. (and it is not enough for kernel protocol, too)
* IPv4 and IPv6 are handled completely different in BGP (BGP4 attributes vs MP-BGP attributes)
* Next hop rta have to be altered or more complex logic for determining address family are required
* It is much easier to use specific address prefixes for every address family instead of using this approach in general but have some exceptions
* Generic rtable approach seems to be more complex: for some 'real' families tables are different, for some - not.
* We have to add 'fake' families to rtable for the purpose of getting sizeof(address data)
I've ended with rolling my own ip4_addr and ip6_addr for updating MP-BGP implementation.
I see the following alternative solution for IPv4/IPv6 tables & stuff:
* Use separate tables for IPv4 and IPv6 instead of unified one
* Permit (internally) to create multiple rtables with the same name but different AF
* Restrict users to do this
* Config file definition 'table XXX' creates both IPv4 AND IPv6 rtables
* Config file definition 'table XXX ipv4|ipv6' creates table for requested AF only.
* Protocols with multiple AFs support (static, direct, kernel, BGP) declare this (maybe as supported protocols mask?) at the beginning and get connected to appropriate rtables
From user point of view nothing is changed. 'sh route' sorting problem gets away, too.
Some fixes have to be done for filtering framework (af checking for every rule?)
>> So, from user point of view, I define
>> table xxx; for both ipv4 and IPv6 routes and
>> mpls table yyy; for MPLS routing table?
>
> Yes.
>
>> There should be base MPLS rtable (mpls_default, for example) as in IP.
>> We can also add a hack for automatically subscribe protocols for MPLS
>> routing table by type and other attributes. For example, every LDP
>> instance gets connected to an MPLS table (default or defined in config).
>> Kernel protocol instance gets connected to MPLS table only if its IP
>> table is the default one (GRT) or 'mpls table' keyword is supplied
>> explicitly. What about VPNv4/VPNv6? The same approach?
>
> Perhaps even default MPLS table should be explicitly configured [*] (as i guess
> not many BIRD users would use MPLS). Protocols requiring MPLS table would
> fail if it is not configured, protocol with optional MPLS support (kernel,
> static?) just do not connect to MPLS in that case. The same approach
> for VPNvX table.
>
> [*] probably like: mpls table XXX default;
>
>> Btw, how we will distinguish inet/inet6 rtes? (I'm talking about MP-BGP
>> / IPv4-mapped cases)
>
> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
> similar purposes in IP stack. But this should not be checked directly
> in protocols, there should be some macros in lib/ipv6.h for that.
>
>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>> and the purpose of label request is to propagate the label through LDP
>>> area. i didn't notice that BGP/MPLS also distributes labels so they
>>> need to know assigned labels. So the idea would need some modifications.
>> Not sure this will work. Since t1 is an IP table cases when we need to
>> request specific label for:
>> * AToM
>> * RSVP-TE tunnels
>> will not work since there are no prefixes that can be mapped to such
>> request.
>
> You are probably right. I originally thought about some specific
> 'request table' (where requests coded as routes with specific AF),
> but perhaps there should be used some other mechanism / other protocol
> hook. But it should be generic enough (some bus, allows at least more
> 'producers' and perhaps more 'consumers').
>
>>> Internal LMAP table is examined, tracked IGP table is examined.
>>> If both are ready (for given prefix), appropriate encapsulating and MPLS routes
>>> are generated and propagated using rte_update(), otherwise nothing is
>>> generated and the previously generated route is withdrawn (rte_update()
>>> with NULL is called) (or perhaps an unreachable route is generated if
>>> LMAP is here but IGP route is missing). Simple and elegant.
>> .. and in case of label release we should remove label only and keep
>> original route
>
> Yes.
>
>>> There are some tricky parts of IGP tracking - it is problematic
>>> to use standard RA_OPTIMAL update for this purpose, because if
>>> generated encapsulating routes are imported to the same table,
>>> these probably became the optimal ones and IGP routes would be
>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>> containing encapsulating routes, similarly 'examining the tracked
>>> IGP table' means looking up the fib node and find the best route,
>>> ignoring encapsulating ones.
>>>
>>> For implementation of this behavior, there are two minor changes that
>>> needs to be done to the rt table code: First, currently accept_ra_types
>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>> property of an announce hook (as LDP would have two hooks with
>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>> both in rte_recalculate should be moved after the route list
>>> is updated/relinked.
>
>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a
>> trivial task and requires internals understanding. Either announce type
>> should be passed to announce hook or new hook should be added for RA_ANY
>> event. The latter is more appropriate IMHO since RA_ANY is used by pipe
>> protocol only.
>
> I thought about that when i created RA_ANY and have chosen this approach.
> Probably best way is just to change rt_notify to have appropriate
> struct announce_hook as a second argument instead of struct rtable.
> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
> some protocol-specific data. As (probably) all protocols are in-tree,
> doing some wide but trivial changes is not a problem.
>
>> Kernel protocol should track RA_ANY protocol hooks
>> looking for update source (LDP / RSVP) and re-install appropriate
>> routes.
>
> I think kernel protocol should use RA_OPTIMAL as usual. This kind
> of RA_ANY usage is for protocols that export routes to the same
> table they listen (so 'source' routes would be shaded by their
> routes). These routes (LDP / RSVP) should have just highest
> priority.
>
>> The only downside is situation when LDP signalling starts faster
>> than IGP. In that case we will get 3 updates instead of one (at least in
>> RTSOCK):
>> * RTM_ADD for original prefix
>> * RTM_DEL for this prefix (as part of krt_set_notify())
>> * RTM_ADD for modified prefix
>>
>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>> instead of one.
>
> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
> are propagated synchronously depth-first:
>
> OSPF --RA_ANY--> LDP
> LDP --RA_OPTIMAL--> kernel
> OSPF --RA_OPTIMAL--> kernel
>
> But it is true that this is much dependent on internal implementation
> of route propagation. The first idea i had was to use separate
> tables for original and labeled routes (when just RA_OPTIMAL hooks),
> but that looks too cumbersome for users and ability to push a better
> route to the same (input) table has other possible usages.
>
>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>> suggested, with minor details changed. FIB / rtables would be uniform
>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>> might be allocated larger.
>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>> enough for holding IPv6 address? This can bump memory consumption for
>> setups with several full-views significantly.
>
> It increases memory consumption, but not so much in a relative view - for
> each struct network there is at least one struct rte and in both of them
> there is just one ip_addr and both structures are nontrivial. So this
> relative increase would be about 1.15-1.2. Really big users would
> probably keep current split setting.
>
>>> Because each protocol and each its announce_hook have appropriate role,
>>> it is IMHO unnecessary to have AF in protocol hooks, but there should be
>>> check whether protocol/announce_hook is connected to appropriate rtable.
>>>
>> To summarize required changes (please correct me):
>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>> * rtable
>> * fib
>> * rte
>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>> to struct fib to hold this value.
>> 4) Move to memcmp() in fib_find / fib_get
>> 5) Set up default rtable for every supported AF. Connect protocol
>> instances to such default AFs based on protocol types
>
> 1a) other changes in rte_recalculate() related to propagation
> (clean up the table before calling RA_ANY hook).
>
> 1) and 1a) i will do myself and send you the patch, and also make
> some trivial example for exporting to the same table.
>
> 2) i am not sure if there is a reason to put explicit AF info
> to struct fib, AF compatibility could be handled on higher level
> (struct rtable in general, other direct users probably use just
> one AF).
>
> 3) and hashing callback (and perhaps fib_route, but not sure if this is
> needed).
>
> 4) probably encapsulate that to some static inline key_equal() function.
>
> 5) see my related note above. Protocol binding to tables should check AFs.
>
> more:
>
> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:
>
>>> i think encapsulation
>>> routes should be represented by routes with new destination type
>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>> struct rta (containing struct rta in the first field and NHLFE after
>>> that). Such structure could be easily passed as struct rta and functions
>>> from rt-attr.c can work with that, with some minor modifications
>>> (allocating, freeing and printing) dispatched based on dest field.
>
>>> This rta could be used without changes also for MPLS routes.
>
>
>> Most of this are more or less trivial changes not MPLS-bound (VPNv4/6
>> can be used in case of bird used as RR in MPLS network, for example).
>> Should I supply patches for these? What are your plans about commit
>> routemap ?
>
> I create GIT branch 'mpls' and would merge these patches to that branch
> soon. When we will have some major release, we could merge 'mpls' branch
> to master if there is some sufficient usage (i think that even just
> static and kernel protocol support for MPLS would be a good example
> usage). Other protocols (LDP, ...) probably should be merged when they
> are reasonably ready.
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4+becACgkQwcJ4iSZ1q2kKhwCfZyy8bQ8s8kzq8zmbMD1w2I6z
eacAniMi+6YHkas0UQ+adO/QRewQL6fP
=eXEr
-----END PGP SIGNATURE-----
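For readers following points 3) and 4) of the summary in the message above, here is a minimal sketch in C of what a key-size-aware fib and a key_equal() helper might look like, together with a check for the IPv4-mapped prefix (::ffff:0:0/96) mentioned earlier. Only the names fib2_init and key_equal come from the thread; every type, field and the _sketch suffixes are hypothetical and are not BIRD's actual API. It also assumes keys are plain byte arrays or padding-free integers, since memcmp() would otherwise compare padding bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address families, after the three AFs proposed in the thread. */
enum af_sketch { AF_SKETCH_IP, AF_SKETCH_MPLS, AF_SKETCH_VPN };

/* Hypothetical stand-in for struct fib: it only remembers the key size. */
struct fib_sketch {
  enum af_sketch af;
  size_t key_len;               /* sizeof(AF object), supplied at init time */
};

/* Sketch of "fib2_init with sizeof(AF object) supplied" (point 3). */
static inline void
fib2_init_sketch(struct fib_sketch *f, enum af_sketch af, size_t key_len)
{
  assert(key_len > 0);
  f->af = af;
  f->key_len = key_len;
}

/* Byte-wise key comparison (point 4), so lookups do not depend on ip_addr.
 * Assumes keys contain no padding bytes. */
static inline int
key_equal(const struct fib_sketch *f, const void *a, const void *b)
{
  return memcmp(a, b, f->key_len) == 0;
}

/* Returns nonzero if a 16-byte address in network byte order lies in
 * ::ffff:0:0/96, i.e. it is an IPv4-mapped IPv6 address. */
static inline int
ip6_is_v4mapped_sketch(const uint8_t addr[16])
{
  static const uint8_t prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };
  return memcmp(addr, prefix, sizeof(prefix)) == 0;
}
```

With this shape, an MPLS table would be initialised with the size of a label key and an IP table with the size of the (possibly IPv6-sized) address, while the lookup code stays the same. Whether the AF should live in the fib at all is left open in the thread (see the reply to point 2), so the af field here is purely illustrative.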
participants (4)
- Alexander V. Chernikov
- Neil Wilson
- Ondrej Zajicek
- Tapio Haapala