7 Aug
2011
10:50 a.m.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ondrej Zajicek wrote:
> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>> don't think it is a good idea to mix these. This may look inconsistent
>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>> similar, have a natural way to embed one in the other, have similar
>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>> kernel interaction (routes imported from LDP and exported to kernel).
>>> This solves your Case 2 without any hacks.

I've tried to use this approach to add VPNv4|VPNv6 MP-BGP support.
Unfortunately I can't see any benefits in this idea.

I work with the IPv4 version (using the IPv6 version for IPv4 protocols
would require me to patch all the affected protocols on top of the
non-reviewed fib patch, which is another task that has not even been
discussed). As a result I'm stuck with sizeof(ip_addr) == 4, and
supporting IPv6 (or at least not breaking the existing support) within
the same MP-BGP code drives me crazy.

Some of the following arguments carry less weight if you see BIRD as an
IPv6-only daemon slowly starting to support IPv4, but still: merging IPv4
and IPv6 in a single table is wrong IMHO.

Most of BIRD's network protocols support a single family by design: rip,
ospfv2, ospfv3. Those protocols don't require unified addresses/tables at
all. For BGP there are no benefits either:

* ip_addr is not unified enough to support all MP-BGP families.
  (and it is not enough for the kernel protocol, either)
* IPv4 and IPv6 are handled completely differently in BGP (BGP4 attributes
  vs MP-BGP attributes)
* The next hop rta has to be altered, or more complex logic for
  determining the address family is required
* It is much easier to use specific address prefixes for every address
  family than to use this approach in general but have some exceptions
* The generic rtable approach seems to be more complex: for some 'real'
  families the tables are different, for some they are not.
* We have to add 'fake' families to rtable for the purpose of getting
  sizeof(address data)

I've ended up rolling my own ip4_addr and ip6_addr for updating the MP-BGP
implementation.

I see the following alternative solution for IPv4/IPv6 tables & stuff:

* Use separate tables for IPv4 and IPv6 instead of a unified one
* Permit (internally) creating multiple rtables with the same name but
  different AF
* Do not let users do this directly
* Config file definition 'table XXX' creates both IPv4 AND IPv6 rtables
* Config file definition 'table XXX ipv4|ipv6' creates a table for the
  requested AF only.
* Protocols with multiple-AF support (static, direct, kernel, BGP) declare
  this (maybe as a supported protocols mask?) at the beginning and get
  connected to the appropriate rtables

- From the user's point of view nothing changes.

The 'sh route' sorting problem goes away, too.
Some fixes have to be done in the filtering framework (AF checking for
every rule?)

>> So, from user point of view, I define
>> table xxx; for both ipv4 and IPv6 routes and
>> mpls table yyy; for MPLS routing table?
>
> Yes.
>
>> There should be base MPLS rtable (mpls_default, for example) as in IP.
>> We can also add a hack for automatically subscribe protocols for MPLS
>> routing table by type and other attributes. For example, every LDP
>> instance gets connected to an MPLS table (default or defined in config).
>> Kernel protocol instance gets connected to MPLS table only if its IP
>> table is the default one (GRT) or 'mpls table' keyword is supplied
>> explicitly. What about VPNv4/VPNv6 ? The same approach?
>
> Perhaps even default MPLS table should be explicitly configured [*] (as i guess
> not many BIRD users would use MPLS). Protocols requiring MPLS table would
> fail if it is not configured, protocol with optional MPLS support (kernel,
> static?) just do not connect to MPLS in that case. The same approach
> for VPNvX table.
>
> [*] probably like: mpls table XXX default;
>
>> Btw, how we will distinguish inet/inet6 rtes? (I'm talking about MP-BGP
>> / IPv4-mapped cases)
>
> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
> similar purposes in IP stack. But this should not be checked directly
> in protocols, there should be some macros in lib/ipv6.h for that.
>
>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>> and the purpose of label request is to propagate the label through LDP
>>> area. i didn't notice that BGP/MPLS also distributes labels so they
>>> need to know assigned labels. So the idea would need some modifications.
>> Not sure this will work. Since t1 is an IP table cases when we need to
>> request specific label for:
>> * AToM
>> * RSVP-TE tunnels
>> will not work since there are no prefixes that can be mapped to such
>> request.
>
> You are probably right. I originally thought about some specific
> 'request table' (where requests coded as routes with specific AF),
> but perhaps there should be used some other mechanism / other protocol
> hook. But it should be generic enough (some bus, allows at least more
> 'producers' and perhaps more 'consumers').
>
>>> Internal LMAP table is examined, tracked IGP table is examined.
>>> If both are ready (for given prefix), appropriate encapsulating and
>>> MPLS routes
>>> are generated and propagated using rte_update(), otherwise nothing is
>>> generated and the previously generated route is withdrawn (rte_update()
>>> with NULL is called) (or perhaps an unreachable route is generated if
>>> LMAP is here but IGP route is missing). Simple and elegant.
>> .. and in case of label release we should remove label only and keep
>> original route
>
> Yes.
>
>>> There are some tricky parts of IGP tracking - it is problematic
>>> to use standard RA_OPTIMAL update for this purpose, because if
>>> generated encapsulating routes are imported to the same table,
>>> these probably became the optimal ones and IGP routes would be
>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>> containing encapsulating routes, similarly 'examining the tracked
>>> IGP table' means looking up the fib node and find the best route,
>>> ignoring encapsulating ones.
>>>
>>> For implementation of this behavior, there are two minor changes that
>>> needs to be done to the rt table code: First, currently accept_ra_types
>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>> property of an announce hook (as LDP would have two hooks with
>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>> both in rte_recalculate should be moved after the route list
>>> is updated/relinked.
>
>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a
>> trivial task and requires internals understanding. Either announce type
>> should be passed to announce hook or new hook should be added for RA_ANY
>> event. The latter is more appropriate IMHO since RA_ANY is used by pipe
>> protocol only.
>
> I thought about that when i created RA_ANY and have chosen this approach.
> Probably best way is just to change rt_notify to have appropriate
> struct announce_hook as a second argument instead of struct rtable.
> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
> some protocol-specific data. As (probably) all protocols are in-tree,
> doing some wide but trivial changes is not a problem.
>
>> Kernel protocol should track RA_ANY protocol hooks
>> looking for update source (LDP / RSVP) and re-install appropriate
>> routes.
>
> I think kernel protocol should use RA_OPTIMAL as usual. This kind
> of RA_ANY usage is for protocols that export routes to the same
> table they listen (so 'source' routes would be shaded by their
> routes). These routes (LDP / RSVP) should have just highest
> priority.
>
>> The only downside is situation when LDP signalling starts faster
>> than IGP. In that case we will get 3 updates instead of one (at least in
>> RTSOCK):
>> * RTM_ADD for original prefix
>> * RTM_DEL for this prefix (as part of krt_set_notify())
>> * RTM_ADD for modified prefix
>>
>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>> instead of one.
>
> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
> are propagated synchronously depth-first:
>
> OSPF --RA_ANY--> LDP
> LDP --RA_OPTIMAL--> kernel
> OSPF --RA_OPTIMAL--> kernel
>
> But it is true that this is much dependent on internal implementation
> of route propagation. The first idea i had was to use separate
> tables for original and labeled routes (when just RA_OPTIMAL hooks),
> but that looks too cumbersome for users and ability to push a better
> route to the same (input) table has other possible usages.
>
>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>> suggested, with minor details changed. FIB / rtables would be uniform
>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>> might be allocated larger.
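The "allocated larger" idea quoted just above can be illustrated with a small sketch. It is not BIRD code: every `sk_` name is hypothetical, the hash is a placeholder, and the only assumption is that each table fixes its key size once at init time, so nodes carry exactly that many key bytes and keys are compared bytewise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_BUCKETS 64

/* Hypothetical node: fixed header plus a key whose size the table decides. */
struct sk_fib_node {
  struct sk_fib_node *next;     /* hash chain */
  uint8_t pxlen;                /* prefix length in bits */
  uint8_t key[];                /* key_size bytes, allocated with the node */
};

struct sk_fib {
  size_t key_size;              /* e.g. 4 for IPv4, 16 for IPv6 */
  struct sk_fib_node *buckets[SK_BUCKETS];
};

/* Bind the table to one key size; all buckets start out empty (NULL). */
void sk_fib_init(struct sk_fib *f, size_t key_size)
{
  assert(key_size > 0);
  *f = (struct sk_fib){ .key_size = key_size };
}

/* Bytewise key comparison, independent of the address family. */
static inline int sk_key_equal(const struct sk_fib *f, const struct sk_fib_node *n,
                               const void *key, uint8_t pxlen)
{
  return n->pxlen == pxlen && memcmp(n->key, key, f->key_size) == 0;
}

/* Placeholder hash over the key bytes and the prefix length. */
static unsigned sk_hash(const struct sk_fib *f, const void *key, uint8_t pxlen)
{
  const uint8_t *p = key;
  unsigned h = pxlen;
  for (size_t i = 0; i < f->key_size; i++)
    h = h * 31u + p[i];
  return h % SK_BUCKETS;
}

/* Look a key up; returns NULL when it is not present. */
struct sk_fib_node *sk_fib_find(const struct sk_fib *f, const void *key, uint8_t pxlen)
{
  struct sk_fib_node *n = f->buckets[sk_hash(f, key, pxlen)];
  while (n && !sk_key_equal(f, n, key, pxlen))
    n = n->next;
  return n;
}

/* Find the node or create it, allocating header + key_size bytes. */
struct sk_fib_node *sk_fib_get(struct sk_fib *f, const void *key, uint8_t pxlen)
{
  struct sk_fib_node *n = sk_fib_find(f, key, pxlen);
  if (n)
    return n;
  n = malloc(sizeof(*n) + f->key_size);
  if (!n)
    return NULL;
  unsigned h = sk_hash(f, key, pxlen);
  n->pxlen = pxlen;
  memcpy(n->key, key, f->key_size);
  n->next = f->buckets[h];
  f->buckets[h] = n;
  return n;
}
```

With this layout a table initialised with a 4-byte key never pays for 16-byte keys, while a second table with 16-byte keys can live in the same process and share all the lookup code.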
>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>> enough for holding IPv6 address? This can bump memory consumption for
>> setups with several full-views significantly.
>
> It increases memory consumption, but not so much in a relative view - for
> each struct network there is at least one struct rte and in both of them
> there is just one ip_addr and both structures are nontrivial. So this
> relative increase would be about 1.15-1.2. Really big users would
> probably keep current split setting.
>
>>> Because each protocol and each its announce_hook have appropriate role,
>>> it is IMHO unnecessary to have AF in protocol hooks, but there should be
>>> check whether protocol/announce_hook is connected to appropriate rtable.
>>>
>> To summarize required changes (please correct me):
>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>> * rtable
>> * fib
>> * rte
>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>> to struct fib to hold this value.
>> 4) Move to memcmp() in fib_find / fib_get
>> 5) Set up default rtable for every supported AF. Connect protocol
>> instances to such default AFs based on protocol types
>
> 1a) other changes in rte_recalculate() related to propagation
> (clean up the table before calling RA_ANY hook).
>
> 1) and 1a) i will do myself and send you the patch, and also make
> some trivial example for exporting to the same table.
>
> 2) i am not sure if there is a reason to put explicit AF info
> to struct fib, AF compatibility could be handled on higher level
> (struct rtable in general, other direct users probably use just
> one AF).
>
> 3) and hashing callback (and perhaps fib_route, but not sure if this is
> needed).
>
> 4) probably encapsulate that to some static inline key_equal() function.
>
> 5) see my related note above. Protocol binding to tables should check AFs.
>
> more:
>
> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:
>
>>> i think encapsulation
>>> routes should be represented by routes with new destination type
>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>> struct rta (containing struct rta in the first field and NHLFE after
>>> that). Such structure could be easily passed as struct rta and functions
>>> from rt-attr.c can work with that, with some minor modifications
>>> (allocating, freeing and printing) dispatched based on dest field.
>
>>> This rta could be used without changes also for MPLS routes.
>
>
>> Most of this are more or less trivial changes not MPLS-bound (VPNv4/6
>> can be used in case of bird used as RR in MPLS network, for example).
>> Should I supply patches for these? What are your plans about commit
>> routemap ?
>
> I create GIT branch 'mpls' and would merge these patches to that branch
> soon. When we will have some major release, we could merge 'mpls' branch
> to master if there is some sufficient usage (i think that even just
> static and kernel protocol support for MPLS would be a good example
> usage). Other protocols (LDP, ...) probably should be merged when they
> are reasonably ready.
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4+becACgkQwcJ4iSZ1q2kKhwCfZyy8bQ8s8kzq8zmbMD1w2I6z
eacAniMi+6YHkas0UQ+adO/QRewQL6fP
=eXEr
-----END PGP SIGNATURE-----
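Item 6) in the quoted text (struct rta_mpls holding a struct rta in its first field, with allocation, freeing and printing dispatched on the dest field) might look roughly like the sketch below. The `sk_` types are simplified, hypothetical stand-ins: the real struct rta has many more fields, and the NHLFE is reduced here to a label operation and one outgoing label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the destination types; names are hypothetical. */
enum sk_dest { SK_RTD_ROUTER, SK_RTD_UNREACHABLE, SK_RTD_MPLS };

struct sk_rta {
  enum sk_dest dest;            /* selects how the attribute is interpreted */
};

enum sk_label_op { SK_OP_PUSH, SK_OP_SWAP, SK_OP_POP };

/* Extended attribute: the base struct first, the (reduced) NHLFE after it. */
struct sk_rta_mpls {
  struct sk_rta a;              /* must stay the first member */
  enum sk_label_op op;
  uint32_t out_label;
};

/* Size dispatch on dest, as an allocator or copier would need it. */
size_t sk_rta_size(const struct sk_rta *a)
{
  return a->dest == SK_RTD_MPLS ? sizeof(struct sk_rta_mpls)
                                : sizeof(struct sk_rta);
}

/* Allocate the extended struct but hand it out as a plain struct sk_rta. */
struct sk_rta *sk_rta_new_mpls(enum sk_label_op op, uint32_t out_label)
{
  struct sk_rta_mpls *m = malloc(sizeof(*m));
  if (!m)
    return NULL;
  m->a.dest = SK_RTD_MPLS;
  m->op = op;
  m->out_label = out_label;
  return &m->a;
}

/* Downcast; valid because a struct pointer also points to its first member. */
static const struct sk_rta_mpls *sk_rta_as_mpls(const struct sk_rta *a)
{
  assert(a->dest == SK_RTD_MPLS);
  return (const struct sk_rta_mpls *) a;
}

/* Printing dispatched on dest; returns what snprintf() returns. */
int sk_rta_format(const struct sk_rta *a, char *buf, size_t len)
{
  static const char *const ops[] = { "push", "swap", "pop" };
  switch (a->dest) {
  case SK_RTD_MPLS: {
    const struct sk_rta_mpls *m = sk_rta_as_mpls(a);
    return snprintf(buf, len, "mpls %s %lu", ops[m->op],
                    (unsigned long) m->out_label);
  }
  case SK_RTD_UNREACHABLE:
    return snprintf(buf, len, "unreachable");
  default:
    return snprintf(buf, len, "router");
  }
}

/* The base pointer is also the start of the allocation, so free() is enough. */
void sk_rta_free(struct sk_rta *a)
{
  free(a);
}
```

Code that only understands struct sk_rta keeps working on such an attribute unchanged; only the helpers that allocate, free or print need to look at dest.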