Re: Merging bird and bird6

7 Aug 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ondrej Zajicek wrote:
> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>> don't think it is a good idea to mix these. This may look inconsistent
>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>> similar, have a natural way to embed one in the other, have similar
>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>> kernel interaction (routes imported from LDP and exported to kernel).
>>> This solves your Case 2 without any hacks.
I've tried to use this approach to add VPNv4|VPNv6 MP-BGP support.

Unfortunately I can't see any benefits in this idea.
I work with the IPv4 build (using the IPv6 build for IPv4 protocols
would require me to patch all the affected protocols on top of the
non-reviewed fib patch, which is another task not even discussed yet).
As a result I'm stuck with sizeof(ip_addr) == 4, and supporting IPv6
(or at least not breaking the existing support) within the same MP-BGP
code drives me crazy.

Some of the following arguments carry less weight if you start from an
IPv6-only daemon that slowly gains IPv4 support, but still: merging IPv4
and IPv6 into a single table is wrong IMHO.

Most of BIRD's network protocols support a single family by design: RIP,
OSPFv2, OSPFv3. Those protocols don't require unified addresses/tables
at all.

For BGP there are no benefits either:
* ip_addr is not unified enough to support all MP-BGP families (and it
is not enough for the kernel protocol, either)
* IPv4 and IPv6 are handled completely differently in BGP (BGP4 attributes
vs MP-BGP attributes)
* The next hop in rta has to be altered, or more complex logic for
determining the address family is required
* It is much easier to use specific address prefixes for every address
family than to use the unified approach in general but with some exceptions
* The generic rtable approach seems to be more complex: for some 'real'
families the tables are different, for some they are not.
* We have to add 'fake' families to rtable just for the purpose of getting
sizeof(address data)

I've ended up rolling my own ip4_addr and ip6_addr to update the MP-BGP
implementation.
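
To illustrate what I mean by separate types, here is a rough sketch. It
is not my actual patch: the struct layouts and the helper names
(ip4_equal, ip6_equal, ip4_from_wire, ip6_from_wire) are assumptions made
up for this mail, and I assume addresses are kept in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-family address types, independent of the
 * compile-time ip_addr. */
typedef struct { uint32_t addr; } ip4_addr;     /* host byte order */
typedef struct { uint32_t addr[4]; } ip6_addr;  /* four words, host byte order */

static inline int ip4_equal(ip4_addr a, ip4_addr b)
{
  return a.addr == b.addr;
}

static inline int ip6_equal(const ip6_addr *a, const ip6_addr *b)
{
  return memcmp(a->addr, b->addr, sizeof(a->addr)) == 0;
}

/* Decode 4 big-endian bytes as found on the wire (e.g. in an NLRI). */
static inline ip4_addr ip4_from_wire(const uint8_t *p)
{
  ip4_addr a = { ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                 ((uint32_t) p[2] << 8) | p[3] };
  return a;
}

/* Decode 16 big-endian bytes, one 32-bit word at a time. */
static inline ip6_addr ip6_from_wire(const uint8_t *p)
{
  ip6_addr a;
  for (int i = 0; i < 4; i++)
    a.addr[i] = ip4_from_wire(p + 4 * i).addr;
  return a;
}
```

With both types available in one binary, the MP-BGP code can pick the
right one per AFI instead of depending on sizeof(ip_addr).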

I see the following alternative solution for IPv4/IPv6 tables & stuff:
* Use separate tables for IPv4 and IPv6 instead of a unified one
* Permit (internally) creating multiple rtables with the same name but
different AF
* Do not let users do this directly
* Config file definition 'table XXX' creates both IPv4 AND IPv6 rtables
* Config file definition 'table XXX ipv4|ipv6' creates a table for the
requested AF only.
* Protocols with multiple AFs support (static, direct, kernel, BGP)
declare this (maybe as supported protocols mask?) at the beginning and
get connected to the appropriate rtables

From the user's point of view nothing changes. The 'sh route' sorting
problem goes away, too.
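
A minimal sketch of the table lookup and protocol binding described
above. Everything here is hypothetical (struct rtable_sk, the AFM_* mask
bits, table_define, proto_connect); I read the "supported protocols mask"
as a bitmask of address families, and the fixed-size array only keeps the
example short.

```c
#include <stddef.h>
#include <string.h>

enum { AFM_IPV4 = 1, AFM_IPV6 = 2 };  /* hypothetical AF mask bits */
#define MAX_TABLES 16

/* Stand-in for struct rtable: identified by (name, AF), not name alone. */
struct rtable_sk { const char *name; int af; };

static struct rtable_sk tables[MAX_TABLES];
static int n_tables;

static struct rtable_sk *table_find(const char *name, int af)
{
  for (int i = 0; i < n_tables; i++)
    if (tables[i].af == af && strcmp(tables[i].name, name) == 0)
      return &tables[i];
  return NULL;
}

/* 'table XXX'      -> af_mask = AFM_IPV4 | AFM_IPV6
 * 'table XXX ipv4' -> af_mask = AFM_IPV4
 * Returns the number of tables actually created. */
static int table_define(const char *name, int af_mask)
{
  int created = 0;
  for (int af = AFM_IPV4; af <= AFM_IPV6; af <<= 1)
    if ((af_mask & af) && !table_find(name, af) && n_tables < MAX_TABLES)
    {
      tables[n_tables++] = (struct rtable_sk){ name, af };
      created++;
    }
  return created;
}

/* Connect a protocol to every table named 'name' whose AF is in the
 * protocol's declared mask. Returns how many tables were found. */
static int proto_connect(const char *name, int proto_af_mask,
                         struct rtable_sk *out[2])
{
  int n = 0;
  for (int af = AFM_IPV4; af <= AFM_IPV6; af <<= 1)
    if (proto_af_mask & af)
    {
      struct rtable_sk *t = table_find(name, af);
      if (t)
        out[n++] = t;
    }
  return n;
}
```

A single-AF protocol such as OSPFv2 would pass a mask with one bit set
and get exactly one table back; BGP or kernel would pass both bits.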

Some fixes will be needed in the filtering framework (AF checking for
every rule?)

>> So, from user point of view, I define
>> table xxx; for both ipv4 and IPv6 routes and
>> mpls table yyy; for MPLS routing table?
> 
> Yes.
> 
>> There should be base MPLS rtable (mpls_default, for example) as in IP.
>> We can also add a hack for automatically subscribe protocols for MPLS
>> routing table by type and other attributes. For example, every LDP
>> instance gets connected to an MPLS table (default or defined in config).
>> Kernel protocol instance gets connected to MPLS table only if its IP
>> table is the default one (GRT) or 'mpls table' keyword is supplied
>> explicitly. What about VPNv4/VPNv6? The same approach?
> 
> Perhaps even default MPLS table should be explicitly configured [*] (as i guess
> not many BIRD users would use MPLS). Protocols requiring MPLS table would
> fail if it is not configured, protocol with optional MPLS support (kernel,
> static?) just do not connect to MPLS in that case. The same approach
> for VPNvX table.
> 
> [*] probably like: mpls table XXX default;
> 
>>  Btw, how we will distinguish inet/inet6 rtes? (I'm talking about MP-BGP
>> / IPv4-mapped cases)
> 
> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
> similar purposes in IP stack. But this should not be checked directly
> in protocols, there should be some macros in lib/ipv6.h for that.
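
For reference, such helpers could look roughly like this. It is only a
sketch: the type and function names are made up by me and are not what
lib/ipv6.h provides today, and I assume the address is stored as four
32-bit words in host byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for an IPv6-sized ip_addr. */
typedef struct { uint32_t addr[4]; } ip6_addr_sk;

/* True if the address lies inside ::ffff:0:0/96. */
static inline int ip6_is_v4mapped(const ip6_addr_sk *a)
{
  return a->addr[0] == 0 && a->addr[1] == 0 && a->addr[2] == 0x0000ffffu;
}

/* Embed an IPv4 address (host byte order) into the mapped prefix. */
static inline ip6_addr_sk ip6_from_v4(uint32_t v4)
{
  ip6_addr_sk a = { { 0, 0, 0x0000ffffu, v4 } };
  return a;
}

/* Extract the embedded IPv4 address; the caller checks ip6_is_v4mapped(). */
static inline uint32_t ip6_to_v4(const ip6_addr_sk *a)
{
  return a->addr[3];
}

/* An IPv4 /n corresponds to a mapped /(96 + n) and back. */
static inline int pxlen_from_v4(int len) { return len + 96; }
static inline int pxlen_to_v4(int len)   { return len - 96; }
```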

> 
>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>> and the purpose of label request is to propagate the label through LDP
>>> area. i didn't notice that BGP/MPLS also distributes labels so they
>>> need to know assigned labels. So the idea would need some modifications.
>> Not sure this will work. Since t1 is an IP table cases when we need to
>> request specific label for:
>> * AToM
>> * RSVP-TE tunnels
>> will not work since there are no prefixes that can be mapped to such
>> request.
> 
> You are probably right. I originally thought about some specific
> 'request table' (where requests coded as routes with specific AF),
> but perhaps there should be used some other mechanism / other protocol
> hook. But it should be generic enough (some bus, allows at least more
> 'producers' and perhaps more 'consumers').
> 
>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>> are generated and propagated using rte_update(), otherwise nothing is
>>> generated and the previously generated route is withdrawn (rte_update()
>>> with NULL is called) (or perhaps an unreachable route is generated if
>>> LMAP is here but IGP route is missing). Simple and elegant.
>> .. and in case of label release we should remove label only and keep
>> original route
> 
> Yes.
> 
>>> There are some tricky parts of IGP tracking - it is problematic
>>> to use standard RA_OPTIMAL update for this purpose, because if
>>> generated encapsulating routes are imported to the same table,
>>> these probably became the optimal ones and IGP routes would be
>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>> containing encapsulating routes, similarly 'examining the tracked
>>> IGP table' means looking up the fib node and find the best route,
>>> ignoring encapsulating ones.
>>>
>>> For implementation of this behavior, there are two minor changes that
>>> need to be done to the rt table code: First, currently accept_ra_types
>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>> property of an announce hook (as LDP would have two hooks with
>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for 
>>> both in rte_recalculate should be moved after the route list
>>> is updated/relinked.
> 
>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a
>> trivial task and requires internals understanding. Either announce type
>> should be passed to announce hook or new hook should be added for RA_ANY
>>  event. The latter is more appropriate IMHO since RA_ANY is used by pipe
>> protocol only.
> 
> I thought about that when i created RA_ANY and have chosen this approach.
> Probably best way is just to change rt_notify to have appropriate
> struct announce_hook as a second argument instead of struct rtable.
> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
> some protocol-specific data. As (probably) all protocols are in-tree,
> doing some wide but trivial changes is not a problem.
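
If I understand the proposal correctly, it would look roughly like the
sketch below. The struct layouts, the _sk suffixed names and the enum
values are my assumptions for illustration only, not the current
definitions in nest/route.h; the callback just counts notifications.

```c
#include <stddef.h>

enum { RA_OPTIMAL = 1, RA_ANY = 2 };  /* values are illustrative only */

struct rte_sk { int prefix; int pref; };

/* The accepted announcement type is a property of the hook, not of the
 * protocol, so one protocol may own hooks of both kinds. */
struct announce_hook_sk {
  int accept_ra_types;
  void *proto_data;      /* protocol-specific data */
  void (*rt_notify)(struct announce_hook_sk *h,
                    struct rte_sk *new_rte, struct rte_sk *old_rte);
  struct announce_hook_sk *next;
};

/* Deliver an announcement of the given type to the matching hooks only. */
static void rte_announce_sk(struct announce_hook_sk *hooks, int type,
                            struct rte_sk *new_rte, struct rte_sk *old_rte)
{
  for (struct announce_hook_sk *h = hooks; h; h = h->next)
    if (h->accept_ra_types == type)
      h->rt_notify(h, new_rte, old_rte);
}

/* Example callback: proto_data points to an int counter. */
static void count_notify(struct announce_hook_sk *h,
                         struct rte_sk *new_rte, struct rte_sk *old_rte)
{
  (void) new_rte;
  (void) old_rte;
  (*(int *) h->proto_data)++;
}
```

Since the hook itself is passed to rt_notify, LDP could tell its RA_ANY
hook on the tracked IGP table apart from its RA_OPTIMAL hooks without any
extra lookup.
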
> 
>> Kernel protocol should track RA_ANY protocol hooks
>> looking for update source (LDP / RSVP) and re-install appropriate
>> routes.
> 
> I think kernel protocol should use RA_OPTIMAL as usual. This kind
> of RA_ANY usage is for protocols that export routes to the same
> table they listen (so 'source' routes would be shaded by their
> routes). These routes (LDP / RSVP) should have just highest
> priority.
> 
>> The only downside is situation when LDP signalling starts faster
>> than IGP. In that case we will get 3 updates instead of one (at least in
>> RTSOCK):
>> * RTM_ADD for original prefix
>> * RTM_DEL for this prefix (as part of krt_set_notify())
>> * RTM_ADD for modified prefix
>>
>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>> instead of one.
> 
> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
> are propagated synchronously depth-first:
> 
> OSPF --RA_ANY--> LDP
> LDP --RA_OPTIMAL--> kernel
> OSPF --RA_OPTIMAL--> kernel
> 
> But it is true that this is much dependent on internal implementation
> of route propagation. The first idea i had was to use separate
> tables for original and labeled routes (when just RA_OPTIMAL hooks),
> but that looks too cumbersome for users and ability to push a better
> route to the same (input) table has other possible usages.
> 
>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>> suggested, with minor details changed. FIB / rtables would be uniform
>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>> might be allocated larger. 
>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>> enough for holding IPv6 address? This can bump memory consumption for
>> setups with several full-views significantly.
> 
> It increases memory consumption, but not so much in a relative view - for
> each struct network there is at least one struct rte and in both of them
> there is just one ip_addr and both structures are nontrivial. So this
> relative increase would be about 1.15-1.2. Really big users would
> probably keep the current split setting.
> 
>>> Because each protocol and each its announce_hook have appropriate role,
>>> it is IMHO unnecessary to have AF in protocol hooks, but there should be
>>> check whether protocol/announce_hook is connected to appropriate rtable.
>>>
>> To summarize required changes (please correct me):
>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>> * rtable
>> * fib
>> * rte
>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>> to struct fib to hold this value.
>> 4) Move to memcmp() in fib_find / fib_get
>> 5) Set up default rtable for every supported AF. Connect protocol
>> instances to such default AFs based on protocol types
> 
> 1a) other changes in rte_recalculate() related to propagation
> (clean up the table before calling RA_ANY hook).
> 
> 1) and 1a) i will do myself and send you the patch, and also make
> some trivial example for exporting to the same table.
> 
> 2) i am not sure if there is a reason to put explicit AF info 
> to struct fib, AF compatibility could be handled on higher level
> (struct rtable in general, other direct users probably use just
> one AF).
> 
> 3) and hashing callback (and perhaps fib_route, but not sure if this is
> needed).
> 
> 4) probably encapsulate that to some static inline key_equal() function.
> 
> 5) see my related note above. Protocol binding to tables should check AFs.
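
To check that we mean the same thing in 3) and 4), here is a small
standalone sketch. Only the names fib2_init and key_equal come from this
thread; the struct layouts, fib2_find, fib2_get, the fixed bucket count
and the FNV-1a hash are assumptions I picked to keep the example short,
not the existing fib code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FIB_BUCKETS 64

struct fib_node_sk {
  struct fib_node_sk *next;
  uint8_t pxlen;
  uint8_t key[];          /* key_size bytes, allocated with the node */
};

struct fib_sk {
  size_t key_size;        /* sizeof(AF object), given at init time */
  uint32_t (*hash)(const void *key, size_t size);  /* hashing callback */
  struct fib_node_sk *buckets[FIB_BUCKETS];
};

/* Default hash over the raw key bytes (FNV-1a). */
static uint32_t fib_hash_bytes(const void *key, size_t size)
{
  const uint8_t *p = key;
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < size; i++) { h ^= p[i]; h *= 16777619u; }
  return h;
}

static void fib2_init(struct fib_sk *f, size_t key_size)
{
  memset(f, 0, sizeof(*f));
  f->key_size = key_size;
  f->hash = fib_hash_bytes;
}

/* The memcmp() comparison is wrapped in one place. */
static inline int key_equal(const struct fib_sk *f,
                            const struct fib_node_sk *n,
                            const void *key, int pxlen)
{
  return n->pxlen == pxlen && memcmp(n->key, key, f->key_size) == 0;
}

static struct fib_node_sk *fib2_find(struct fib_sk *f, const void *key, int pxlen)
{
  struct fib_node_sk *n = f->buckets[f->hash(key, f->key_size) % FIB_BUCKETS];
  while (n && !key_equal(f, n, key, pxlen))
    n = n->next;
  return n;
}

/* Find the node or create it; returns NULL only if allocation fails. */
static struct fib_node_sk *fib2_get(struct fib_sk *f, const void *key, int pxlen)
{
  struct fib_node_sk *n = fib2_find(f, key, pxlen);
  if (n)
    return n;
  n = malloc(sizeof(*n) + f->key_size);
  if (!n)
    return NULL;
  uint32_t b = f->hash(key, f->key_size) % FIB_BUCKETS;
  n->pxlen = (uint8_t) pxlen;
  memcpy(n->key, key, f->key_size);
  n->next = f->buckets[b];
  f->buckets[b] = n;
  return n;
}
```

The same code then serves a 4-byte IPv4 key, a 16-byte IPv6 key or a
larger VPN key; only the size passed to fib2_init differs.
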
> 
> more:
> 
> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:
> 
>>> i think encapsulation
>>> routes should be represented by routes with new destination type
>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>> struct rta (containing struct rta in the first field and NHLFE after
>>> that). Such structure could be easily passed as struct rta and functions
>>> from rt-attr.c can work with that, with some minor modifications
>>> (allocating, freeing and printing) dispatched based on dest field.
> 
>>> This rta could be used without changes also for MPLS routes.
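
A sketch of how I read 6). RTD_MPLS and struct rta_mpls are the names
from your mail; the field layout, the NHLFE operation codes and the
helper functions are my assumptions, and the real struct rta has many
more fields than shown.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { RTD_ROUTER_SK = 0, RTD_MPLS_SK = 1 };   /* dest field values (illustrative) */
enum { NHLFE_PUSH, NHLFE_SWAP, NHLFE_POP };    /* hypothetical operation codes */

/* Heavily reduced stand-in for struct rta. */
struct rta_sk {
  uint8_t dest;
  uint32_t gw;
};

/* Extension: the plain rta comes first, the NHLFE follows, so a pointer
 * to rta_mpls_sk can be passed wherever a pointer to rta_sk is expected. */
struct rta_mpls_sk {
  struct rta_sk a;      /* must stay the first member */
  uint8_t op;
  uint32_t out_label;
};

/* Allocation size is dispatched on the dest field. */
static size_t rta_size(const struct rta_sk *a)
{
  return a->dest == RTD_MPLS_SK ? sizeof(struct rta_mpls_sk)
                                : sizeof(struct rta_sk);
}

/* Copy an rta of either kind; the caller frees the result. */
static struct rta_sk *rta_clone(const struct rta_sk *a)
{
  size_t sz = rta_size(a);
  struct rta_sk *c = malloc(sz);
  if (c)
    memcpy(c, a, sz);
  return c;
}

/* Read the outgoing label, or dflt for non-MPLS attributes. */
static uint32_t rta_out_label(const struct rta_sk *a, uint32_t dflt)
{
  return a->dest == RTD_MPLS_SK
    ? ((const struct rta_mpls_sk *) a)->out_label
    : dflt;
}
```
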
> 
> 
>> Most of this are more or less trivial changes not MPLS-bound (VPNv4/6
>> can be used in case of bird used as RR in MPLS network, for example).
>> Should I supply patches for these? What are your plans about commit
>> routemap ?
> 
> I create GIT branch 'mpls' and would merge these patches to that branch
> soon. When we will have some major release, we could merge 'mpls' branch
> to master if there is some sufficient usage (i think that even just
> static and kernel protocol support for MPLS would be a good example
> usage). Other protocols (LDP, ...) probably should be merged when they
> are reasonably ready.
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4+becACgkQwcJ4iSZ1q2kKhwCfZyy8bQ8s8kzq8zmbMD1w2I6z
eacAniMi+6YHkas0UQ+adO/QRewQL6fP
=eXEr
-----END PGP SIGNATURE-----