BGP multipath support

Mon Jun 8 08:23:03 CEST 2015

I finally got round to running your latest dev version. One problem
that sticks out is that every RIB change results in a FIB change.

To see why this is a big deal, imagine you learn routes for network Y
via transit X, which you are connected to through router A and B:

 Y ---> X ---> A ---> host
        |              |
        +----> B ------+

If Y prepends its BGP announcement, the update will not reach A and
B at the same time, so the host sees the prepend from each router in
turn. In the interim, the path via the router that has not yet relayed
the prepend is strictly shorter, so the host is fooled into thinking
that there is a new single best path and modifies the FIB to reflect
this, only to receive the second update moments later and have a tied
route again.
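The transient flap can be reproduced with a toy model. The C sketch
below is hypothetical: it reduces route selection to AS path length
alone (real BGP selection compares many more attributes) and encodes
the multipath set as a bitmask over routers A and B. Applying the two
prepended updates one at a time shows the set shrinking to {B} and then
growing back to {A, B}.

```c
#include <assert.h>
#include <stddef.h>

#define NPEERS 2   /* router 0 = A, router 1 = B */

/* Hypothetical model: as_path_len[i] is the AS path length for Y as
 * learned via router i. Returns a bitmask with bit i set if router i
 * ties for the shortest path, i.e. belongs to the multipath set. */
static unsigned best_set(const int as_path_len[NPEERS])
{
    int best = as_path_len[0];
    for (size_t i = 1; i < NPEERS; i++)
        if (as_path_len[i] < best)
            best = as_path_len[i];

    unsigned set = 0;
    for (size_t i = 0; i < NPEERS; i++)
        if (as_path_len[i] == best)
            set |= 1u << i;
    return set;
}

/* Apply per-router updates one at a time, in arrival order, and count
 * how often the multipath set (and therefore the FIB) changes. */
static int count_fib_changes(int as_path_len[NPEERS],
                             const int update_router[],
                             const int update_len[], size_t nupdates)
{
    unsigned installed = best_set(as_path_len);
    int changes = 0;
    for (size_t u = 0; u < nupdates; u++) {
        assert(update_router[u] >= 0 && update_router[u] < NPEERS);
        as_path_len[update_router[u]] = update_len[u];
        unsigned now = best_set(as_path_len);
        if (now != installed) {
            installed = now;
            changes++;
        }
    }
    return changes;
}
```

Starting with both routers at length 2 and having each relay a prepend
to length 3, count_fib_changes() reports two changes even though the
final set is identical to the starting one.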

Running `ip monitor`, you'll see a ton of route 'changes' that don't
look any different:

Deleted 184.51.158.0/24  proto bird
        nexthop via 130.94.A.A  dev p1p4 weight 1
        nexthop via 130.94.B.B  dev p6p1 weight 1
184.51.158.0/24  proto bird
        nexthop via 130.94.A.A  dev p1p4 weight 1
        nexthop via 130.94.B.B  dev p6p1 weight 1

Even when the FIB should change, you end up with sequences of
deletes/adds that mirror the order of the BGP updates.
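The delete/re-add pairs above carry identical next hops, so one cheap
mitigation is to compare the installed and candidate next-hop sets,
ignoring order, before writing to the kernel. A minimal sketch,
assuming a hypothetical struct nexthop that holds only the fields
ip monitor prints (gateway, interface, weight); it is not BIRD's
internal representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical next-hop record mirroring the `ip monitor` fields. */
struct nexthop {
    uint32_t gw;   /* gateway address (placeholder values in tests) */
    int ifindex;   /* outgoing interface */
    int weight;
};

static bool nh_equal(const struct nexthop *a, const struct nexthop *b)
{
    return a->gw == b->gw && a->ifindex == b->ifindex &&
           a->weight == b->weight;
}

/* Order-independent comparison of two next-hop sets.
 * Assumes neither set contains duplicate entries. */
static bool nh_set_equal(const struct nexthop *a, size_t na,
                         const struct nexthop *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    if (na != nb)
        return false;
    for (size_t i = 0; i < na; i++) {
        bool found = false;
        for (size_t j = 0; j < nb && !found; j++)
            found = nh_equal(&a[i], &b[j]);
        if (!found)
            return false;
    }
    return true;
}

/* True only if the kernel route actually needs to be rewritten. */
static bool fib_update_needed(const struct nexthop *installed, size_t ninst,
                              const struct nexthop *candidate, size_t ncand)
{
    return !nh_set_equal(installed, ninst, candidate, ncand);
}
```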

One way of solving this is to batch route changes by delaying route
injection. Otherwise the route churn is too high, and Linux starts
doing a lot of next-hop invalidation when you inject multiple full
routing tables into the FIB.
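A sketch of the batching idea, under loudly stated assumptions: the
struct fib_batch type and its functions are hypothetical, each route is
reduced to a next-hop bitmask, timestamps are passed in by the caller
rather than read from a clock, and the kernel write is simulated by a
counter. Changes for the same prefix overwrite each other while the
hold time runs, and only prefixes whose final state differs from what
is installed get pushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPREFIX 4   /* toy table size; prefixes are just indices */

/* Hypothetical batching layer between the RIB and the kernel FIB. */
struct fib_batch {
    unsigned installed[NPREFIX]; /* what the kernel currently holds */
    unsigned pending[NPREFIX];   /* latest RIB decision, not yet pushed */
    bool dirty[NPREFIX];
    long first_dirty_ms;         /* -1 when nothing is pending */
    long hold_ms;                /* delay before flushing */
    int kernel_writes;           /* counts simulated netlink updates */
};

static void batch_init(struct fib_batch *b, long hold_ms)
{
    *b = (struct fib_batch){ .first_dirty_ms = -1, .hold_ms = hold_ms };
}

/* Record a RIB change; later changes to the same prefix overwrite it. */
static void batch_enqueue(struct fib_batch *b, size_t prefix,
                          unsigned nh_mask, long now_ms)
{
    assert(prefix < NPREFIX);
    b->pending[prefix] = nh_mask;
    b->dirty[prefix] = true;
    if (b->first_dirty_ms < 0)
        b->first_dirty_ms = now_ms;
}

/* Once the hold time has elapsed, push only the net result, skipping
 * prefixes whose final state equals what is already installed. */
static void batch_poll(struct fib_batch *b, long now_ms)
{
    if (b->first_dirty_ms < 0 || now_ms - b->first_dirty_ms < b->hold_ms)
        return;
    for (size_t p = 0; p < NPREFIX; p++) {
        if (!b->dirty[p])
            continue;
        if (b->pending[p] != b->installed[p]) {
            b->installed[p] = b->pending[p]; /* stand-in for a kernel write */
            b->kernel_writes++;
        }
        b->dirty[p] = false;
    }
    b->first_dirty_ms = -1;
}
```

The hold time is the trade-off: a larger value absorbs more of the A/B
arrival-order noise but delays convergence when the FIB really should
change.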

Cheers,
- j

On Sun, Jun 7, 2015 at 6:04 PM, João Taveira Araújo
<joao.taveira at gmail.com> wrote:
> In our hack around this, we (Fastly) ended up adding a bgp_rte_same()
> function with pretty much everything you mention.
>
> One non-obvious addition is that we ended up enforcing that the
> multipath entries had the same next AS, i.e. bgp_get_neighbor(new) ==
> bgp_get_neighbor(old). With nothing else to break the tie, we'd end up
> with next hops towards the same prefix over different carriers. The
> problem is that under heavy route churn we'd get next-hop
> invalidation, in which case a flow going over one carrier would flap
> onto another mid-flight, which had performance implications for
> users.
>
> We ended up enforcing this in our selection policy, but it should
> arguably be optional (strict mode).
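The strict next-AS rule can be expressed as an extra gate applied after
the usual tie checks. In this hypothetical sketch, next_as() stands in
for bgp_get_neighbor() and is assumed to return the first AS on the
path (or the local AS for an empty path); the strict flag models the
optional "strict mode" suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate route: only the AS path matters here. */
struct cand {
    const uint32_t *as_path;  /* as_path[0] is the neighboring AS */
    size_t as_path_len;
};

/* First AS on the path, or local_as for an empty path. This is an
 * assumed analogue of bgp_get_neighbor(), not BIRD's code. */
static uint32_t next_as(const struct cand *c, uint32_t local_as)
{
    return c->as_path_len > 0 ? c->as_path[0] : local_as;
}

/* In strict mode, only merge paths that leave via the same
 * neighboring AS (i.e. the same carrier). */
static bool same_carrier_ok(const struct cand *a, const struct cand *b,
                            uint32_t local_as, bool strict)
{
    assert(a != NULL && b != NULL);
    if (!strict)
        return true;
    return next_as(a, local_as) == next_as(b, local_as);
}
```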
>
>
> On Sun, Jun 7, 2015 at 5:43 PM, Ondrej Zajicek <santiago at crfreenet.org> wrote:
>> On Fri, May 22, 2015 at 12:13:31PM +0200, Alexander Frolkin wrote:
>>> Hi Ondrej,
>>>
>>> > > I was wondering how hard it would be to add BGP multipath support to
>>> > > BIRD, or if anyone was working on it already?
>>> > BGP multipath is one thing we are currently working on.
>>>
>>> That's great news!  Do you know when it's likely to be available?
>>
>> Hi
>>
>> There is a devel version of BGP multipath in our Git. Currently it
>> allows merging routes that have the same preference, bgp_local_pref,
>> bgp_path length, bgp_origin, bgp_med (if relevant), iBGP/eBGP type
>> and igp_metric.
>>
>> As BGP multipath is non-standard, I wonder what kind of BGP multipath
>> behavior users expect and which options are necessary. I will
>> probably add an option to relax the check for equal bgp_path length.
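The merge criteria listed above, plus the relaxed path-length option,
could be sketched as a single predicate. Everything here is
hypothetical: struct bgp_cand flattens the attributes into plain
fields, relax_path_len is a made-up name for the proposed option, and
"bgp_med (if relevant)" is interpreted as comparing MED only between
routes from the same neighboring AS, as in the standard BGP decision
process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of the attributes named in the thread. */
struct bgp_cand {
    int preference;        /* route preference */
    uint32_t local_pref;   /* bgp_local_pref */
    unsigned path_len;     /* bgp_path length */
    int origin;            /* bgp_origin: 0 IGP, 1 EGP, 2 INCOMPLETE */
    uint32_t med;          /* bgp_med */
    uint32_t neighbor_as;  /* first AS on the path */
    bool is_ibgp;          /* learned over iBGP rather than eBGP */
    uint32_t igp_metric;   /* cost to reach the BGP next hop */
};

/* True if two routes tie closely enough to be installed together. */
static bool mergeable(const struct bgp_cand *a, const struct bgp_cand *b,
                      bool relax_path_len)
{
    assert(a != NULL && b != NULL);
    if (a->preference != b->preference || a->local_pref != b->local_pref)
        return false;
    if (!relax_path_len && a->path_len != b->path_len)
        return false;
    if (a->origin != b->origin)
        return false;
    /* Assumption: MED is only compared for the same neighboring AS. */
    if (a->neighbor_as == b->neighbor_as && a->med != b->med)
        return false;
    return a->is_ibgp == b->is_ibgp && a->igp_metric == b->igp_metric;
}
```

Relaxing the path-length check in this sketch only widens which routes
get merged alongside the best one; the best route itself would still
come from the normal decision process.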
>>
>> --
>> Elen sila lumenn' omentielvo
>>
>> Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
>> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
>> "To err is human -- to blame it on a computer is even more so."