Bug? / Patch for BGP next hop issue with frr peers

Sebastian Hahn bird_users at sebastianhahn.net
Wed Apr 22 22:53:12 CEST 2020



> On 20. Apr 2020, at 10:06, Sebastian Hahn <bird_users at sebastianhahn.net> wrote:
>> On 20. Apr 2020, at 04:02, Darren O'Connor <mellow.drifter at gmail.com> wrote:
>> 
>> Is this an ipv6 route? nh length 32 means both a global and link-local address is being set. RFC2545 section 3
> 
> Yes, this is indeed an ipv6 route. I am using many IPv6 routes with other peers, they do not hit the same code path. Thank you for the RFC pointer!
> 
>> On Sun, 19 Apr 2020 at 21:39, Donald Sharp <sharpd at cumulusnetworks.com> wrote:
>> I cannot speak towards bird's behavior here but I can say that FRR has
>> fixed a couple of nexthop related issues with what we send to our
>> peers since the 6.0 release.  I would please consider upgrading to a
>> much later version if you can,  7.2 or 7.3 should have the fixes.
>> 
>> On Sun, Apr 19, 2020 at 6:48 PM Sebastian Hahn
>>> 
>>> In a bird 2.0.7 setup, I was unable to import routes from one of my peers. It is the only one using frr (version 6.02-2 on debian), most other peers use bird1 or bird2. I noticed immediately after adding the peering that I received "Invalid NEXT_HOP attribute" errors in the log. Unfortunately, no more information than that was provided in the log, so I went on a little hunt. Since the exact error is raised from many places, I added some logs to identify the GW_DIRECT case in bgp_apply_next_hop() as the culprit. Here I realized that both gw and ll are set, which means a neighbor is tried to be found which doesn't involve the ll case. I then noticed that in bgp_decode_next_hop_ip(), this can only happen if len == 32. This is where I am absolutely clueless what it means for the nh len to be 32, and thus I don't know if the patch I came up with is correct, even though it works for my testing. Only this one peer using frr causes nh len to be set to 32, so I suppose it might be !
>> a !
>>> rare configuration.
>>> 
>>> I added this simple patch:
>>> 
>>> --- bird2-2.0.7.orig/proto/bgp/packets.c
>>> +++ bird2-2.0.7/proto/bgp/packets.c
>>> @@ -1174,6 +1174,9 @@ bgp_decode_next_hop_ip(struct bgp_parse_
>>> 
>>>     if (ipa_is_ip4(nh[0]) || !ip6_is_link_local(nh[1]))
>>>       nh[1] = IPA_NONE;
>>> +
>>> +    if (ip6_is_link_local(nh[1]))
>>> +      nh[0] = IPA_NONE;
>>>   }
>>>   else
>>>     bgp_parse_error(s, 9);
>>> ~
>>> 
>>> which worked for me to resolve the problem.

Hi,

so I have an update here. My peer is now using an updated FRR, version 7.3-1~deb10u1 (installed from https://deb.frrouting.org), yet it still isn't working (in a normal bird 2.0.7, not using above patch). It seems to be FRR is advertising its link-local address twice as next-hop:

     /* GW_DIRECT -> single_hop -> p->neigh != NULL */
-    if (ipa_nonzero(gw))
+    if (ipa_nonzero(gw)) {
+      log(L_ERR "looking for neigh %I ll %I", gw, ll);
       nbr = neigh_find(&p->p, gw, NULL, 0);
+    }
     else if (ipa_nonzero(ll))

This small patch causes fe80::1 (my peer's link local address) to be logged for both gw and ll address. This seems like a problem with FRR/the peer's configuration to me, yes?
Nevertheless, I wonder if bird's behaviour is correct here. It seems to me from reading the pointed out RFC that in the face of two nexthop addresses, bird should attempt the connection using the specified link-local address, not the global address (basicall, replacing the order of the if ... else if above), while passing on the global address to its peers. My first attempt at a patch seems pretty wrong to me from a look at the RFC.

Thanks for any ideas :)

Cheers
Sebastian


More information about the Bird-users mailing list