BFD Intermittent Issues

Richard Laager rlaager at wiktel.com
Fri Jun 23 21:58:16 CEST 2023


MICE (an IXP in Minneapolis, MN, USA) has had a couple of reports of 
participants seeing BFD issues.

In the most recent example, the participant's router was sending BFD 
packets to the route server (confirmed via a packet capture on the route 
server), but BIRD simply did not respond. Both their IPv4 and IPv6 
sessions to rs1 (the first route server) were acting this way, while 
both of their sessions to rs2 were fine. I compared the two route 
servers and found two other inconsistencies between them: one network's 
IPv4 session and one network's IPv6 session.

Here's what the packet capture shows:

    BFD Control message
       001. .... = Protocol Version: 1
       ...0 0011 = Diagnostic Code: Neighbor Signaled Session Down (0x03)
       01.. .... = Session State: Down (0x1)
       Message Flags: 0x48, Control Plane Independent: Set
         0... .. = Poll: Not set
         .0.. .. = Final: Not set
         ..1. .. = Control Plane Independent: Set
         ...0 .. = Authentication Present: Not set
         .... 0. = Demand: Not set
         .... .0 = Multipoint: Not set
       Detect Time Multiplier: 3 (= 6000 ms Detection time)
       Message Length: 24 bytes
       My Discriminator: 0x0000024d
       Your Discriminator: 0x0c764401
       Desired Min TX Interval: 2000 ms (2000000 us)
       Required Min RX Interval: 2000 ms (2000000 us)
       Required Min Echo Interval:    0 ms (0 us)
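
For anyone who wants to check a capture like this by hand, here is a minimal Python sketch that decodes the 24-byte mandatory section using the field layout from RFC 5880, section 4.1. The function name and the sample packet are my own: the sample bytes are rebuilt from the decoded values above, not copied from the raw capture.

```python
import struct

# State values and flag bits from RFC 5880, section 4.1.
STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}
FLAGS = [
    (0x20, "Poll"),
    (0x10, "Final"),
    (0x08, "Control Plane Independent"),
    (0x04, "Authentication Present"),
    (0x02, "Demand"),
    (0x01, "Multipoint"),
]


def parse_bfd_control(packet: bytes) -> dict:
    """Decode the mandatory section of a BFD control packet (hypothetical helper)."""
    if len(packet) < 24:
        raise ValueError("BFD control packet must be at least 24 bytes")
    (vers_diag, state_flags, detect_mult, length,
     my_disc, your_disc, min_tx_us, min_rx_us, min_echo_rx_us) = struct.unpack(
        "!BBBBIIIII", packet[:24])
    return {
        "version": vers_diag >> 5,          # top 3 bits of the first byte
        "diag": vers_diag & 0x1F,           # low 5 bits of the first byte
        "state": STATES[state_flags >> 6],  # top 2 bits of the second byte
        "flags": [name for bit, name in FLAGS if state_flags & bit],
        "detect_mult": detect_mult,
        "length": length,
        "my_discriminator": my_disc,
        "your_discriminator": your_disc,
        "desired_min_tx_us": min_tx_us,
        "required_min_rx_us": min_rx_us,
        "required_min_echo_rx_us": min_echo_rx_us,
    }


# Sample packet rebuilt from the decoded fields shown above (an assumption,
# not the raw bytes from the capture).
sample = struct.pack("!BBBBIIIII",
                     (1 << 5) | 0x03,   # version 1, diag 3
                     (1 << 6) | 0x08,   # state Down, Control Plane Independent
                     3, 24,
                     0x0000024D, 0x0C764401,
                     2_000_000, 2_000_000, 0)

fields = parse_bfd_control(sample)
for key, value in fields.items():
    print(f"{key}: {value}")
```

The detail that stands out to me is that the packet reports state Down while still carrying a nonzero Your Discriminator.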


This may have started when rs1 was rebooted for patching, or it might 
have been due to a fiber cut that disconnected this participant from the 
fabric. We are not sure of the exact timing.

The participant reset their BGP sessions. This caused the BFD session to 
come up.
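
One possible explanation, which I have not confirmed against the BIRD source: RFC 5880, section 6.8.6 says that a nonzero Your Discriminator must be used to select the session, and that the packet must be discarded if no session is found. If rs1 no longer had a session with local discriminator 0x0c764401, the packets in the capture would match nothing until the participant started over with a zero Your Discriminator. The Python sketch below only illustrates that RFC rule. The function, the lookup tables, the address, and the new local discriminator are all hypothetical.

```python
from typing import Optional

ADMIN_DOWN, DOWN, INIT, UP = range(4)


def select_session(by_local_disc: dict, by_peer_addr: dict,
                   your_disc: int, state: int, peer_addr: str) -> Optional[str]:
    """Pick the session for a received packet per RFC 5880, section 6.8.6.

    Returns the session name, or None when the packet must be discarded.
    """
    if your_disc != 0:
        # A nonzero Your Discriminator is the only key allowed for the lookup.
        return by_local_disc.get(your_disc)
    if state not in (DOWN, ADMIN_DOWN):
        # A zero Your Discriminator is only valid while the sender is down.
        return None
    # Zero Your Discriminator: fall back to other fields, here the source address.
    return by_peer_addr.get(peer_addr)


# Hypothetical state on the route server after losing its old sessions:
# the session toward the peer exists again, but under a new local discriminator.
by_local_disc = {0x11111111: "peer-a"}
by_peer_addr = {"192.0.2.1": "peer-a"}

# Packet as captured: state Down, stale Your Discriminator.
stale = select_session(by_local_disc, by_peer_addr, 0x0C764401, DOWN, "192.0.2.1")
# Packet after the peer restarts its session: Your Discriminator is zero.
fresh = select_session(by_local_disc, by_peer_addr, 0, DOWN, "192.0.2.1")

print(stale)  # None: no session with that local discriminator, so discard
print(fresh)  # peer-a
```

If this is what happened, it would be consistent with the BGP reset fixing things, assuming the reset also made the participant's router restart its BFD session from scratch.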



We are running BIRD 2.0.8 from Ubuntu's repository on Ubuntu 22.04. Note 
that this BIRD setup is using the IXP Manager templates to generate the 
config, so there are separate BIRD processes for IPv4 vs IPv6.

A different participant noted they had seen BFD issues in the past, in 
March. That would have been on the prior route servers, which were 
BIRD 1.x (I believe) on FreeBSD.

Note that BFD is configured as passive (i.e. the participant has to 
initiate it). Relevant config bits:

    protocol bfd
    {
             accept ipv4 direct;
             interface "en*" {
                     passive on;
                     multiplier 3;
                     min rx interval 500ms;
                     min tx interval 500ms;
             };
    }

    ...

    protocol bgp pb_0138_as18451 from tb_rsclient {
             description "AS18451 - LES.NET";
             neighbor 206.108.255.175 as 18451;
             ipv4 {
                 import limit 120 action restart;
                 import filter f_import_as18451;
                 table t_0138_as18451;
                 export filter f_export_as18451;
             };
             bfd on;
    }
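
For reference, here is a rough Python sketch of how these intervals combine with the participant's values from the packet capture, following the asynchronous-mode rules in RFC 5880 (sections 6.8.4 and 6.8.7). The function names are mine, jitter is ignored, and the participant's 2000 ms values were observed while the session was Down, so they may differ once the session is Up.

```python
def tx_interval_ms(local_desired_min_tx_ms: int, remote_required_min_rx_ms: int) -> int:
    """Interval at which the local system may send (RFC 5880, section 6.8.7)."""
    return max(local_desired_min_tx_ms, remote_required_min_rx_ms)


def detection_time_ms(remote_detect_mult: int, local_required_min_rx_ms: int,
                      remote_desired_min_tx_ms: int) -> int:
    """Local detection time in asynchronous mode (RFC 5880, section 6.8.4)."""
    return remote_detect_mult * max(local_required_min_rx_ms, remote_desired_min_tx_ms)


# Route server side, from the config above.
RS_MIN_RX_MS, RS_MIN_TX_MS, RS_MULT = 500, 500, 3
# Participant side, from the packet capture (taken while the session was Down).
PEER_MIN_RX_MS, PEER_MIN_TX_MS, PEER_MULT = 2000, 2000, 3

rs_tx = tx_interval_ms(RS_MIN_TX_MS, PEER_MIN_RX_MS)
rs_detect = detection_time_ms(PEER_MULT, RS_MIN_RX_MS, PEER_MIN_TX_MS)
peer_detect = detection_time_ms(RS_MULT, PEER_MIN_RX_MS, RS_MIN_TX_MS)

print(rs_tx, rs_detect, peer_detect)  # 2000 6000 6000
```

With these numbers the slower side wins: the route server would send every 2000 ms rather than every 500 ms, and both ends would use a 6000 ms detection time.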


Are there any relevant BFD fixes since 2.0.8? I looked at the git commit 
log and found this one (but it doesn't seem like it would apply):

    commit 99872676df45f1a490d3d63f43081afb41477040
    Author: Ondrej Zajicek <santiago at crfreenet.org>
    Date:   Sun Jan 22 23:42:08 2023 +0100

         BFD: Improve incoming packet matching

         For active sessions, ignore received packets with zero local id and
         mismatched remote id. That forces a session timeout instead of an
         immediate session restart. It makes BFD sessions more resilient to
         packet spoofing.

         Thanks to André Grüneberg for the suggestion.


The discussion on the MICE-DISCUSS list is here:
https://lists.iphouse.net/cgi-bin/wa?A2=ind2306&L=MICE-DISCUSS&T=0&F=&S=&P=17401 

-- 
Richard