MICE (an IXP in Minneapolis, MN, USA) has had a couple of reports of
participants seeing BFD issues.

In the most recent example, the participant's router was sending BFD
packets to the route server (confirmed via a packet capture on the
route server), but BIRD simply did not respond. Both their IPv4 and
IPv6 sessions to rs1 (the first route server) were acting this way,
while both of their sessions to rs2 were fine. I compared the route
servers and found two other inconsistencies: one network on IPv4 and
one network on IPv6.

Here's what the packet capture shows:
BFD Control message
    001. .... = Protocol Version: 1
    ...0 0011 = Diagnostic Code: Neighbor Signaled Session Down (0x03)
    01.. .... = Session State: Down (0x1)
    Message Flags: 0x48, Control Plane Independent: Set
        0... .. = Poll: Not set
        .0.. .. = Final: Not set
        ..1. .. = Control Plane Independent: Set
        ...0 .. = Authentication Present: Not set
        .... 0. = Demand: Not set
        .... .0 = Multipoint: Not set
    Detect Time Multiplier: 3 (= 6000 ms Detection time)
    Message Length: 24 bytes
    My Discriminator: 0x0000024d
    Your Discriminator: 0x0c764401
    Desired Min TX Interval: 2000 ms (2000000 us)
    Required Min RX Interval: 2000 ms (2000000 us)
    Required Min Echo Interval: 0 ms (0 us)
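
For anyone who wants to check the raw bytes rather than the dissector
output, here is a minimal Python sketch that decodes the 24-byte
mandatory section of a BFD control packet as laid out in RFC 5880,
section 4.1. The function name and the dictionary keys are my own
choices, and the sample packet is rebuilt by hand from the field values
above, not copied from the capture file.

```python
import struct

# Mandatory section of a BFD control packet (RFC 5880, section 4.1):
# four single-byte fields followed by five 32-bit fields, network order.
_BFD_HEADER = struct.Struct("!BBBBIIIII")

STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}
FLAG_NAMES = ("Poll", "Final", "Control Plane Independent",
              "Authentication Present", "Demand", "Multipoint")


def parse_bfd_control(data: bytes) -> dict:
    """Decode the mandatory section of a BFD control packet."""
    (vers_diag, state_flags, detect_mult, length,
     my_disc, your_disc, min_tx, min_rx, min_echo_rx) = _BFD_HEADER.unpack_from(data)
    flag_bits = state_flags & 0x3F  # low six bits: P F C A D M
    flags = {name: bool(flag_bits & (0x20 >> i))
             for i, name in enumerate(FLAG_NAMES)}
    return {
        "version": vers_diag >> 5,         # top three bits
        "diag": vers_diag & 0x1F,          # low five bits
        "state": STATES[state_flags >> 6],  # top two bits
        "flags": flags,
        "detect_mult": detect_mult,
        "length": length,
        "my_disc": my_disc,
        "your_disc": your_disc,
        "desired_min_tx_us": min_tx,
        "required_min_rx_us": min_rx,
        "required_min_echo_rx_us": min_echo_rx,
    }


# Sample packet rebuilt by hand from the field values listed above.
sample = _BFD_HEADER.pack(0x23, 0x48, 3, 24,
                          0x0000024D, 0x0C764401,
                          2_000_000, 2_000_000, 0)
pkt = parse_bfd_control(sample)
print(pkt["state"], hex(pkt["your_disc"]))
```

The detail I find most interesting is that the packet is in state Down
but still carries a nonzero Your Discriminator.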
This may have started when rs1 was rebooted for patching, or it might
have been due to a fiber cut that disconnected this participant from
the fabric. We are not sure of the exact timing.

The participant then reset their BGP sessions, which caused the BFD
session to come up.
We are running BIRD 2.0.8 from Ubuntu's
repository on Ubuntu 22.04. Note that this BIRD setup is using the
IXP Manager templates to generate the config, so there are
separate BIRD processes for IPv4 vs IPv6.
A different participant noted they had seen BFD issues in the past, in
March. That would have been on the prior route servers, which were
BIRD 1.x (I believe) on FreeBSD.
Note that BFD is configured as passive
(i.e. the participant has to initiate it). Relevant config bits:
protocol bfd
{
    accept ipv4 direct;
    interface "en*" {
        passive on;
        multiplier 3;
        min rx interval 500ms;
        min tx interval 500ms;
    };
}
...
protocol bgp pb_0138_as18451 from tb_rsclient {
    description "AS18451 - LES.NET";
    neighbor 206.108.255.175 as 18451;
    ipv4 {
        import limit 120 action restart;
        import filter f_import_as18451;
        table t_0138_as18451;
        export filter f_export_as18451;
    };
    bfd on;
}
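
One possible reading, not a confirmed diagnosis: RFC 5880 (section
6.8.6) says that a received packet with a nonzero Your Discriminator
must be matched to a session by that value alone, and discarded if no
session has it. A passive system also must not send anything for a
session until it has received a packet for it (section 6.1). The Python
sketch below only models that RFC rule. Whether BIRD 2.0.8 behaves
exactly this way is an assumption on my part, and the function name,
the session name, and the 0x1111 discriminator are all hypothetical.

```python
from typing import Optional


def select_session(sessions_by_disc: dict, sessions_by_peer: dict,
                   your_disc: int, state: str,
                   src_addr: str) -> Optional[str]:
    """Return the matching session name, or None if the packet is dropped."""
    if your_disc != 0:
        # Nonzero Your Discriminator: it alone selects the session.
        return sessions_by_disc.get(your_disc)
    if state not in ("Down", "AdminDown"):
        # A zero Your Discriminator is only valid from a Down/AdminDown sender.
        return None
    # Zero Your Discriminator: fall back to matching on the peer address.
    return sessions_by_peer.get(src_addr)


# Hypothetical state on rs1 after a restart: the session exists again,
# but under a new local discriminator (0x1111 is made up).
by_disc = {0x1111: "pb_0138_as18451"}
by_peer = {"206.108.255.175": "pb_0138_as18451"}

# Packet as captured: Down, but still carrying the old discriminator.
stale = select_session(by_disc, by_peer, 0x0C764401, "Down", "206.108.255.175")
# Packet from a freshly created session on the participant side.
fresh = select_session(by_disc, by_peer, 0, "Down", "206.108.255.175")
print(stale, fresh)
```

Under those assumptions the captured packet matches no session and is
silently dropped, while a packet with Your Discriminator set to zero is
accepted. That would at least be consistent with the session coming up
after the participant reset BGP.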
Are there any interesting BFD fixes since 2.0.8? I looked at the git
commit log and found this one, but it doesn't seem like it would
apply:
commit 99872676df45f1a490d3d63f43081afb41477040
Author: Ondrej Zajicek <santiago@crfreenet.org>
Date:   Sun Jan 22 23:42:08 2023 +0100

    BFD: Improve incoming packet matching

    For active sessions, ignore received packets with zero local id and
    mismatched remote id. That forces a session timeout instead of an
    immediate session restart. It makes BFD sessions more resilient to
    packet spoofing.

    Thanks to André Grüneberg for the suggestion.
The discussion was here:
--
Richard