Hello David,

David Petera via Bird-users <bird-users@network.cz> writes:
have you made any revelations connected to this issue?
We have effectively turned off BFD on all BGP sessions as it seemed too random to be useful. It was working/not working on ...

- iBGP session (short range, sub ms)
- iBGP session (short range, sub ms, openwrt vs. alpine linux)
- eBGP sessions (long range, 20-40ms latency)
Do I understand it correctly that the BGP sessions are up but the respective BFD sessions are down?
Correct.
I have encountered this kind of behavior when one side of the iBGP had been configured with 'direct' and the other was not, so maybe just double check if that is not the cause of the problem as well.
They are all direct in our case.
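For illustration only, a minimal sketch of what a matching pair looks like. The protocol names, the addresses from the 2001:db8::/32 documentation prefix and the private AS number 64512 are assumptions and are not taken from the setup in this thread. As far as the BIRD documentation describes it, iBGP defaults to multihop, so without 'direct' BGP requests a multihop BFD session; if only one end sets 'direct', the two ends ask for different kinds of BFD session, which would fit BGP being up while BFD stays down.

--------------------------------------------------------------------------------
# Router A (hypothetical), address 2001:db8::1
protocol bfd { }

protocol bgp ibgp_b {
        local as 64512;
        neighbor 2001:db8::2 as 64512;
        direct;         # single-hop BFD session is requested
        bfd on;
        ipv6 { import all; export none; };
}

# Router B (hypothetical), address 2001:db8::2
protocol bfd { }

protocol bgp ibgp_a {
        local as 64512;
        neighbor 2001:db8::1 as 64512;
        direct;         # has to match router A; iBGP defaults to multihop
        bfd on;
        ipv6 { import all; export none; };
}
--------------------------------------------------------------------------------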
If not I would need additional information to be able to recreate it. What version of BIRD are you running? The previous BFD config might help if it was working as intended.
We are mainly running 2.15.1 in a container based on Alpine Linux. The openwrt version is actually the same: BIRD version 2.15.1
Also tracking down which socket is causing the errors would be very helpful (and if it happens to be Wireguard or such, the config and rough topology of it would be appreciated).
So I believe the socket issue stems from *multiple* bfd instances. This we solved in the end by having only *one* bfd {} block globally in all routers:

--------------------------------------------------------------------------------
# Using BFD virtually everywhere, enable it globally
protocol bfd { }

...

bgp sessions following all the same pattern:

protocol bgp ibgp_server141 {
        local as myas;
        neighbor 2a0a:e5c0:10:1::141 as myas;
        direct;
        bfd on;

        ipv6 {
                import filter both_default_routes;
                export filter static_and_bgp;
                gateway recursive;
        };

        ipv4 {
                import filter both_default_routes;
                export filter static_and_bgp;
                gateway recursive;
                extended next hop on;
        };
}
--------------------------------------------------------------------------------

We essentially generate the bgp block using helm charts and thus we can be sure that they are all the same (minus configured differences). Below is the example for ibgp that is probably most relevant:

--------------------------------------------------------------------------------
{{- if $config.bird.iBGP.address }}
{{- range $k, $v := $.Values.connections }}
{{- if eq $v.bird.asn $config.bird.asn }}
{{- if ne $k $.Values.localname }}
{{- if $v.bird.iBGP.address }}
protocol bgp ibgp_{{ $k }} {
        local as myas;
        neighbor {{ $v.bird.iBGP.address }} as myas;
        direct;

        ipv6 {
{{- if $config.bird.iBGP.import_filter }}
                import filter {{ $config.bird.iBGP.import_filter }};
{{- else }}
                import all;
{{- end }}
{{- if $config.bird.iBGP.export_filter }}
                export filter {{ $config.bird.iBGP.export_filter }};
{{- else }}
                export filter static_and_bgp;
{{- end }}
                gateway recursive;
        };

        ipv4 {
{{- if $config.bird.iBGP.import_filter }}
                import filter {{ $config.bird.iBGP.import_filter }};
{{- else }}
                import all;
{{- end }}
{{- if $config.bird.iBGP.export_filter }}
                export filter {{ $config.bird.iBGP.export_filter }};
{{- else }}
                export filter static_and_bgp;
{{- end }}
                gateway recursive;
                extended next hop on;
        };
}
--------------------------------------------------------------------------------

(note: the "bfd on;" block has been removed now).

It's actually quite nice, we have one helm chart with one values.yaml that has a nice map of all connections of our AS'es in it. Automatic iBGP, manual eBGP.

Best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch