All BFD sessions are handled on a single thread?
Hi,

I have the following configuration on one of my servers ("10.1.0.1"):

    protocol bfd {
        interface "eth0" {
            min rx interval 200 ms;
            min tx interval 1000 ms;
            idle tx interval 1 s;
            multiplier 5;
        };
        neighbor 10.1.0.2;
        neighbor 10.1.0.3;
        neighbor 10.1.0.4;
        ..... (here about 1000 records)
        neighbor 10.1.3.251;
        neighbor 10.1.3.252;
        neighbor 10.1.3.253;
        neighbor 10.1.3.254;
    }

And I configured the second server ("10.1.0.2") as a neighbor with the following config:

    protocol bfd {
        interface "eth0" {
            min rx interval 200 ms;
            min tx interval 1000 ms;
            idle tx interval 1 s;
            multiplier 5;
        };
        neighbor 10.1.0.1;
    }

For some reason I have random session expire events on the second server and BFD sessions get UP -> DOWN, DOWN -> UP events.

    bird: bfd1: Session to 10.1.0.1 expired
    bird: bfd1: Session to 10.1.0.1 changed state from Init to Down
    bird: bfd1: Session to 10.1.0.1 changed state from Down to Init
    bird: bfd1: Session to 10.1.0.1 expired
    bird: bfd1: Session to 10.1.0.1 changed state from Init to Down
    bird: bfd1: Session to 10.1.0.1 changed state from Down to Init

This happens only if I specify a lot of neighbors in the config. For example the following config on first server ("10.1.0.1") works fine:

    protocol bfd {
        interface "eth0" {
            min rx interval 200 ms;
            min tx interval 1000 ms;
            idle tx interval 1 s;
            multiplier 5;
        };
        neighbor 10.1.0.2;
        neighbor 10.1.0.3;
    }

Looks like all BFD sessions are handled on a single thread. Could someone, please, confirm that BIRD isn't designed to handle a huge amount of BFD sessions simultaneously? Or possibly I can enable some options to handle this case in my env?
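As a rough sanity check on the timers in these configs, the sketch below applies the RFC 5880 rules as I understand them: a sender's interval is the larger of its own min tx and the peer's min rx, and the receiver's detection time is the peer's multiplier times that interval. The function names are made up for illustration, the figure of 1000 sessions is taken from "about 1000 records", and jitter is ignored.

```c
#include <assert.h>

/* Transmit interval a sender ends up using: the larger of its own
 * desired min TX and the peer's required min RX (per RFC 5880). */
static unsigned tx_interval_ms(unsigned local_min_tx_ms, unsigned peer_min_rx_ms)
{
  return local_min_tx_ms > peer_min_rx_ms ? local_min_tx_ms : peer_min_rx_ms;
}

/* Detection time at the receiver: the peer's multiplier times the
 * interval the peer transmits at (per RFC 5880, asynchronous mode). */
static unsigned detection_time_ms(unsigned peer_multiplier,
                                  unsigned peer_min_tx_ms,
                                  unsigned local_min_rx_ms)
{
  assert(peer_multiplier > 0);
  return peer_multiplier * tx_interval_ms(peer_min_tx_ms, local_min_rx_ms);
}

/* Nominal packets per second sent across all sessions, before jitter.
 * Jitter shortens intervals, so the real rate is somewhat higher. */
static double aggregate_tx_pps(unsigned sessions, unsigned interval_ms)
{
  assert(interval_ms > 0);
  return sessions * 1000.0 / interval_ms;
}
```

With the values above, `detection_time_ms(5, 1000, 200)` gives 5000 ms on 10.1.0.2, and `aggregate_tx_pps(1000, 1000)` gives a nominal 1000 packets per second sent by 10.1.0.1.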
Hello,

Currently, everything in the BIRD is working in the single thread. But they are actively developing a multithreaded version now, AFAIK: https://bird.network.cz/pipermail/bird-users/2021-March/015230.html

On Mon, Apr 19, 2021 at 7:21 PM Sema Boyko <semenboyko@gmail.com> wrote:
[...] Looks like all BFD sessions are handled on a single thread. Could someone, please, confirm that BIRD isn't designed to handle a huge amount of BFD sessions simultaneously? Or possibly I can enable some options to handle this case in my env?
On Mon, Apr 19, 2021 at 09:56:57PM +0200, Alexander Zubkov wrote:
Hello,
Currently, everything in the BIRD is working in the single thread.
Hello

That is true for the rest of the BIRD, but the BFD protocol was always in a separate thread.

-- Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Great, thanks for the clarification.

On Tue, Apr 20, 2021, 00:33 Ondrej Zajicek <santiago@crfreenet.org> wrote:
[...] That is true for the rest of the BIRD, but the BFD protocol was always in a separate thread.
Hello!

On 4/19/21 7:15 PM, Sema Boyko wrote:
Hi, [...] This happens only if I specify a lot of neighbors in the config. For example the following config on first server ("10.1.0.1") works fine:

    protocol bfd {
        interface "eth0" {
            min rx interval 200 ms;
            min tx interval 1000 ms;
            idle tx interval 1 s;
            multiplier 5;
        };
        neighbor 10.1.0.2;
        neighbor 10.1.0.3;
    }

Looks like all BFD sessions are handled on a single thread. Could someone, please, confirm that BIRD isn't designed to handle a huge amount of BFD sessions simultaneously? Or possibly I can enable some options to handle this case in my env?
Yes, all BFD sessions are handled on a single thread. Luckily enough, this thread is separate and it isn't blocked by long tasks inside the rest of BIRD. This will most probably change in some future versions, allowing for more BFD threads, yet I suppose that other parts of BIRD will first run in their dedicated threads to make route propagation faster.

OTOH, we have also seen some hardware and kernel issues with massive usage of BFD, effectively not allowing the BFD packets to reach BIRD at all. Did you check it with tcpdump?

Anyway, there may also be some problem with the hash size. Could you please rebuild BIRD with an increased third argument of HASH_INIT() in bfd_start() in proto/bfd/bfd.c, lines 1029 and 1030? Let's say like this:

diff --git a/proto/bfd/bfd.c b/proto/bfd/bfd.c
index dac184c5..009b58a7 100644
--- a/proto/bfd/bfd.c
+++ b/proto/bfd/bfd.c
@@ -1026,8 +1026,8 @@ bfd_start(struct proto *P)
   pthread_spin_init(&p->lock, PTHREAD_PROCESS_PRIVATE);
   p->session_slab = sl_new(P->pool, sizeof(struct bfd_session));
-  HASH_INIT(p->session_hash_id, P->pool, 8);
-  HASH_INIT(p->session_hash_ip, P->pool, 8);
+  HASH_INIT(p->session_hash_id, P->pool, 16);
+  HASH_INIT(p->session_hash_ip, P->pool, 16);
   init_list(&p->iface_list);

This is a wild guess; anyway, it may help you if the BFD thread is eating all the time looking up the sessions.

Maria
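To get a feel for what the patch changes, here is a minimal sketch of the bucket arithmetic. It rests on two assumptions that are not confirmed in this thread: that the third HASH_INIT() argument is a table order (so the table has 2^order buckets), and that the BFD session tables do not grow on their own. The helper names are hypothetical and are not part of BIRD.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the third HASH_INIT() argument is the table order,
 * i.e. the table is created with 2^order buckets. */
static size_t buckets_for_order(unsigned order)
{
  assert(order < 8 * sizeof(size_t));
  return (size_t) 1 << order;
}

/* Average sessions per bucket, assuming a uniform hash function and
 * a table that keeps its initial size (both are assumptions). */
static double avg_chain_length(size_t sessions, unsigned order)
{
  return (double) sessions / (double) buckets_for_order(order);
}
```

Under these assumptions, about 1000 sessions at order 8 means 256 buckets and roughly 3.9 sessions per bucket on average, while order 16 means 65536 buckets and almost every lookup hitting a bucket with at most one session.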
participants (4)
- Alexander Zubkov
- Maria Matejka
- Ondrej Zajicek
- Sema Boyko