first (?) bird 3.0.0 bug report
Hi,

Using a slightly modified version of the config from the 2.15.1 (just some {} inside case structures), but probably unrelated.

If bird gets killed/not shut down properly and routes remain in the kernel then bird 3.0.0 is unable to restart and just segfaults when trying to refresh kernel routes.

I can reproduce it by just kill -9 bird && restarting bird.

daemon.debug bird: KERNEL6.ipv6: route refresh begin: rr 1 set 1 valid 0 pruning 0 pruned 0
kern.info kernel: [75645.964935] bird[4162]: segfault at 0 ip 000000000045d47b sp 00007ffcb9f9f2c0 error 4 in bird[403000+a8000] likely on CPU 1 (core 0, socket 1)
kern.info kernel: [75645.970115] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 55 53 48 89 fb 48 83 ec 38 80 bb 79 02 00 00 04 48 8b 7f 30 75 46 48 89 d5 31 d2 e8 63 a4 ff ff <83> 38 00 49 89 c0 75 07 31 c0 e9 9d 00 00 00 48 8b 40 10 48 8d 7b

The kernel protocol config is:

protocol kernel KERNEL6 {
    debug { events, states };
    scan time 3600;
    merge paths on;
    metric 0;
    ipv6 {
        import filter KERNEL_IN;
        export filter KERNEL_OUT;
    };
}

Cheers,
R
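For readers following along, the kernel's "segfault at ..." line can be taken apart mechanically. The sketch below is only an illustration: the struct and function names are made up for this example, and it assumes the x86 log layout seen in this report. It extracts the faulting address, computes the instruction's offset inside the mapping, and decodes the low bits of the page-fault error code. For the line above it gives address 0, offset 0x5a47b and error 4, that is a user-mode read of a page that is not present, which is what a NULL pointer read looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields of one kernel "segfault at ..." line (names are illustrative). */
struct segv_info {
  uint64_t addr;  /* faulting address */
  uint64_t ip;    /* instruction pointer at the fault */
  uint64_t err;   /* x86 page-fault error code (logged in hex) */
  uint64_t base;  /* start of the mapping, from "bird[base+size]" */
  uint64_t size;  /* length of the mapping */
};

/* Parse a log line containing "segfault at ..."; returns true on success. */
static bool segv_parse(const char *line, struct segv_info *out)
{
  const char *p = strstr(line, "segfault at ");
  if (!p)
    return false;

  unsigned long long addr, ip, sp, err, base, size;
  char name[64];
  int n = sscanf(p,
                 "segfault at %llx ip %llx sp %llx error %llx in %63[^[][%llx+%llx]",
                 &addr, &ip, &sp, &err, name, &base, &size);
  if (n != 7)
    return false;

  out->addr = addr;
  out->ip = ip;
  out->err = err;
  out->base = base;
  out->size = size;
  return true;
}

/* Offset of the faulting instruction inside the mapping. */
static uint64_t segv_offset(const struct segv_info *s)
{
  return s->ip - s->base;
}

/* x86 error-code bits: bit 0 clear = page not present,
 * bit 1 set = write access, bit 2 set = fault came from user mode. */
static bool segv_not_present(const struct segv_info *s) { return !(s->err & 1); }
static bool segv_is_write(const struct segv_info *s)    { return (s->err & 2) != 0; }
static bool segv_is_user(const struct segv_info *s)     { return (s->err & 4) != 0; }
```

The offset is the kind of value one would typically hand to a symbolizer together with a binary that has debug symbols; whether the absolute address or the offset is needed depends on how the binary was linked.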
Hello, Radu!

We apparently missed this case in our test scenarios, so we'll add one and check if it reproduces. We'll ask for more info if we need it.

Congratulations on the first BIRD 3 bug report, and thanks for it!

Maria

On 18 December 2024 14:27:44 CET, Radu Anghel via Bird-users <bird-users@network.cz> wrote:
[...]
-- Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
Hello Radu,

sorry, this was a stupid omission of a null check. Fixed in [b6caccfd45fb639b6dd3a8d140d3c5ba4cc79311](https://gitlab.nic.cz/labs/bird/-/tree/b6caccfd45fb639b6dd3a8d140d3c5ba4cc79...). Could you please check that it works on your side now?

Thanks,
Maria

On Wed, Dec 18, 2024 at 02:55:19PM +0100, Maria Matejka via Bird-users wrote:
[...]
-- Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
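As a generic illustration of the kind of bug described here (not the actual BIRD code or the actual fix), the bytes around the fault marker in the original report, `e8 63 a4 ff ff <83> 38 00`, appear to be a call followed by a compare against memory at the returned pointer, with a fault address of 0. That pattern usually means a lookup returned NULL and the caller used the result without checking. A minimal sketch, with an invented `struct route` and invented function names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical route entry; this is not a BIRD type. */
struct route {
  const char *prefix;
  int flags;
};

/* Linear lookup; returns NULL when the prefix is not in the table. */
static struct route *route_find(struct route *tab, size_t n, const char *prefix)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(tab[i].prefix, prefix) == 0)
      return &tab[i];
  return NULL;
}

/* Returns 1 if the route exists and has any flag set, 0 otherwise. */
static int route_has_flags(struct route *tab, size_t n, const char *prefix)
{
  struct route *r = route_find(tab, n, prefix);

  /* Without this check, a missing route would be read through a NULL
   * pointer, which shows up in the kernel log as "segfault at 0". */
  if (!r)
    return 0;

  return r->flags != 0;
}
```

The design point is simply that "not found" is a normal outcome after an unclean shutdown, so the caller has to treat it as data and not as an impossible case.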
Hi Maria,

I tried the patch and I can confirm that it is now safe to kill and revive the bird :)

Thank you!

Best,
Radu

On 19.12.2024 13:38, Maria Matejka wrote:
[...]
Hello Maria,

I thought that I had the same bug, but it seems that it’s a different one as I applied the patch but I still got a segfault at startup.

[342115.227497] bird[19396]: segfault at 8 ip 00005642c37062db sp 00007fc784295ca0 error 4 in bird[5642c366e000+f0000]
[342115.227524] Code: 39 10 0f 85 e3 8c f6 ff 48 8b 90 b0 01 00 00 8b 8a 10 01 00 00 89 4c 24 24 85 c9 0f 84 d6 00 00 00 4d 85 ff 0f 84 e0 00 00 00 <41> 0f b6 44 24 08 89 c2 83 e2 01 4d 85 ed 75 15 83 c8 01 41 88 44

I can generate a coredump and share my configuration if needed.

Alarig

On Thu 19 Dec 2024 16:14:18 GMT, Radu Anghel via Bird-users wrote:
[...]
Hello Alarig,

thanks for reporting!

The coredump and config would be much appreciated, so I can try to reproduce it. Also, would you say there is something unconventional about your setup that might be causing the problem?

Thanks in advance and happy new year,
David

On 12/19/24 19:16, Alarig Le Lay via Bird-users wrote:
[...]
-- David Petera (he/him) | BIRD Tech Support | CZ.NIC, z.s.p.o.
Hello David,

Here is the coredump: https://herbizarre.swordarmor.fr/garbage/core-bird.29074.rr3.swordarmor.fr.1...
And here is the config: https://herbizarre.swordarmor.fr/garbage/bird.conf.29074.rr3.swordarmor.fr.1...

I compiled bird3 with the two previous patches from this ML:

* Applying bird-3.0.0-nest-rt-table.c.patch ...
* Applying bird-3.0.0-proto-lock.patch ...

And I enabled debug:

./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --datarootdir=/usr/share --docdir=/usr/share/doc/bird-3.0.0 --htmldir=/usr/share/doc/bird-3.0.0/html --libdir=/usr/lib64 --localstatedir=/var --enable-client --enable-debug --disable-libssh

So I have a different trace in dmesg (I suppose that it’s due to having debug symbols):

[1305799.952226] bird[29075]: segfault at 8 ip 000055861e8cb989 sp 00007f13ab6e0b40 error 4 in bird[55861e7ed000+15c000]
[1305799.952251] Code: 89 c7 e8 13 80 ff ff 89 85 6c ff ff ff 48 8b 45 c0 48 89 c7 e8 2d f3 ff ff 89 85 70 ff ff ff 48 83 7d b0 00 74 18 48 8b 45 b0 <0f> b6 00 0f b6 c0 83 e0 01 85 c0 74 07 b8 01 00 00 00 eb 05 b8 00

Regarding the setup, I don’t think that it’s that unconventional, but it’s a RR outside of the network (it’s on my own network which is pretty much experimental, so I experiment). I noticed that I never removed the OSPF from the kernel filters until now, but even with 'export none' I still get the crash.

So I then tried to enable all the debug options:

rr3 ~ # grep debug /etc/bird.conf
debug protocols all;
debug channels all;
debug tables all;

And here are the logs: https://paste.swordarmor.fr/raw/Owej

I hope it’s useful, don’t hesitate to ask me if you need anything else.

Happy new year to you too!
Alarig

On Mon 30 Dec 2024 16:49:41 GMT, David Petera wrote:
[...]
I read the coredump (and the source code) with a fresh cup of coffee, and acted accordingly:

The error is there:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055861e8cb989 in bgp_rte_recalculate (table=0x55861fe0a250, net=0x55861fe1b320, new_stored=0x7f13aaedd070, old_stored=0x0, old_best_stored=0x0) at proto/bgp/attrs.c:2696

Which corresponds to the deterministic med:
https://gitlab.nic.cz/labs/bird/-/blob/v3.0.0/proto/bgp/attrs.c?ref_type=tag...
int old_suppressed = old ? !!(old->pflags & BGP_REF_SUPPRESSED) : 0;

To be used there:
https://gitlab.nic.cz/labs/bird/-/blob/v3.0.0/proto/bgp/attrs.c?ref_type=tag...
n = new->pflags & BGP_REF_SUPPRESSED;
o = old->pflags & BGP_REF_SUPPRESSED;

So I commented it out and now bird starts:

rr3 ~ # grep med /etc/bird.conf
med metric on;
#deterministic med on;
med metric on;
#deterministic med on;

On Mon 30 Dec 2024 22:35:02 GMT, Alarig Le Lay wrote:
[...]
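To make the analysis above concrete: the backtrace shows `old_stored=0x0`, and the quoted lines read `pflags` through `old`. The sketch below is a simplified stand-in, not the code from `proto/bgp/attrs.c` and not the change made in the later fix commit. `struct rte_sketch`, `REF_SUPPRESSED` (and its value) and both function names are assumptions made up for this illustration. It shows one way to route every `pflags` read through a single NULL-tolerant helper so that a missing old route is handled the same way everywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for BGP_REF_SUPPRESSED; the real value is not taken from BIRD. */
#define REF_SUPPRESSED 0x1

/* Simplified route with only the per-route flags byte. */
struct rte_sketch {
  unsigned char pflags;
};

/* NULL-tolerant read of the suppressed flag: a missing route is treated
 * as "not suppressed". */
static int is_suppressed(const struct rte_sketch *r)
{
  return r ? !!(r->pflags & REF_SUPPRESSED) : 0;
}

/* Did the suppressed state change between the old and the new route?
 * Either pointer may be NULL (route added or withdrawn). */
static int suppression_changed(const struct rte_sketch *new_rt,
                               const struct rte_sketch *old_rt)
{
  return is_suppressed(new_rt) != is_suppressed(old_rt);
}
```

With a helper like this, the first announcement of a route (no old route yet) is just the case `old_rt == NULL` and needs no special handling at the call sites.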
Hello Alarig,

this was indeed a problem (and there was some more to fix) and if I haven't missed anything (it now looks OK to me), here is a fixed version:

<https://gitlab.nic.cz/labs/bird/-/commit/2e14832d36c83b2ab5b7fb28b701de554fa5fdd9>

If you could check that it works for you as well, it would be very helpful.

Thank you for your report!
Maria

On Tue, Dec 31, 2024 at 10:05:12AM +0100, Alarig Le Lay via Bird-users wrote:
[...]
-- Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
Hello Maria,

Thanks a lot, I confirm that it works well for me now :)

On Tue 07 Jan 2025 22:06:31 GMT, Maria Matejka via Bird-users wrote:
[...]
Never mind, it just crashed again :p

Jan 8 15:45:32 rr3 bird: Assertion '!old' failed at nest/rt-table.c:1497

Do you want a new coredump?

On Wed 08 Jan 2025 09:43:32 GMT, Alarig Le Lay via Bird-users wrote:
[...]
On Wed, Jan 08, 2025 at 03:51:37PM +0100, Alarig Le Lay via Bird-users wrote:
> Never mind, it just crashed again :p
> Jan 8 15:45:32 rr3 bird: Assertion '!old' failed at nest/rt-table.c:1497

Oh my, that's tough. But I'm very happy that I wrote that consistency check, otherwise it might have done some very weird things. And indeed it found a consistency problem.

> Do you want a new coredump?

I left my test running overnight and it crashed as well at the same spot, so I'm now fixing and analyzing gigabytes of logs.

Thanks for the heads-up.
Maria

-- Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
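The value of such a consistency check can be sketched in a few lines. This is an assumption-laden illustration and not BIRD's assertion macro or table code: `CHECK`, `fmt_failure`, `struct slot` and `slot_insert_fresh` are all invented names. The idea is that a path which is only valid when no old entry exists states that expectation explicitly and stops with a message in the same shape as the log line quoted in this thread, so that a violated invariant is caught at the point of violation and not later as silent corruption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a failure message in the same shape as the log line in this thread. */
static int fmt_failure(char *buf, size_t len, const char *expr,
                       const char *file, int line)
{
  return snprintf(buf, len, "Assertion '%s' failed at %s:%d", expr, file, line);
}

/* Hypothetical check macro: report the failed expression and stop. */
#define CHECK(expr)                                                  \
  do {                                                               \
    if (!(expr)) {                                                   \
      char msg_[256];                                                \
      fmt_failure(msg_, sizeof msg_, #expr, __FILE__, __LINE__);     \
      fprintf(stderr, "%s\n", msg_);                                 \
      abort();                                                       \
    }                                                                \
  } while (0)

/* Hypothetical table slot holding at most one entry. */
struct slot {
  const void *cur;
};

/* Insert into a slot that is expected to be empty. */
static void slot_insert_fresh(struct slot *s, const void *entry)
{
  const void *old = s->cur;
  CHECK(!old);  /* a stale entry here means the bookkeeping is inconsistent */
  s->cur = entry;
}
```

Aborting is a deliberate trade-off in this sketch: the process dies, but it dies with a precise location instead of continuing with inconsistent state.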
participants (4)
- Alarig Le Lay
- David Petera
- Maria Matejka
- Radu Anghel