bird 3.1.5 crashes after adding "-R" to BIRD_ARGS
Hello,

last night we made two changes to our BIRD config:

* enabled ASPA validation (logging only)
* added "-R" to BIRD_ARGS (/etc/bird/envvars)

After that we ran "configure" in birdcl, and bird started to crash in a loop.

We decided to roll back the ASPA configuration change to restore a working setup. This did not help. Then we removed "-R" from BIRD_ARGS, and that helped. For this reason we did not include the ASPA config changes in this email, since they appear not to be relevant to the crashes.

We assume "-R" causes the crashes. Is this a known bug that is solved in 3.2.0? We would like to have graceful restart support, but we are a bit hesitant about "-R" now ;-)

Logs:

2026-01-05 01:15:31.820 [0001] <INFO> Reconfiguring
2026-01-05 01:15:31.822 [0001] <INFO> Cannot add channel r3k.aspa
2026-01-05 01:15:31.822 [0001] <INFO> Restarting protocol r3k
2026-01-05 01:15:38.339 [0001] <INFO> Reloading channel anexia_v4.ipv4
2026-01-05 01:15:38.339 [0001] <INFO> Reloading channel anexia_v6.ipv6
2026-01-05 01:15:38.366 [0001] <INFO> Reloading channel nextlayer_v4.ipv4
2026-01-05 01:15:38.367 [0001] <INFO> Reloading channel nextlayer_v6.ipv6
2026-01-05 01:15:38.367 [0001] <INFO> Reconfigured
2026-01-05 01:15:42.397 [0002] <BUG> Assertion 'r->cur' failed at lib/lockfree.c:229
Jan 05 01:15:42 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:15:42 bird.service: Failed with result 'signal'.
Jan 05 01:15:42 bird.service: Consumed 1d 39min 31.457s CPU time, 1.6G memory peak.
Jan 05 01:15:42 bird.service: Scheduled restart job, restart counter is at 2.

Lots of crashes:

journalctl ..|grep 'bird.service: Main process exited,'
Jan 05 01:15:42 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:17:16 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:18:45 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:20:13 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:21:28 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:21:41 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:22:42 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:22:56 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:24:22 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:24:36 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:25:30 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:27:02 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:27:16 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:28:10 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:29:37 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:29:51 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:30:52 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:30:59 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:32:10 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:32:20 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:36:04 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:36:18 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:36:33 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:36:47 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:37:02 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:37:16 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:38:36 bird.service: Main process exited, code=killed, status=6/ABRT
Jan 05 01:38:51 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:39:05 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:39:20 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:39:34 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:39:49 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:41:26 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:41:40 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:41:55 bird.service: Main process exited, code=killed, status=11/SEGV
Jan 05 01:42:10 bird.service: Main process exited, code=killed, status=11/SEGV

Bird version (dpkg -l bird):
ii bird3 3.1.5-cznic.1~trixie amd64 Internet Routing Daemon

best regards,
Christoph
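The journal excerpt above mixes exits with status=6/ABRT and status=11/SEGV, and the mix is easier to read as a tally per signal. The following Python sketch is not part of the original report: the function name `tally_signals` and the `SAMPLE` list are illustrative, and it assumes the systemd line format shown in the excerpt.

```python
import re
from collections import Counter

# Matches systemd's exit line for a killed main process, for example:
# "Jan 05 01:15:42 bird.service: Main process exited, code=killed, status=6/ABRT"
EXIT_RE = re.compile(r"Main process exited, code=killed, status=\d+/(\w+)")


def tally_signals(lines):
    """Count how often each signal name appears in the given journal lines."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the excerpt above; lines that do not match are skipped.
SAMPLE = [
    "Jan 05 01:15:42 bird.service: Main process exited, code=killed, status=6/ABRT",
    "Jan 05 01:15:42 bird.service: Failed with result 'signal'.",
    "Jan 05 01:21:41 bird.service: Main process exited, code=killed, status=11/SEGV",
]

for signal_name, count in sorted(tally_signals(SAMPLE).items()):
    print(f"{signal_name}: {count}")
```

`tally_signals` accepts any iterable of strings, so the full grep output can be passed in line by line instead of `SAMPLE`.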
Some more lines from the logfile suggest an issue with netlink performance?

2026-01-05 01:16:00.367 [0001] <WARN> Netlink: File exists << many times
2026-01-05 01:16:00.367 [0001] <WARN> ...
2026-01-05 01:16:00.636 [0002] <RMT> anexia_v4: Received: Connection collision resolution
2026-01-05 01:16:00.687 [0001] <WARN> I/O loop cycle took 16627.101 ms for 6 events
2026-01-05 01:16:06.127 [0001] <WARN> Netlink: File exists
2026-01-05 01:16:06.128 [0001] <WARN> ..
2026-01-05 01:16:44.093 [0001] <WARN> kernel1: Can't scan, still pruning
2026-01-05 01:16:45.310 [0001] <WARN> Kernel dropped some netlink messages, will resync on next scan.
2026-01-05 01:16:45.841 [0001] <WARN> Kernel dropped some netlink messages, will resync on next scan.
2026-01-05 01:22:42.408 [0001] <BUG> kernel1.ipv4: Attempted to start channel's already started export
2026-01-05 01:24:21.910 [0001] <BUG> kernel1.ipv4: Attempted to start channel's already started export
2026-01-05 01:24:50.326 [0001] <WARN> I/O loop cycle took 12889.620 ms for 5 events

We had that "Kernel dropped some netlink..." warning a while ago (>5 years) and added this to our sysctl.conf back then:

# solves the "Kernel dropped some netlink messages, will resync on next scan." BIRD Warnings
# https://bird.network.cz/pipermail/bird-users/2017-September/011542.html
# (we did not change the wmem_default since this is apparently not needed)
net.core.rmem_max=16777216
net.core.rmem_default=4194304
net.core.wmem_max=4194304

Are these still sensible values or should they be increased further? This is a BGP router getting 2 full tables.

best regards,
Christoph
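Before changing the buffer sizes it may be worth confirming that the running kernel actually uses the values from sysctl.conf. The following Python sketch is an illustration only: the helper names `read_sysctl` and `compare` are made up here, and it assumes a Linux host where sysctl keys are exposed under /proc/sys. It reports differences and says nothing about whether the values themselves are sensible.

```python
from pathlib import Path

# Values quoted from the sysctl.conf excerpt in the message above.
CONFIGURED = {
    "net.core.rmem_max": 16777216,
    "net.core.rmem_default": 4194304,
    "net.core.wmem_max": 4194304,
}


def read_sysctl(name, proc_sys=Path("/proc/sys")):
    """Return the running integer value of a sysctl key, or None if unreadable."""
    # "net.core.rmem_max" maps to /proc/sys/net/core/rmem_max.
    path = Path(proc_sys, *name.split("."))
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def compare(configured=CONFIGURED, proc_sys=Path("/proc/sys")):
    """Yield (key, wanted, running) for each key whose running value differs."""
    for key, wanted in configured.items():
        running = read_sysctl(key, proc_sys)
        if running != wanted:
            yield key, wanted, running


for key, wanted, running in compare():
    print(f"{key}: configured {wanted}, running {running}")
```

No output means the three keys match the configured values. A `running` value of `None` means the key could not be read on this host.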
On Mon, Jan 05, 2026 at 03:35:43PM +0100, Christoph via Bird-users wrote:
> Some more lines from the logfile suggests an issue with netlink performance?
>
> 2026-01-05 01:16:00.367 [0001] <WARN> Netlink: File exists << many times
> [...]
> 2026-01-05 01:16:44.093 [0001] <WARN> kernel1: Can't scan, still pruning

This is more likely a synchronization bug. Will check though.

> 2026-01-05 01:22:42.408 [0001] <BUG> kernel1.ipv4: Attempted to start channel's already started export
> 2026-01-05 01:24:21.910 [0001] <BUG> kernel1.ipv4: Attempted to start channel's already started export

And this is even more suspicious. Will check whether it's possible to reproduce that from this description and get back to you if I need more information from you.

> net.core.rmem_max=16777216
> net.core.rmem_default=4194304
> net.core.wmem_max=4194304
>
> Are these still sensible values or should they be increased further? This is a BGP router getting 2 full tables.

BIRD 3 should be _faster_ doing this, not slower. If you need to update these values, it's definitely _our_ bug.

Maria

--
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
Hello again,

I managed to get it to crash also on 3.2.0 without the "-R", so this might be two issues instead of just one. Without "-R" it at least crashes only once and not in a loop.

The last log line in the trace level log before the crash:

2026-01-05 15:56:04.980 [0002] <BUG> Assertion 'r->cur' failed at lib/lockfree.c:229

coredumpctl output:

PID: 400061 (bird)
UID: 103 (bird)
GID: 105 (bird)
Signal: 6 (ABRT)
Timestamp: Mon 2026-01-05 15:56:05 CET (8min ago)
Command Line: /usr/sbin/bird -f -u bird -g bird
Executable: /usr/sbin/bird
Control Group: /system.slice/bird.service
Unit: bird.service
Slice: system.slice
Storage: /var/lib/systemd/coredump/core.bird.103.bca60f28d7ae4991a019cc2291fdaeee.400061.1767624965000000.zst (present)
Size on Disk: 153.1M
Message: Process 400061 (bird) of user 103 dumped core.

Module libzstd.so.1 from deb libzstd-1.5.7+dfsg-1.amd64

Stack trace of thread 400070:
#0 0x00007fcc476e295c n/a (libc.so.6 + 0x9495c)
#1 0x00007fcc4768dcc2 raise (libc.so.6 + 0x3fcc2)
#2 0x00007fcc476764ac abort (libc.so.6 + 0x284ac)
#3 0x0000557e4b9e84dc n/a (/usr/sbin/bird + 0x10f4dc)
#4 0x0000557e4b9369ba n/a (/usr/sbin/bird + 0x5d9ba)
#5 0x0000557e4b94f52b n/a (/usr/sbin/bird + 0x7652b)
#6 0x0000557e4b95f911 n/a (/usr/sbin/bird + 0x86911)
#7 0x0000557e4b94f95a n/a (/usr/sbin/bird + 0x7695a)
#8 0x0000557e4b932956 n/a (/usr/sbin/bird + 0x59956)
#9 0x0000557e4b9e581f n/a (/usr/sbin/bird + 0x10c81f)
#10 0x0000557e4b932956 n/a (/usr/sbin/bird + 0x59956)
#11 0x0000557e4b9e3692 n/a (/usr/sbin/bird + 0x10a692)
#12 0x00007fcc476e0b7b n/a (libc.so.6 + 0x92b7b)
#13 0x00007fcc4775e7b8 n/a (libc.so.6 + 0x1107b8)

Stack trace of thread 400061:
#0 0x00007fcc476e89ee n/a (libc.so.6 + 0x9a9ee)
#1 0x00007fcc476dd668 n/a (libc.so.6 + 0x8f668)
#2 0x00007fcc476dd6ad n/a (libc.so.6 + 0x8f6ad)
#3 0x00007fcc477519c6 __poll (libc.so.6 + 0x1039c6)
#4 0x0000557e4b9de02f n/a (/usr/sbin/bird + 0x10502f)
#5 0x0000557e4b8ee3de n/a (/usr/sbin/bird + 0x153de)
#6 0x00007fcc47677ca8 n/a (libc.so.6 + 0x29ca8)
#7 0x00007fcc47677d65 __libc_start_main (libc.so.6 + 0x29d65)
#8 0x0000557e4b8ee671 n/a (/usr/sbin/bird + 0x15671)
ELF object binary architecture: AMD x86-64

version: bird3 3.2.0-cznic.1~trixie amd64

best regards,
Christoph

config diff for enabling ASPA:

-------------
diff --git a/filter.conf b/filter.conf
index ecd825d..c349eaa 100644
--- a/filter.conf
+++ b/filter.conf
@@ -23,6 +23,12 @@ filter transit_in {
 }
 }

+ # Log ASPA_INVALIDs
+ if (aspa_check_downstream(at) = ASPA_INVALID) then
+ {
+ print "ASPA INVALID announcement (monitoring only): ", net, " AS path: ", bgp_path;
+ }
+
 accept;

 }
diff --git a/rpki.conf b/rpki.conf
index 26335ba..8414266 100644
--- a/rpki.conf
+++ b/rpki.conf
@@ -2,6 +2,7 @@

 roa4 table r4;
 roa6 table r6;
+aspa table at;

 protocol rpki r3k {
@@ -9,6 +10,7 @@ protocol rpki r3k {

 roa4 { table r4; };
 roa6 { table r6; };
+ aspa { table at; };
---------------
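The frames inside /usr/sbin/bird show up as "n/a" because no symbols were available when the trace was printed. The following Python sketch pulls out the per-module offsets, so they can be looked up later against a binary with debug symbols for exactly this build, for example with `addr2line -f -e`. The function name `frames_for` is made up for illustration, and the regular expression assumes the coredumpctl frame layout shown above.

```python
import re

# One stack frame as printed by coredumpctl, for example:
# "#3 0x0000557e4b9e84dc n/a (/usr/sbin/bird + 0x10f4dc)"
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+) \+ (0x[0-9a-f]+)\)"
)


def frames_for(text, module):
    """Return [(frame_number, offset)] for frames located in the given module."""
    return [
        (int(number), offset)
        for number, _symbol, frame_module, offset in FRAME_RE.findall(text)
        if frame_module == module
    ]


# Three frames copied from the crashing thread (400070) in the trace above.
TRACE = (
    "#2 0x00007fcc476764ac abort (libc.so.6 + 0x284ac) "
    "#3 0x0000557e4b9e84dc n/a (/usr/sbin/bird + 0x10f4dc) "
    "#4 0x0000557e4b9369ba n/a (/usr/sbin/bird + 0x5d9ba)"
)

for number, offset in frames_for(TRACE, "/usr/sbin/bird"):
    print(f"frame #{number}: offset {offset}")
```

If the text contains several threads, frame numbers repeat. The list keeps the frames in the order they appear in the input.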
Hello Christoph,

I finally managed to reproduce this, and it actually doesn't have anything to do with -R, nor with ASPA directly. It looks like a bug in both 3.1.5 and 3.2.0, and the fix will be in the upcoming 3.2.1 and 3.1.6.

You may see the WIP branch with all the boring technical details in our gitlab:

https://gitlab.nic.cz/labs/bird/-/commits/343-bad-journal-release

This link will stop working later but you'll find the commits by searching for `#343` in the commit messages.

I still have several points on my checklist arising from your reports, thank you so much for them. I'll get back to you as soon as I have more progress.

Thank you and have a nice day!
Maria

On Mon, Jan 05, 2026 at 04:12:10PM +0100, Christoph via Bird-users wrote:
> Hello again,
>
> I managed to get it to crash also on 3.2.0 without the "-R" so, this might be two instead of just one issue. Without "-R" at least it crashes only once and not in a loop.
--
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.