bird 3.2.0 crashes: "Attempted to start channel's already started export"
I upgraded to 3.2.0 (from 1.x) on an ancient i686 machine. Running bird with -R, it crashes with this backtrace:

#0 0xb7ef9569 in __kernel_vsyscall ()
#1 0xb7cc7ab7 in ?? () from /lib/libc.so.6
#2 0xb7c71171 in raise () from /lib/libc.so.6
#3 0xb7c59216 in abort () from /lib/libc.so.6
#4 0x08148c71 in bug (msg=msg@entry=0x815e7b8 "%s.%s: Attempted to start channel's already started export") at sysdep/unix/log.c:412
#5 0x080af197 in channel_start_export (c=c@entry=0x9ab5e60) at nest/proto.c:780
#6 0x080b2d0b in graceful_recovery_done (_=0x81d30d0 <_graceful_recovery_context+16>) at nest/proto.c:2082
#7 0x08090182 in ev_run_list_limited (l=<optimized out>, limit=4294967293, limit@entry=4294967295) at lib/event.c:338
#8 0x0813eb77 in io_loop () at sysdep/unix/io.c:2646
#9 0x0804b32e in main (argc=2, argv=0xbfea0d24) at sysdep/unix/main.c:1111

I fed the problem to Claude, and its "findings" are below (bird doesn't crash with this "fix", but I can't tell whether it is the right fix or just papers over the real bug).

Kernel: skip export start in krt_init_scan() when gr_wait is set

When graceful restart recovery is active (-R), krt_start() sets gr_wait=1 on the kernel channel to defer export until recovery completes (via graceful_recovery_done()). However, krt_init_scan() unconditionally calls channel_start_export() when transitioning from KPS_INIT, without checking gr_wait. When graceful_recovery_done() later runs, it finds gr_wait=1 and tries to start the export again, triggering:

  bug("%s.%s: Attempted to start channel's already started export")

Reported on bird-users by Christoph (Jan 2026) for BIRD 3.1.5. Still present in 3.2.0 despite bda2178e ("Kernel: pause exports also on restart until scan is done"), which set rt_notify=NULL in krt_start() but didn't address the krt_init_scan() path.

Skip channel_start_export() in krt_init_scan() when gr_wait is set, letting graceful_recovery_done() handle it instead.
--- a/sysdep/unix/krt.c
+++ b/sysdep/unix/krt.c
@@ -449,7 +449,9 @@ krt_init_scan(struct krt_proto *p)
     case KPS_INIT:
       /* Allow exports now */
       p->p.rt_notify = krt_rt_notify;
-      channel_start_export(p->p.main_channel);
+      /* When gr_wait is set, graceful_recovery_done() will start the export */
+      if (!p->p.main_channel->gr_wait)
+        channel_start_export(p->p.main_channel);
       rt_refresh_begin(&p->p.main_channel->in_req);
       p->sync_state = KPS_FIRST_SCAN;
       return 1;

- The upstream fix bda2178e / 47ed4aa0 is already in 3.2.0; it sets rt_notify = NULL in krt_start() to defer exports
- But it's incomplete: krt_init_scan() still unconditionally calls channel_start_export() on the first scan timer, without checking gr_wait
- This is a new bug in the 3.2.0 release, not present in older code (since 91c98efe introduced the KPS_INIT state machine)
- The fix on the 372-release-v3.2.1 branch has the same code; the bug will still be in 3.2.1 unless they pick up a further fix
- Our patch is the correct minimal fix for this

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
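
For anyone trying to follow the ordering issue described in the findings, here is a minimal toy model in C. It is not BIRD code: every toy_* name is hypothetical, the real bug() abort is replaced by a counter, and the assumption that graceful_recovery_done() clears gr_wait before starting the export is mine, not taken from the source. It only illustrates the "scan first, recovery done later" ordering from the backtrace and says nothing about whether the guard is the right fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a BIRD channel; NOT the real struct channel. */
struct toy_channel {
  bool gr_wait;         /* export deferred until graceful recovery ends */
  bool export_started;  /* export is already running */
  int double_starts;    /* how many times a second start was attempted */
};

/* Real BIRD calls bug() and aborts on a second start; the model counts it. */
static void toy_start_export(struct toy_channel *c)
{
  assert(c != NULL);
  if (c->export_started) {
    c->double_starts++;
    return;
  }
  c->export_started = true;
}

/* Models the KPS_INIT branch of krt_init_scan(), with or without the guard. */
static void toy_init_scan(struct toy_channel *c, bool guard)
{
  if (!guard || !c->gr_wait)
    toy_start_export(c);
}

/* Models graceful_recovery_done(): start exports on channels still waiting.
 * Clearing gr_wait here is an assumption of this sketch. */
static void toy_recovery_done(struct toy_channel *c)
{
  if (c->gr_wait) {
    c->gr_wait = false;
    toy_start_export(c);
  }
}

/* Runs the first scan, then recovery completion; returns the double starts. */
static int toy_run(bool guard)
{
  struct toy_channel c = { .gr_wait = true };
  toy_init_scan(&c, guard);
  toy_recovery_done(&c);
  assert(c.export_started);
  return c.double_starts;
}
```

Calling toy_run(false) models the unpatched path and reports one double start (the point where real BIRD would abort); toy_run(true) models the patched path and reports none, with the export started exactly once by the recovery step.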
On 18/03/2026 19:39, Arkadiusz Miśkiewicz wrote:
> - The fix on the 372-release-v3.2.1 branch has the same code; the bug will still be in 3.2.1 unless they pick up a further fix
Uh, missed that. Sorry for the noise.

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )