Hello to the community! I have two things to clarify here.

On Fri, Oct 17, 2025 at 9:43 PM Ze Xia <billxia135@gmail.com> wrote:
> + bgp_start_timer(p, conn->connect_timer, p->cf->connect_retry_time);
First, the line above was copied from the 3.1.2 code and is inconsistent with the master branch: in master the p parameter should be removed, i.e. the line should read "+ bgp_start_timer(conn->connect_timer, p->cf->connect_retry_time);". I'm sorry for this mistake.

Second, how to reproduce the bug. It happens whenever TCP connect() succeeds immediately, but that is very unlikely in real life, if it can happen at all. I tried starting two bird daemons bound to 127.0.0.1 on different ports, and strace shows that connect() returns -1 with EINPROGRESS:

    connect(12, {sa_family=AF_INET, sin_port=htons(179), sin_addr=inet_addr("127.0.0.1")}, 32) = -1 EINPROGRESS (Operation now in progress)

so the bug is not reproduced that way.

I discovered this bug in a rather special use case. I run many bird daemons on a single physical machine to obtain the routing tables of a network, and I'm replacing the TCP sockets with UDS (Unix Domain Sockets) by LD_PRELOAD-ing a customized .so, as the proxychains project does: https://github.com/haad/proxychains/tree/master

Basically, I replace the implementation of connect() (along with other functions) in libc.so with my own, so every connect() call made by bird jumps into my code. A UDS connect() can succeed immediately, so my connect() returned 0 to signal immediate success, and that triggers the bug. The symptom is that bird actively tears down BGP connections after the connect retry timeout (around 90 s here; the default value is 120 s). I can reproduce this reliably.

In my use case, I could work around this by emulating the real-world behaviour of connect(), i.e. returning EINPROGRESS and injecting POLLOUT in a later poll(). However, I don't think bird should rely on connect() never succeeding immediately, so I believe this patch should still be applied.
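To illustrate the failure mode, here is a minimal standalone sketch in C. It is not bird's code: struct conn, on_connected(), start_connect(), run_once() and make_listener() are hypothetical names, and it assumes (as a simplified model, not a reading of the bird source) that the problem is a connect-retry timer being armed after connect() has already reported success. It connects a non-blocking socket to a listening Unix domain socket, where the kernel completes connect() synchronously and returns 0, and compares arming the timer after the call versus before it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical connection state; NOT bird's data structures. */
struct conn {
    int fd;
    int connected;   /* set once the transport is up */
    int retry_armed; /* models a connect-retry timer */
};

/* Called on success; in a real event loop the POLLOUT handler would call
 * this too when connect() returned EINPROGRESS. */
static void on_connected(struct conn *c)
{
    c->connected = 1;
    c->retry_armed = 0; /* transport is up: retry timer no longer needed */
}

/* Start a non-blocking connect(). Returns 1 on immediate success,
 * 0 if the connect is in progress (EINPROGRESS), -1 on error. */
static int start_connect(struct conn *c, const struct sockaddr_un *sa)
{
    if (connect(c->fd, (const struct sockaddr *)sa, sizeof(*sa)) == 0)
        return 1;
    return errno == EINPROGRESS ? 0 : -1;
}

/* fixed_order = 0: arm the timer AFTER connect() (fragile).
 * fixed_order = 1: arm it BEFORE, so an immediate success cancels it. */
static int run_once(const struct sockaddr_un *sa, int fixed_order, struct conn *c)
{
    memset(c, 0, sizeof(*c));
    c->fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (c->fd < 0)
        return -1;
    int flags = fcntl(c->fd, F_GETFL, 0);
    if (flags < 0 || fcntl(c->fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(c->fd);
        return -1;
    }
    if (fixed_order)
        c->retry_armed = 1;      /* arm the retry timer first */
    int rc = start_connect(c, sa);
    if (rc == 1)
        on_connected(c);         /* immediate success: handle it right away */
    if (!fixed_order && rc >= 0)
        c->retry_armed = 1;      /* fragile: re-arms even if already connected */
    close(c->fd);
    return rc;
}

/* Create a listening Unix domain socket under /tmp (hypothetical path). */
static int make_listener(struct sockaddr_un *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    snprintf(sa->sun_path, sizeof(sa->sun_path),
             "/tmp/connect-demo-%ld.sock", (long)getpid());
    unlink(sa->sun_path);
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (const struct sockaddr *)sa, sizeof(*sa)) < 0 ||
        listen(lfd, 8) < 0) {
        close(lfd);
        return -1;
    }
    return lfd;
}
```

With the timer armed after connect(), the model ends up both connected and holding a pending retry timer, which would fire after the retry interval; arming it first lets the immediate-success path cancel it. The same ordering also stays correct in the usual EINPROGRESS case, where the POLLOUT handler would call on_connected() later.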