Hi all,

My guess is that it could depend on the socket type. That is, a non-blocking TCP socket on Linux might always return EINPROGRESS, but for a Unix socket other outcomes might be possible.

Regards,
Alexander

On Fri, Oct 17, 2025, 22:02 Ze Xia <billxia135@gmail.com> wrote:
On Fri, Oct 17, 2025 at 10:57 PM Maria Matejka <maria.matejka@nic.cz> wrote:
Hello Ze Xia,

this looks like a real bug, yet I'm not sure whether we observe it often in the real world. Do you have any instructions on how to trigger it reliably, so that we can add it to our CI?

Thanks,
Maria
I tried to trigger it "naturally" by running two BIRD daemons connected through a veth pair, but this failed to reproduce the bug. According to strace, connect() always returns -1 with errno = EINPROGRESS.
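The strace observation above can also be checked without running BIRD. The following is a minimal sketch, assuming Linux (it relies on SOCK_NONBLOCK and on the AF_UNIX autobind feature); probe_connect is a hypothetical helper name, not something taken from BIRD. It opens a throwaway listener, performs one non-blocking connect() to it, and reports what connect() returned right away.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a non-blocking connect() to a fresh
 * local listener succeeded immediately, otherwise the errno value reported
 * by connect() (or by one of the setup calls). */
int probe_connect(int family)
{
  struct sockaddr_storage ss;
  socklen_t len;
  int result;
  int cfd = -1;

  assert(family == AF_INET || family == AF_UNIX);
  memset(&ss, 0, sizeof(ss));
  ss.ss_family = family;
  if (family == AF_INET) {
    /* 127.0.0.1, port 0: the kernel picks a free port */
    struct sockaddr_in *in = (struct sockaddr_in *) &ss;
    in->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    len = sizeof(*in);
  } else {
    /* Linux autobind: binding only sa_family_t yields an abstract address */
    len = sizeof(sa_family_t);
  }

  int lfd = socket(family, SOCK_STREAM, 0);
  if (lfd < 0)
    return errno;

  if (bind(lfd, (struct sockaddr *) &ss, len) < 0 || listen(lfd, 1) < 0) {
    result = errno;
    goto out;
  }

  /* Read back the address the kernel actually assigned. */
  len = sizeof(ss);
  if (getsockname(lfd, (struct sockaddr *) &ss, &len) < 0) {
    result = errno;
    goto out;
  }

  cfd = socket(family, SOCK_STREAM | SOCK_NONBLOCK, 0);
  if (cfd < 0) {
    result = errno;
    goto out;
  }

  /* The immediate return value is what we are interested in. */
  result = (connect(cfd, (struct sockaddr *) &ss, len) == 0) ? 0 : errno;

out:
  if (cfd >= 0)
    close(cfd);
  close(lfd);
  return result;
}
```

Calling probe_connect(AF_INET) and probe_connect(AF_UNIX) lets one compare the two socket types. I would expect EINPROGRESS for TCP and 0 for the Unix socket on Linux, but that is an expectation to verify, not a guarantee.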
However, I figured out that I can make connect() wait a little while for the connection to succeed by preloading a custom shared library. My current implementation:
#define _GNU_SOURCE  // RTLD_NEXT requires this on glibc
#include <dlfcn.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

// milliseconds
#define MAX_CONNECT_BLOCKTIME 10

typedef int (*connect_t)(int, const struct sockaddr *, socklen_t);

__attribute__((visibility("default")))
int connect(int sock, const struct sockaddr *addr, socklen_t len)
{
  int orig_errno = errno;

  // call the real connect() from libc
  connect_t true_connect = (connect_t) dlsym(RTLD_NEXT, "connect");
  int r = true_connect(sock, addr, len);

  // only interfere with IPv4 connects that are still in progress
  if (!(addr->sa_family == AF_INET && r == -1 && errno == EINPROGRESS))
    return r;

  // give the connection a short while to complete
  struct pollfd fds[1] = {{.fd = sock, .events = POLLOUT | POLLERR | POLLHUP}};
  int poll_res = poll(fds, 1, MAX_CONNECT_BLOCKTIME);
  if (poll_res <= 0) {
    // timeout or poll() failure: report the original result
    errno = EINPROGRESS;
    return -1;
  }

  // the connect has finished; fetch its outcome
  int err = 0;
  socklen_t errlen = sizeof(err);
  if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
    return -1;
  if (err == 0) {
    errno = orig_errno;
    return 0;
  } else {
    errno = err;
    return -1;
  }
}
Compile it with:
gcc bird-preload.c -fPIC -fvisibility=hidden -shared -o libpreload.so
Then write the absolute path of libpreload.so to /etc/ld.so.preload (see man ld.so for more information about preloading and LD_PRELOAD). I started two BIRD daemons inside a Docker container with the config file attached to this mail, and connected them with a veth pair. When libpreload.so is preloaded, the connect retry timer (2 s) should fire every time and tear down the connection, causing a reconnection, which can be checked in the debug log.
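For reference, with the preload in place the caller sees connect() on a non-blocking socket return 0 directly, so it has to handle that case next to EINPROGRESS. Below is a minimal sketch of such a classification. The enum and the function name start_connect are hypothetical and only for illustration; this is not BIRD's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical state names, for illustration only. */
enum conn_state { CONN_FAILED, CONN_IN_PROGRESS, CONN_ESTABLISHED };

/* Start a connect on an already non-blocking socket and classify the
 * immediate result. */
enum conn_state start_connect(int fd, const struct sockaddr *addr, socklen_t len)
{
  assert(addr != NULL);

  if (connect(fd, addr, len) == 0)
    return CONN_ESTABLISHED;  /* completed synchronously, no POLLOUT wait */

  if (errno == EINPROGRESS || errno == EINTR)
    return CONN_IN_PROGRESS;  /* wait for POLLOUT, then read SO_ERROR */

  return CONN_FAILED;         /* errno holds the reason */
}
```

A caller would move straight to its "connected" handling on CONN_ESTABLISHED, and only arm a writability wait and a retry timer on CONN_IN_PROGRESS.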
With libpreload.so, BIRD should behave just as if its thread were not scheduled for a while (<10 ms) while calling connect(); it seems to have no other side effects to me. I'm not sure whether this fits into your CI workflow, though. Hope this helps!
Regards,
Ze Xia