Hi, I was trying to upgrade my bird instance from v1.6.0 to v1.6.3 today. Unfortunately immediately after start the bird started generating these messages 2016-12-30 12:02:35 <INFO> Started 2016-12-30 12:02:43 <WARN> I/O loop cycle took 8679 ms for 8 events 2016-12-30 12:02:43 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:02:43 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:02:52 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:03 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:10 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:16 <WARN> Kernel dropped some netlink messages, will resync on next scan It's a full BGP IPv4 instance with just three interfaces. Tested on linux kernels 4.4 or 4.8. Package is from cznic deb repository. I can see the warning log line was added at Dec 20 2016, which ended up in 1.6.3 release. Hence this was happening probably even before the upgrade, but it was not visible. So the question is simple. Does it means there is something terribly wrong with my machine, or the high rate of this warning is expected? Cheers Michal
Stupid... increasing read buffer helped of course sysctl -w net.core.rmem_default=1064960 I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that? On 2016-12-30 12:27, Michal wrote:
Hi,
I was trying to upgrade my bird instance from v1.6.0 to v1.6.3 today. Unfortunately immediately after start the bird started generating these messages
2016-12-30 12:02:35 <INFO> Started 2016-12-30 12:02:43 <WARN> I/O loop cycle took 8679 ms for 8 events 2016-12-30 12:02:43 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:02:43 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:02:52 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:03 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:10 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:16 <WARN> Kernel dropped some netlink messages, will resync on next scan
It's a full BGP IPv4 instance with just three interfaces. Tested on linux kernels 4.4 or 4.8. Package is from cznic deb repository.
I can see the warning log line was added at Dec 20 2016, which ended up in 1.6.3 release. Hence this was happening probably even before the upgrade, but it was not visible.
So the question is simple. Does it means there is something terribly wrong with my machine, or the high rate of this warning is expected?
Cheers Michal
On Fri, Dec 30, 2016 at 01:13:59PM +0100, Michal wrote:
Stupid... increasing read buffer helped of course
sysctl -w net.core.rmem_default=1064960
I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that?
Yes, that is probably a good idea. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
On 2016-12-30 13:26, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 01:13:59PM +0100, Michal wrote:
Stupid... increasing read buffer helped of course
sysctl -w net.core.rmem_default=1064960
I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that?
Yes, that is probably a good idea.
Ok, let's see if can pull up some C skills.
On 2016-12-30 13:30, Michal wrote:
On 2016-12-30 13:26, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 01:13:59PM +0100, Michal wrote:
Stupid... increasing read buffer helped of course
sysctl -w net.core.rmem_default=1064960
I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that?
Yes, that is probably a good idea.
Ok, let's see if can pull up some C skills.
So here comes the ugly patch. I can confirm that rising rcvbuf size to something bigger fixes my problem. conf/conf.c | 1 + conf/conf.h | 1 + sysdep/linux/netlink.Y | 7 ++++++- sysdep/linux/netlink.c | 18 ++++++++++++++---- 4 files changed, 22 insertions(+), 5 deletions(-) diff --git a/conf/conf.c b/conf/conf.c index 0a4e3f8c..b8d77bdd 100644 --- a/conf/conf.c +++ b/conf/conf.c @@ -104,6 +104,7 @@ config_alloc(const byte *name) c->tf_route = c->tf_proto = (struct timeformat){"%T", "%F", 20*3600}; c->tf_base = c->tf_log = (struct timeformat){"%F %T", NULL, 0}; c->gr_wait = DEFAULT_GR_WAIT; + c->netlink_rcvbuf = 0; return c; } diff --git a/conf/conf.h b/conf/conf.h index 41cb434f..cf0bfc67 100644 --- a/conf/conf.h +++ b/conf/conf.h @@ -55,6 +55,7 @@ struct config { int obstacle_count; /* Number of items blocking freeing of this config */ int shutdown; /* This is a pseudo-config for daemon shutdown */ bird_clock_t load_time; /* When we've got this configuration */ + size_t netlink_rcvbuf; /* netlink receive buffer size */ }; /* Please don't use these variables in protocols. Use proto_config->global instead. */ diff --git a/sysdep/linux/netlink.Y b/sysdep/linux/netlink.Y index f577244d..d7dc3347 100644 --- a/sysdep/linux/netlink.Y +++ b/sysdep/linux/netlink.Y @@ -10,7 +10,7 @@ CF_HDR CF_DECLS -CF_KEYWORDS(KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, KRT_WINDOW, +CF_KEYWORDS(NETLINK, RCVBUF, KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, KRT_WINDOW, KRT_RTT, KRT_RTTVAR, KRT_SSTRESH, KRT_CWND, KRT_ADVMSS, KRT_REORDERING, KRT_HOPLIMIT, KRT_INITCWND, KRT_RTO_MIN, KRT_INITRWND, KRT_QUICKACK, KRT_LOCK_MTU, KRT_LOCK_WINDOW, KRT_LOCK_RTT, KRT_LOCK_RTTVAR, @@ -19,6 +19,11 @@ CF_KEYWORDS(KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, K CF_GRAMMAR +CF_ADDTO(conf, netlink) + +netlink: + NETLINK RCVBUF expr ';' { new_config->netlink_rcvbuf = $3; } + CF_ADDTO(kern_proto, kern_proto kern_sys_item ';') kern_sys_item: diff --git a/sysdep/linux/netlink.c b/sysdep/linux/netlink.c index 22313f43..28fd26f3 100644 --- a/sysdep/linux/netlink.c +++ b/sysdep/linux/netlink.c @@ -1533,11 +1533,12 @@ nl_async_err_hook(sock *sk, int e UNUSED) } static void -nl_open_async(void) +nl_open_async(struct proto *p) { sock *sk; struct sockaddr_nl sa; int fd; + size_t rcvbuf; if (nl_async_sk) return; @@ -1551,6 +1552,15 @@ nl_open_async(void) return; } + rcvbuf = p->cf->global->netlink_rcvbuf; + if (rcvbuf > 0) + { + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) + { + log(L_ERR "Unable to set netlink socket receive buffer size: %m"); + } + } + bzero(&sa, sizeof(sa)); sa.nl_family = AF_NETLINK; #ifdef IPV6 @@ -1603,7 +1613,7 @@ krt_sys_start(struct krt_proto *p) HASH_INSERT2(nl_table_map, RTH, krt_pool, p); nl_open(); - nl_open_async(); + nl_open_async(&(p->p)); return 1; } @@ -1685,10 +1695,10 @@ krt_sys_get_attr(eattr *a, byte *buf, int buflen UNUSED) void -kif_sys_start(struct kif_proto *p UNUSED) +kif_sys_start(struct kif_proto *p) { nl_open(); - nl_open_async(); + nl_open_async(&(p->p)); } void
I think this would be a very good addition to the program. I'm the one who originally had infinite loops in Bird due to heavy netlink traffic; in my case it was probably conntrackd, keeping track of ~1M flows. The fix by Ondrej and Jan solved the immediate problem, of infinite loops and hangs, but I still worry if the message drops might cause undefined behavior. Increasing the buffer seems logical. On 12/30/2016 03:53 PM, Michal wrote:
On 2016-12-30 13:30, Michal wrote:
On 2016-12-30 13:26, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 01:13:59PM +0100, Michal wrote:
Stupid... increasing read buffer helped of course
sysctl -w net.core.rmem_default=1064960
I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that?
Yes, that is probably a good idea.
Ok, let's see if can pull up some C skills.
So here comes the ugly patch. I can confirm that rising rcvbuf size to something bigger fixes my problem.
conf/conf.c | 1 + conf/conf.h | 1 + sysdep/linux/netlink.Y | 7 ++++++- sysdep/linux/netlink.c | 18 ++++++++++++++---- 4 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/conf/conf.c b/conf/conf.c index 0a4e3f8c..b8d77bdd 100644 --- a/conf/conf.c +++ b/conf/conf.c @@ -104,6 +104,7 @@ config_alloc(const byte *name) c->tf_route = c->tf_proto = (struct timeformat){"%T", "%F", 20*3600}; c->tf_base = c->tf_log = (struct timeformat){"%F %T", NULL, 0}; c->gr_wait = DEFAULT_GR_WAIT; + c->netlink_rcvbuf = 0;
return c; } diff --git a/conf/conf.h b/conf/conf.h index 41cb434f..cf0bfc67 100644 --- a/conf/conf.h +++ b/conf/conf.h @@ -55,6 +55,7 @@ struct config { int obstacle_count; /* Number of items blocking freeing of this config */ int shutdown; /* This is a pseudo-config for daemon shutdown */ bird_clock_t load_time; /* When we've got this configuration */ + size_t netlink_rcvbuf; /* netlink receive buffer size */ };
/* Please don't use these variables in protocols. Use proto_config->global instead. */ diff --git a/sysdep/linux/netlink.Y b/sysdep/linux/netlink.Y index f577244d..d7dc3347 100644 --- a/sysdep/linux/netlink.Y +++ b/sysdep/linux/netlink.Y @@ -10,7 +10,7 @@ CF_HDR
CF_DECLS
-CF_KEYWORDS(KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, KRT_WINDOW, +CF_KEYWORDS(NETLINK, RCVBUF, KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, KRT_WINDOW, KRT_RTT, KRT_RTTVAR, KRT_SSTRESH, KRT_CWND, KRT_ADVMSS, KRT_REORDERING, KRT_HOPLIMIT, KRT_INITCWND, KRT_RTO_MIN, KRT_INITRWND, KRT_QUICKACK, KRT_LOCK_MTU, KRT_LOCK_WINDOW, KRT_LOCK_RTT, KRT_LOCK_RTTVAR, @@ -19,6 +19,11 @@ CF_KEYWORDS(KERNEL, TABLE, METRIC, KRT_PREFSRC, KRT_REALM, KRT_SCOPE, KRT_MTU, K
CF_GRAMMAR
+CF_ADDTO(conf, netlink) + +netlink: + NETLINK RCVBUF expr ';' { new_config->netlink_rcvbuf = $3; } + CF_ADDTO(kern_proto, kern_proto kern_sys_item ';')
kern_sys_item: diff --git a/sysdep/linux/netlink.c b/sysdep/linux/netlink.c index 22313f43..28fd26f3 100644 --- a/sysdep/linux/netlink.c +++ b/sysdep/linux/netlink.c @@ -1533,11 +1533,12 @@ nl_async_err_hook(sock *sk, int e UNUSED) }
static void -nl_open_async(void) +nl_open_async(struct proto *p) { sock *sk; struct sockaddr_nl sa; int fd; + size_t rcvbuf;
if (nl_async_sk) return; @@ -1551,6 +1552,15 @@ nl_open_async(void) return; }
+ rcvbuf = p->cf->global->netlink_rcvbuf; + if (rcvbuf > 0) + { + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) + { + log(L_ERR "Unable to set netlink socket receive buffer size: %m"); + } + } + bzero(&sa, sizeof(sa)); sa.nl_family = AF_NETLINK; #ifdef IPV6 @@ -1603,7 +1613,7 @@ krt_sys_start(struct krt_proto *p) HASH_INSERT2(nl_table_map, RTH, krt_pool, p);
nl_open(); - nl_open_async(); + nl_open_async(&(p->p));
return 1; } @@ -1685,10 +1695,10 @@ krt_sys_get_attr(eattr *a, byte *buf, int buflen UNUSED)
void -kif_sys_start(struct kif_proto *p UNUSED) +kif_sys_start(struct kif_proto *p) { nl_open(); - nl_open_async(); + nl_open_async(&(p->p)); }
void
On Fri, Dec 30, 2016 at 04:53:52PM +0100, Michal wrote:
On 2016-12-30 13:30, Michal wrote:
On 2016-12-30 13:26, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 01:13:59PM +0100, Michal wrote:
Stupid... increasing read buffer helped of course
sysctl -w net.core.rmem_default=1064960
I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that?
Yes, that is probably a good idea.
Ok, let's see if can pull up some C skills.
So here comes the ugly patch. I can confirm that rising rcvbuf size to something bigger fixes my problem.
Hi Thanks. Am i understand it correctly that rising rcvbuf size by this patch (without the sysctl change) fixed the problem? We will probably merge the patch after some cleanups, Jan Matejka is doing some netlink experiments to see if there are some other ways to fix the netlink socket to misbehave. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
On 2017-01-02 15:02, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 04:53:52PM +0100, Michal wrote:
On 2016-12-30 13:30, Michal wrote:
On 2016-12-30 13:26, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 01:13:59PM +0100, Michal wrote:
Stupid... increasing read buffer helped of course
sysctl -w net.core.rmem_default=1064960
I see, the bird is not increasing the read buffer on it's netlink socket. Would it be bad idea to create an configuration option for that?
Yes, that is probably a good idea.
Ok, let's see if can pull up some C skills.
So here comes the ugly patch. I can confirm that rising rcvbuf size to something bigger fixes my problem.
Hi
Thanks. Am i understand it correctly that rising rcvbuf size by this patch (without the sysctl change) fixed the problem? We will probably merge the patch after some cleanups, Jan Matejka is doing some netlink experiments to see if there are some other ways to fix the netlink socket to misbehave.
Yop, tested in production.
On Fri, Dec 30, 2016 at 12:27:13PM +0100, Michal wrote:
Hi,
I was trying to upgrade my bird instance from v1.6.0 to v1.6.3 today. Unfortunately immediately after start the bird started generating these messages .. I can see the warning log line was added at Dec 20 2016, which ended up in 1.6.3 release. Hence this was happening probably even before the upgrade, but it was not visible.
Hi That is true. We fixed a race condition leading to infinite loops and added this warning. But the issue triggering the warning was here even in older versions, just silent.
So the question is simple. Does it means there is something terribly wrong with my machine, or the high rate of this warning is expected?
We are not sure what causes such high rate of warnings on some machines. Do you use something that could cause significant netlink traffic, e.g. conntrackd or ulogd? -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
I'm using collectd netlink plugin to monitor traffic on those interfaces. I would expect it generates some netlink traffic, but not that big. On 2016-12-30 13:15, Ondrej Zajicek wrote:
On Fri, Dec 30, 2016 at 12:27:13PM +0100, Michal wrote:
Hi,
I was trying to upgrade my bird instance from v1.6.0 to v1.6.3 today. Unfortunately immediately after start the bird started generating these messages .. I can see the warning log line was added at Dec 20 2016, which ended up in 1.6.3 release. Hence this was happening probably even before the upgrade, but it was not visible.
Hi
That is true. We fixed a race condition leading to infinite loops and added this warning. But the issue triggering the warning was here even in older versions, just silent.
So the question is simple. Does it means there is something terribly wrong with my machine, or the high rate of this warning is expected?
We are not sure what causes such high rate of warnings on some machines. Do you use something that could cause significant netlink traffic, e.g. conntrackd or ulogd?
Hi!
2016-12-30 12:02:43 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:02:43 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:02:52 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:03 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:10 <WARN> Kernel dropped some netlink messages, will resync on next scan. 2016-12-30 12:03:16 <WARN> Kernel dropped some netlink messages, will resync on next scan
I can see the warning log line was added at Dec 20 2016, which ended up in 1.6.3 release. Hence this was happening probably even before the upgrade, but it was not visible.
Yes, it was happening and it sometimes caused strange behavior. It should be logged as it is not detectable in any other reasonable way. However, you may decide that it is OK for your machine and ignore (filter out) this log message if you want.
So the question is simple. Does it means there is something terribly wrong with my machine, or the high rate of this warning is expected?
There may be several reasons of this: 1) Something changes in kernel really fast. Routes, interfaces or something like that. Therefore some routes (the dropped ones) appear in the kernel protocol not immediately, but during the next regular scan. 2) Something changes in kernel really fast, it fills the kernel netlink buffer with messages that don't propagate to Bird (e.g. conntrackd). Sadly, it seems that the kernel netlink buffer is shared for all these messages. I don't still know whether this should be considered kernel bug or not or whether it is even true. Haven't looked into it so deep. 3) Something is terribly wrong with your machine, universe or anything. (Hopefully not anybody's case.) All of these cases could cause an undebuggable strange behaviour in several ways. By issuing this WARN message, we want to warn the admin that there is something strange happening but it probably isn't terribly wrong. There was a strange bug (hang) in Bird triggered by ENOBUFS on netlink socket with no message ready to be read from that socket. I never believed it could happen but it definitely happens. MQ
participants (4)
-
Israel G. Lugo -
Jan Matejka -
Michal -
Ondrej Zajicek