Linux 4.20 / incomplete ipv4 addr list / RTM_GETADDR
Hello bird users,
Today I tried to upgrade one of our platforms from Linux 4.19 to Linux 4.20. I noticed that the BGP protocol could not start after this message:
<TRACE> uplink: Waiting for 1.2.3.4 to become my neighbor
The neighbor (directly connected on an interface) would never appear to bird (as in show interfaces) because the AF_INET/RTM_GETADDR reply from the kernel looks incomplete. It works fine in v6.
The results from the kernel are capped at 104 IP addresses (and we run servers with more than 700 IP addresses) on loopback.
I confirmed this behavior with nlmon / tcpdump / wireshark. Tested with bird v2.0.2 and v1.6.3 (debian stable).
I haven't reproduced the behavior anywhere other than bird, but I believe this could prove useful to someone else.
--
\o/ Arthur G
Gandi.net
On Thu, Dec 27, 2018 at 05:01:52PM +0000, Arthur Gautier wrote:
> Hello bird users,
> Today I tried to upgrade one of our platforms from Linux 4.19 to Linux 4.20. I noticed that the BGP protocol could not start after this message: <TRACE> uplink: Waiting for 1.2.3.4 to become my neighbor
> The neighbor (directly connected on an interface) would never appear to bird (as in show interfaces) because the AF_INET/RTM_GETADDR reply from the kernel looks incomplete. It works fine in v6.
> The results from the kernel are capped at 104 IP addresses (and we run servers with more than 700 IP addresses) on loopback.
Hello,
Aren't there any error messages in the log? I would guess it is a truncated message, caused by the kernel pushing more information or putting multiple pieces of information into one message.
Does it help if the constant NL_RX_SIZE (in sysdep/linux/netlink.c) is increased?
> I confirmed this behavior with nlmon / tcpdump / wireshark. Tested with bird v2.0.2 and v1.6.3 (debian stable).
Is the result from the kernel capped, or is it the parsing in BIRD? But perhaps a limited buffer size would cause the kernel to limit the data it sends.
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
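Note: the truncation guessed at above can be detected from user space. The following is a minimal sketch in Python, not BIRD's actual code. `recv_checked` is a hypothetical helper, and the demo uses a local AF_UNIX datagram pair as a stand-in for a netlink socket. It assumes only that `recvmsg()` reports `MSG_TRUNC` in the returned flags when a datagram was larger than the receive buffer, which is also how netlink sockets behave on Linux.

```python
import socket


def recv_checked(sock, rx_size):
    """Receive one datagram; raise if it did not fit into rx_size bytes."""
    data, _ancdata, msg_flags, _addr = sock.recvmsg(rx_size)
    if msg_flags & socket.MSG_TRUNC:
        # The kernel dropped the tail of the datagram: the buffer is too small.
        raise BufferError(f"datagram truncated to {rx_size} bytes")
    return data


if __name__ == "__main__":
    # Local datagram pair standing in for a netlink socket (assumption).
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        a.send(b"x" * 100)
        print("received", len(recv_checked(b, 8192)), "bytes")
        a.send(b"x" * 100)
        try:
            recv_checked(b, 16)  # deliberately too small
        except BufferError as exc:
            print("detected:", exc)
```

If a check like this never fires while the address list is still short, the receive buffer size is unlikely to be the cause.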
On Fri, Dec 28, 2018 at 07:52:59PM +0100, Ondrej Zajicek wrote:
> Is the result from the kernel capped, or is it the parsing in BIRD? But perhaps a limited buffer size would cause the kernel to limit the data it sends.
The result from the kernel is capped. I extracted the bird code to reproduce the issue in a minimal environment.
I have also been able to bisect the issue down to d7e38611b81e6d7e14969c361f2b9fc07403a6c3 https://github.com/torvalds/linux/commit/d7e38611b81e6d7e14969c361f2b9fc0740...
I do not yet understand why this is happening. The code looks very similar to what IPv6 is doing, but it does not work the same way for v4.
I am still working on getting a patch for this.
--
\o/ Arthur G
Gandi.net
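Note: for readers who want to check the kernel's reply independently of BIRD, a reproduction could look roughly like the sketch below. This is not the code extracted from bird that is mentioned above. The function names and the `rx_size` default are illustrative assumptions, and the constants are copied from the Linux netlink headers. It sends an AF_INET RTM_GETADDR dump request and counts the RTM_NEWADDR replies until the kernel sends NLMSG_DONE.

```python
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
NETLINK_ROUTE = 0
NLMSG_ERROR, NLMSG_DONE = 2, 3
RTM_NEWADDR, RTM_GETADDR = 20, 22
NLM_F_REQUEST, NLM_F_DUMP = 0x1, 0x300

NLMSGHDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
IFADDRMSG = struct.Struct("=BBBBI")  # family, prefixlen, flags, scope, index


def parse_dump_chunk(buf):
    """Return (number of RTM_NEWADDR messages, done flag) for one buffer."""
    count, done, off = 0, False, 0
    while off + NLMSGHDR.size <= len(buf):
        length, mtype, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, off)
        if length < NLMSGHDR.size or off + length > len(buf):
            raise ValueError("truncated netlink message")
        if mtype == NLMSG_ERROR:
            raise OSError("kernel returned NLMSG_ERROR")
        if mtype == NLMSG_DONE:
            done = True
            break
        if mtype == RTM_NEWADDR:
            count += 1
        off += (length + 3) & ~3  # netlink messages are 4-byte aligned
    return count, done


def count_ipv4_addrs(rx_size=8192):
    """Dump all IPv4 addresses via RTM_GETADDR and count the replies."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE) as s:
        s.bind((0, 0))
        payload = IFADDRMSG.pack(socket.AF_INET, 0, 0, 0, 0)
        header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), RTM_GETADDR,
                               NLM_F_REQUEST | NLM_F_DUMP, 1, 0)
        s.send(header + payload)
        total, done = 0, False
        while not done:
            n, done = parse_dump_chunk(s.recv(rx_size))
            total += n
        return total


if __name__ == "__main__":
    print("IPv4 addresses reported by the kernel:", count_ipv4_addrs())
```

Comparing the printed count with the number of addresses actually configured on the host shows whether the dump itself is cut short.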
On Sat, Dec 29, 2018 at 05:54:58AM +0000, Arthur Gautier wrote:
> The result from the kernel is capped. I extracted the bird code to reproduce the issue in a minimal environment.
> I have also been able to bisect the issue down to d7e38611b81e6d7e14969c361f2b9fc07403a6c3 https://github.com/torvalds/linux/commit/d7e38611b81e6d7e14969c361f2b9fc0740...
> I do not yet understand why this is happening. The code looks very similar to what IPv6 is doing, but it does not work the same way for v4.
> I am still working on getting a patch for this.
FYI, here is a patch for Linux that fixes my issue: https://patchwork.ozlabs.org/patch/1019459/
This is the first patch I have contributed to Linux, so I'm quite inexperienced with the process. Hopefully I did it correctly (and the patch is correct!).
--
\o/ Arthur G
Gandi.net
On Sun, Dec 30, 2018 at 06:24:04PM +0000, Arthur Gautier wrote:
> FYI, here is a patch for Linux that fixes my issue: https://patchwork.ozlabs.org/patch/1019459/
> This is the first patch I have contributed to Linux, so I'm quite inexperienced with the process. Hopefully I did it correctly (and the patch is correct!).
I just deployed v2 of the patch to a canary server in our production, and it works as expected. https://patchwork.ozlabs.org/patch/1019497/
--
\o/ Arthur G
Gandi.net
participants (2): Arthur Gautier, Ondrej Zajicek