I have a machine running BIRD 1.4.5, and I'm seeing a lot of these messages when I start it up:

2015-08-24 15:54:26 <ERR> xxxx: Socket error: TCP_MD5SIG: Cannot allocate memory
2015-08-24 15:54:26 <ERR> yyyy: Socket error: TCP_MD5SIG: Cannot allocate memory

The sessions that report this error do not come up, and show a status of 'Error: Kernel MD5 auth failed'.

I'm only trying to configure around 200 BGP sessions here, most of which advertise a very small number of prefixes.

I don't see any tunable settings for this. Any suggestions as to how I can correct it?
I saw this problem back in 2013 on BIRD 1.3.6 and 3.6+ kernels (Re: Strange MD5 Auth problem in BIRD 1.3.8).

AFAIK it was related to kernel socket option memory (or lack thereof), and I can only surmise that some sort of memory leak was involved. Ondrej Zajicek seemed to think this was an issue in the kernel itself, but I wasn't able to prove that definitively.

I was able to work around it (without rebooting) with:

<snip>
echo 40960 > /proc/sys/net/core/optmem_max # Defaults to 20480
</snip>

This seemed to defer the issue long enough for us to reboot and not run into it constantly.

If anyone else has any details or info, I would still be interested in a root-cause analysis and, hopefully, a permanent fix.

-Mike

On Mon, 24 Aug 2015 15:59:06 -0400 Brian Rak <brak@gameservers.com> wrote:
> I don't really see any tunable settings here, any suggestions as to how I can correct this?

-- Michael Vallaly <mvallaly@nolatency.com>
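The echo workaround above can also be scripted. What follows is a minimal Python sketch, not something posted in the thread: it reads the current optmem_max and raises it only when it is below a target value. The helper names (read_optmem_max, raise_optmem_max) are hypothetical, writing requires root, and, as with the echo command, the change does not survive a reboot.

```python
from pathlib import Path

# Same procfs file that the echo workaround writes to.
OPTMEM_MAX = Path("/proc/sys/net/core/optmem_max")


def read_optmem_max(path: Path = OPTMEM_MAX) -> int:
    """Return the current limit in bytes."""
    return int(path.read_text().strip())


def raise_optmem_max(target: int, path: Path = OPTMEM_MAX) -> int:
    """Raise the limit to `target` if it is lower; never lower it.

    Returns the value in effect afterwards. Writing needs root.
    """
    current = read_optmem_max(path)
    if current >= target:
        return current
    path.write_text(f"{target}\n")
    return target


if __name__ == "__main__":
    # Only read here; call raise_optmem_max(40960) explicitly as root.
    try:
        print("current optmem_max:", read_optmem_max())
    except OSError as exc:  # e.g. not running on Linux
        print("could not read optmem_max:", exc)
```

Taking the path as a parameter keeps the helpers testable against an ordinary file instead of the live sysctl.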
I haven't tried the optmem_max option, but I did some more experimenting.

We have a virtual machine running a nearly identical BIRD config that's not showing this issue. The machine with the issue is physical and has a Mellanox ConnectX NIC. I'm wondering if some limitation of TCP offload there is responsible, although disabling TCP offload didn't seem to help.

On 8/24/2015 4:59 PM, Michael Vallaly wrote:
> I was able to work around it (without rebooting) by:
> echo 40960 > /proc/sys/net/core/optmem_max # Defaults to 20480
For context on my end: this issue was experienced on physical hardware (64bit) with Intel 1Gbit NICs (no offloading). We only noticed it after some length of time (> 180 days), during which we likely had < 40 BGP session flaps on our end via BIRD.

optmem_max: Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence of struct cmsghdr structures with appended data. The default size is 10240 bytes.

According to Eric Dumazet back in 2012 [1]:

<snip>
There is no limit on number of MD5 keys an application can attach to a tcp socket.

This patch adds a per tcp socket limit based on /proc/sys/net/core/optmem_max

With current default optmem_max values, this allows about 150 keys on 64bit arches, and 88 keys on 32bit arches.
</snip>

Maybe we are getting multiple/duplicate MD5 keys assigned to the TCP session somehow?

-Mike

[1] https://patchwork.ozlabs.org/patch/138861/

On Tue, 25 Aug 2015 15:48:44 -0400 Brian Rak <brak@gameservers.com> wrote:
> I haven't tried the optmem_max option, but I did some more experimenting.

-- Michael Vallaly <mvallaly@nolatency.com>
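The arithmetic behind the quoted figures can be sketched in Python. The per-key cost used here is an assumption, not a number taken from the patch: dividing the 20480-byte default mentioned earlier in the thread by the quoted "about 150 keys" gives roughly 136 bytes per key on 64-bit. The function name is hypothetical.

```python
# ASSUMPTION: approximate per-key charge against optmem_max on 64-bit,
# inferred from "about 150 keys" at a 20480-byte default (20480 // 150 = 136).
BYTES_PER_KEY_64BIT = 136


def estimate_max_md5_keys(optmem_max: int,
                          bytes_per_key: int = BYTES_PER_KEY_64BIT) -> int:
    """Estimate how many MD5 keys fit on one TCP socket."""
    if optmem_max < 0 or bytes_per_key <= 0:
        raise ValueError("optmem_max must be >= 0 and bytes_per_key > 0")
    return optmem_max // bytes_per_key


if __name__ == "__main__":
    for limit in (20480, 40960):
        print(limit, "bytes ->", estimate_max_md5_keys(limit), "keys")
```

Under this assumption, the doubled value from the echo workaround (40960) would allow roughly 300 keys per socket.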
On 8/25/2015 4:45 PM, Michael Vallaly wrote:
> With current default optmem_max values, this allows about 150 keys on 64bit arches, and 88 keys on 32bit arches.

I think you just solved it, for me at least. I was seeing issues after ~147 BGP sessions, so running into this limit would make perfect sense. I could reproduce it at will just by restarting BIRD, so my issue, at least, wasn't any sort of leak.
On Tue, Aug 25, 2015 at 03:45:47PM -0500, Michael Vallaly wrote:
> Maybe we are getting multiple/duplicate MD5 keys assigned to the TCP session somehow?

Thanks for the info about the limit. Note that the incoming (listening) TCP socket (AFAIK) has to be configured with all the keys, so it is possible to hit the limit during regular operation without any leaks.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
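To illustrate why the listening socket accumulates keys, here is a minimal Python sketch of how a daemon could attach one TCP_MD5SIG key per peer to a single listener on Linux. This is not BIRD's code. The constant values and the struct tcp_md5sig layout are assumptions based on <linux/tcp.h> and should be checked against your kernel headers; the helper names and the peers mapping are hypothetical.

```python
import socket
import struct
from typing import Mapping

# ASSUMPTION: values from <linux/tcp.h>; verify against your kernel headers.
TCP_MD5SIG = 14
TCP_MD5SIG_MAXKEYLEN = 80
SOCKADDR_STORAGE_LEN = 128


def pack_tcp_md5sig(peer_ip: str, key: bytes) -> bytes:
    """Build a struct tcp_md5sig for an IPv4 peer."""
    if not 0 < len(key) <= TCP_MD5SIG_MAXKEYLEN:
        raise ValueError("key must be 1..80 bytes")
    # sockaddr_in: family (host order), port (unused, 0), IPv4 address,
    # zero-padded to the size of sockaddr_storage.
    addr = struct.pack("=HH", socket.AF_INET, 0) + socket.inet_aton(peer_ip)
    addr = addr.ljust(SOCKADDR_STORAGE_LEN, b"\0")
    # Two reserved/flag bytes, key length, 4 reserved bytes, key buffer.
    return addr + struct.pack("=BBHI80s", 0, 0, len(key), 0, key)


def attach_peer_keys(listener: socket.socket,
                     peers: Mapping[str, bytes]) -> None:
    """Attach every peer's key to the one listening socket.

    Each call is charged to that socket's option memory, so with enough
    peers setsockopt() can fail with ENOMEM ("Cannot allocate memory").
    """
    for peer_ip, key in peers.items():
        listener.setsockopt(socket.IPPROTO_TCP, TCP_MD5SIG,
                            pack_tcp_md5sig(peer_ip, key))
```

Because all keys land on the same socket, the per-socket limit applies to the total number of authenticated peers rather than to each session separately.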
Ondrej Zajicek <santiago@crfreenet.org> wrote on Tue, 25 Aug 2015 23:11:
> Thanks for the info about the limit. Note that the incoming (listening) TCP socket (AFAIK) has to be configured with all the keys, so it is possible to hit the limit during regular operation without any leaks.

Do I get it right? Without adjusting the optmem_max value, you won't be able to attach more than 150 keys to a listening socket?

Would this basically cause trouble when you want to run, e.g., a BGP route server with more than 150 peers, each protecting its session with a unique key?

Rgds, SJ
On Wed, Nov 18, 2015 at 06:05:31PM +0000, Stefan Jakob wrote:
> Would this basically cause trouble when you want to run, e.g., a BGP route server with more than 150 peers protecting their sessions via a unique key?

AFAIK yes.

-- Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
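For a route server with a known peer count, the sizing can be estimated the other way round. A rough Python sketch follows. The 136 bytes per key is inferred from figures quoted earlier in the thread (20480 bytes for about 150 keys on 64-bit), and the 25% headroom and the rounding to 1 KiB are arbitrary choices of this example, not kernel requirements. The function name is hypothetical.

```python
def suggest_optmem_max(n_peers: int,
                       bytes_per_key: int = 136,   # ASSUMPTION, see above
                       headroom_pct: int = 25,     # arbitrary safety margin
                       granule: int = 1024) -> int:
    """Suggest an optmem_max value (bytes) for n_peers MD5-protected peers."""
    if n_peers < 0:
        raise ValueError("n_peers must be non-negative")
    raw = n_peers * bytes_per_key * (100 + headroom_pct) // 100
    # Round up to a multiple of `granule`.
    return -(-raw // granule) * granule


if __name__ == "__main__":
    for peers in (200, 300):
        print(peers, "peers -> optmem_max of about", suggest_optmem_max(peers))
```

The result is only a starting point; the actual per-key cost depends on the kernel version and architecture.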