For context on my end; this issue was experienced on physical hardware (64bit) with Intel 1Gbit NICs (no offloading). We only noticed this after some length of time, (> 180 days) during which we likely had < 40 BGP session flaps on our end via Bird.
optmem_max: Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence of struct cmsghdr structures with appended data. The default size is 10240 bytes.
According to Eric Dumazet back in 2012 [1]:
<snip>
There is no limit on number of MD5 keys an application can attach to a tcp socket.
This patch adds a per tcp socket limit based on /proc/sys/net/core/optmem_max
With current default optmem_max values, this allows about 150 keys on 64bit arches, and 88 keys on 32bit arches.
</snip>
Maybe we are getting multiple/duplicate MD5 keys assigned to the TCP session somehow?
-Mike
[1] https://patchwork.ozlabs.org/patch/138861/
On Tue, 25 Aug 2015 15:48:44 -0400 Brian Rak <brak@gameservers.com> wrote:
I haven't tried the optmem_max option, but I did some more experimenting..
We have a virtual machine running a nearly identical BIRD config that's not showing this issue.
The machine with the issue is physical, and has a Mellanox ConnectX NIC. I'm wondering if there's some limitation with TCP offload there that's responsible. Disabling TCP offload didn't seem to help though.
On 8/24/2015 4:59 PM, Michael Vallaly wrote:
I saw this problem back in 2013 on Bird 1.3.6 and 3.6+ kernels.. (Re: Strange MD5 Auth problem in BIRD 1.3.8)
AFAIK it was related to kernel socket option memory (or lack thereof) and I can only surmise it was related to some sort of memory leak. Ondrej Zajicek seemed to think this was an issue in the kernel itself, but I wasn't able to prove that definitively.
I was able to work around it (without rebooting) by:
<snip>
echo 40960 > /proc/sys/net/core/optmem_max # Defaults to 20480
</snip>
Which seemed to have deferred the issue, long enough for us to reboot / not run into it constantly.
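For anyone wanting to script the workaround, here is a minimal sketch that only raises the value when it is currently lower than the target. The function name raise_optmem and the OPTMEM_FILE / TARGET variables are made up for illustration; 40960 is simply the value used above, not a recommended setting.

```shell
#!/bin/sh
# Hypothetical helper, not part of BIRD or the kernel tooling.
# OPTMEM_FILE is overridable so the logic can be tried on a scratch file
# instead of the live sysctl.
OPTMEM_FILE="${OPTMEM_FILE:-/proc/sys/net/core/optmem_max}"
TARGET="${TARGET:-40960}"   # value from the workaround above (assumption, tune as needed)

raise_optmem() {
    # Read the current limit; give up if the file is unreadable.
    current=$(cat "$OPTMEM_FILE") || return 1
    if [ "$current" -lt "$TARGET" ]; then
        # Writing to the real /proc file requires root.
        echo "$TARGET" > "$OPTMEM_FILE" || return 1
        echo "optmem_max raised from $current to $TARGET"
    else
        echo "optmem_max already $current, leaving it alone"
    fi
}
```

Calling raise_optmem as root applies the change to the running kernel only. To keep it across reboots, the usual approach is a line such as "net.core.optmem_max = 40960" in /etc/sysctl.conf, though that only hides the symptom if there really is a leak.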
If anyone else has any details or info, I would still be interested in the root-cause analysis and hopefully permanent fix.
-Mike
On Mon, 24 Aug 2015 15:59:06 -0400 Brian Rak <brak@gameservers.com> wrote:
I have a machine running BIRD 1.4.5, and I'm seeing a lot of these messages when I start it up:
2015-08-24 15:54:26 <ERR> xxxx: Socket error: TCP_MD5SIG: Cannot allocate memory
2015-08-24 15:54:26 <ERR> yyyy: Socket error: TCP_MD5SIG: Cannot allocate memory
It also seems like the sessions that report that error do not come up, and show a status of 'Error: Kernel MD5 auth failed'.
I'm only trying to configure around 200 BGP sessions here, most of which are advertising a very small number of prefixes.
I don't really see any tunable settings here, any suggestions as to how I can correct this?
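As a rough cross-check of the numbers in this thread, the sketch below scales the "about 150 keys" figure from the quoted patch note up to 200 sessions. It assumes the key limit grows linearly with optmem_max, that every BGP session needs one MD5 key, and that the 150-key figure corresponds to the 20480 default mentioned with the workaround; none of that is confirmed here, and the function name optmem_for_keys is made up for illustration.

```shell
#!/bin/sh
# Rough estimate only, based on figures quoted in this thread (assumptions):
DEFAULT_OPTMEM=20480   # 64-bit default noted next to the echo workaround
DEFAULT_KEYS=150       # "about 150 keys on 64bit arches" from the patch note

# Print the smallest optmem_max (rounded up) that should fit the given
# number of MD5 keys, assuming linear scaling.
optmem_for_keys() {
    keys=$1
    echo $(( (keys * DEFAULT_OPTMEM + DEFAULT_KEYS - 1) / DEFAULT_KEYS ))
}

optmem_for_keys 200
```

Under those assumptions 200 sessions need a bit over 27000 bytes, so a default of 20480 would fall short while 40960 would leave room for roughly 300 keys.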
-- Michael Vallaly <mvallaly@nolatency.com>