Bird6 freeze under high load

Baptiste Jonglez baptiste at bitsofnetworks.org
Thu Feb 5 14:50:08 CET 2015


On Sat, Jan 31, 2015 at 03:06:37PM +0100, Ondrej Zajicek wrote:
> On Sat, Jan 31, 2015 at 02:47:51PM +0100, Baptiste Jonglez wrote:
> > This took several minutes to complete, and there certainly aren't that many
> > IPv6 routes in the kernel: routes appear several times in the output of
> > "ip -6 r".  Running this command multiple times yields very different
> > results each time.
> > 
> > Thus, I don't think the bug is in Bird.  Could it be some kind of race
> > condition with netlink?  I haven't been able to find any reference to this
> > bug, either in the kernel or in iproute2.  For reference, this is on a
> > Debian wheezy system, but I can reproduce the duplicate routes in "ip -6 r"
> > on Debian jessie as well.
> 
> Interesting, what are the kernel versions in these Wheezy and Jessie systems?
> Does the problem (with 'ip -6 r') also appear when BIRD is not running?

So far, I saw this behaviour in Wheezy (3.2.0-4-amd64 + iproute
20120521-3+b3), and Jessie (3.14.4-1 + iproute2 3.16.0-2).  In both cases,
Bird was running.

I tried shutting down bird6 (with "persist" in the kernel protocol, so
that routes stay in the kernel), and the problem seemed to persist:

$ ip -6 r | wc -l
26720
$ ip -6 r | wc -l
21794
$ ip -6 r | wc -l
50321
$ ip -6 r | wc -l
37602
$ ip -6 r | wc -l
59011

However, the amplitude is much smaller (when Bird is running and talking
to the kernel, doing the same thing yields millions of routes).

> I wonder what factors are specific to this problem. I remember there was
> a similar report or two a few years ago, but these reports are too uncommon
> to be a universal problem in IPv6 Linux forwarding.

I actually found a way to reproduce this without Bird, just by inserting
static routes.  It seems that dumping the routing table while some routes
are being inserted is enough to trigger the bug.

A simple way to see the issue: run this in one shell:

  watch -n 0.2 "ip -6 r | wc -l"

and run this in another, root, shell:

  for i in {0000..3999};  do ip -6 r add unreachable 2001:db8:$i::/48 proto 57;  sleep 0.005;  done;  sleep 30;  ip -6 r flush proto 57

Then look at the output of watch: at first, the number of routes grows
steadily, and then, at some point, it starts jumping up and down.

It should be possible to write a small program that automatically tests
the presence of the bug (for instance by checking that the number of
routes reported by "ip -6 r" is always increasing).
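
A minimal sketch of such a check, meant to run while the insertion loop
above is still adding routes (and before the final flush).  The function
names are made up for illustration, and it only checks that the count
never goes down between two consecutive samples:

```shell
#!/bin/sh
# Sketch only: sample_counts and check_increasing are hypothetical helper
# names, not part of iproute2 or Bird.

# Print the number of lines reported by "ip -6 r", COUNT times, one per line.
# The second argument is the delay between samples (default 0.2 seconds).
sample_counts() {
    local count=$1 interval=${2:-0.2} i=0
    while [ "$i" -lt "$count" ]; do
        ip -6 r | wc -l
        sleep "$interval"
        i=$((i + 1))
    done
}

# Read one integer per line on stdin and fail on the first decrease.
check_increasing() {
    local prev=0 n
    while read -r n; do
        if [ "$n" -lt "$prev" ]; then
            echo "route count went down: $prev -> $n"
            return 1
        fi
        prev=$n
    done
    echo "no decrease seen"
}
```

For instance, "sample_counts 50 | check_increasing" in a second shell takes
50 samples.  A reported decrease only points to the bug if nothing else on
the machine is withdrawing IPv6 routes at the same time.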


So far, using the little shell script above, I saw the issue with:

- Wheezy (3.2.0-4-amd64 + iproute 20120521-3+b3)

The bug didn't show up with:

- Wheezy with a multipath TCP kernel (3.14.27 + iproute 201407242100)
- Archlinux (3.18.2 + iproute2 3.17.0)
- Archlinux (3.14.27-1-mptcp + iproute2 3.18.0-1)