Optimizing large BGP -> Linux kernel additions/removals

17 Aug 2015

When a BGP session comes up or goes down, there is a period of high CPU usage
while all the new best routes are computed -- this is understandable.

However, on my boxes where this table is then installed into the Linux
kernel, the CPU usage stays high for a considerable time. For a full
(~600,000 prefix)
...
From the looks of things, a stream of netlink messages is being sent to the
kernel (RTM_DELROUTE followed by RTM_NEWROUTE), each of which causes the
kernel's trie to rebalance, which is a fairly costly operation.

I was wondering if anyone had experimented with implementing any of the
following:

1. a corking mechanism (i.e. stop balancing the trie until a signal is sent
to uncork)

2. fib_trie garbage collection (i.e. only rebalance the trie once per time
interval)

3. "double buffering" (i.e. for a given operation such as protocol flap,
memcpy the trie, perform operations, then update the root node pointer to
the new optimised trie)
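
To make idea 1 concrete, here is a toy userspace model of corking. It is only
a sketch: ToyFib, corked() and the rebalance counter are hypothetical names
invented for illustration, and nothing here reflects how the kernel's fib_trie
is actually implemented. It just counts how many rebalance passes a flap would
trigger with and without deferral.

```python
from contextlib import contextmanager


class ToyFib:
    """Toy stand-in for a routing table (hypothetical; NOT the kernel's fib_trie)."""

    def __init__(self):
        self.routes = {}
        self.rebalances = 0   # number of (pretend) rebalance passes performed
        self._corked = False
        self._dirty = False

    def _rebalance(self):
        # While corked, only remember that a rebalance is owed.
        if self._corked:
            self._dirty = True
            return
        self.rebalances += 1

    def add(self, prefix, nexthop):
        self.routes[prefix] = nexthop
        self._rebalance()

    def delete(self, prefix):
        self.routes.pop(prefix, None)
        self._rebalance()

    @contextmanager
    def corked(self):
        """Defer rebalancing until the block exits, then rebalance once."""
        self._corked = True
        try:
            yield self
        finally:
            self._corked = False
            if self._dirty:
                self._dirty = False
                self._rebalance()


def flap(fib, prefixes, nexthop):
    # Mimic the DELROUTE-then-NEWROUTE pattern described in this post.
    for prefix in prefixes:
        fib.delete(prefix)
        fib.add(prefix, nexthop)


prefixes = [f"10.{i >> 8}.{i & 255}.0/24" for i in range(1000)]

eager = ToyFib()
flap(eager, prefixes, "192.0.2.1")

deferred = ToyFib()
with deferred.corked():
    flap(deferred, prefixes, "192.0.2.1")

print(eager.rebalances, deferred.rebalances)
```

In this model the uncorked flap pays for one rebalance per message, while the
corked one pays for a single pass at the end. Whether the real trie can
tolerate being left unbalanced in the meantime is exactly the open question.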

Any and all of these ideas may be horrific; I'm just interested in whether
anyone is running full tables in Linux, filled by (e.g.) BIRD, and has
encountered this issue.

Unfortunately there doesn't appear to be an RTM_CHANGE or similar in Linux,
so the DELROUTE will seemingly cause the trie to either be pruned or
re-branched, followed by the NEWROUTE causing a full rebalance run --
whereas a CHANGE would (could) hopefully just overwrite the value.
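
For reference, rtnetlink has no separate RTM_CHANGE message type, but it does
define an NLM_F_REPLACE flag that can be set on RTM_NEWROUTE (this is what
"ip route replace" uses). The sketch below only builds such a message in
userspace to show its shape; it does not send anything. Whether a replace
avoids the rebalance cost described above is not something this post
establishes. The helper names (rta, build_replace_route) are made up for
illustration, and the constant values are copied from the Linux uapi headers.

```python
import socket
import struct

# Values from <linux/netlink.h> and <linux/rtnetlink.h>.
RTM_NEWROUTE = 24
NLM_F_REQUEST = 0x001
NLM_F_REPLACE = 0x100
NLM_F_CREATE = 0x400
RTA_DST = 1
RTA_GATEWAY = 5
RT_TABLE_MAIN = 254
RTPROT_BIRD = 12
RT_SCOPE_UNIVERSE = 0
RTN_UNICAST = 1


def rta(rta_type, payload):
    """Encode one route attribute (struct rtattr + payload, padded to 4 bytes)."""
    length = 4 + len(payload)
    padding = (-length) % 4
    return struct.pack("=HH", length, rta_type) + payload + b"\0" * padding


def build_replace_route(dst, dst_len, gateway, seq=1):
    """Build (but do not send) an IPv4 RTM_NEWROUTE with NLM_F_REPLACE set."""
    # struct rtmsg: family, dst_len, src_len, tos, table, protocol, scope, type, flags
    rtmsg = struct.pack(
        "=BBBBBBBBI",
        socket.AF_INET, dst_len, 0, 0,
        RT_TABLE_MAIN, RTPROT_BIRD, RT_SCOPE_UNIVERSE, RTN_UNICAST, 0,
    )
    attrs = rta(RTA_DST, socket.inet_aton(dst)) + rta(
        RTA_GATEWAY, socket.inet_aton(gateway)
    )
    body = rtmsg + attrs
    flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack("=IHHII", 16 + len(body), RTM_NEWROUTE, flags, seq, 0)
    return header + body


msg = build_replace_route("203.0.113.0", 24, "192.0.2.1")
print(len(msg), msg[:16].hex())
```

Actually installing the route would mean writing these bytes to an
AF_NETLINK/NETLINK_ROUTE socket with sufficient privileges, which is left out
here.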

Many thanks in advance,

Matthew Walster
