Re: Optimizing large BGP -> Linux kernel additions/removals

18 Aug 2015

      On Mon, Aug 17, 2015 at 03:41:00PM -0700, Matthew Walster wrote:
...
When a BGP sessions comes up/down, there is a period of high CPU while all
the new best-routes are computed -- this is understandable.
However, on boxes I have where this table is then promoted to the Linux
kernel, the CPU usage stays high for some considerable time. For a full
(~600,000 prefix)
...
From the looks of things, a bunch of netlink messages are being generated
to the kernel to RTM_DELROUTE then RTM_NEWROUTE -- each of which is causing
the kernel's trie to rebalance which is a fairly costly operation.
...
Unfortunately there doesn't appear to be an RTM_CHANGE or similar in Linux,
so the DELROUTE will seemingly cause a tree to either be pruned or
re-branched, followed by the NEWROUTE causing a full rebalance run --
whereas a CHANGE would (could) hopefully just over-write the value.
Hi

I don't know much of kernel internal code, so i cannot answer the further
questions, although it seems silly to me to not have some kind of amortized
behavior for trie rebalancing.

Note that there is NLM_F_REPLACE flag that with RTM_NEWROUTE works like
RTM_CHANGE. And BIRD used that in some historic versions. But there is a
semantic problem with these replacements - route options works like
selector for RTM_DELROUTE, but not for replacement (where they just
represent new values).

Namely BIRD can specify rtm_protocol RTPROT_BIRD for RTM_DELROUTE and
only BIRD routes are affected (and RTM_NEWROUTE would succeed only if the
RTM_DELROUTE succeded), while for replacement specifying rtm_protocol
RTPROT_BIRD just means that the current route (regardless its
rtm_protocol) is replaced with the new one with that rtm_protocol value,
which is undesirable.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

Re: Optimizing large BGP -> Linux kernel additions/removals

Ondrej Zajicek