When a BGP session comes up or goes down, there is a period of high CPU usage while all the new best routes are computed -- this is understandable.

However, on boxes I have where this table is then promoted to the Linux kernel, the CPU usage stays high for some considerable time. For a full (~600,000 prefix) 

From the looks of things, a stream of netlink messages is being sent to the kernel, an RTM_DELROUTE followed by an RTM_NEWROUTE for each prefix -- each of which causes the kernel's trie to rebalance, which is a fairly costly operation.
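To make the per-prefix cost concrete, here is a rough sketch of the pair of messages I believe is involved for a single IPv4 prefix. The numeric constants are taken from <linux/netlink.h> and <linux/rtnetlink.h>; the helper names (rtattr, route_msg, flap_update) and the choice of flags and attributes are my own assumptions for illustration, not a capture of what BIRD actually sends. Nothing is written to a socket here, the messages are only built.

```python
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
NLM_F_REQUEST = 0x001
NLM_F_CREATE = 0x400
RTM_NEWROUTE = 24
RTM_DELROUTE = 25
RTA_DST = 1
RTA_GATEWAY = 5
RT_TABLE_MAIN = 254
RTPROT_BIRD = 12
RT_SCOPE_UNIVERSE = 0
RTN_UNICAST = 1


def rtattr(rta_type, payload):
    """Encode one route attribute, padded to a 4-byte boundary."""
    length = 4 + len(payload)
    pad = (4 - length % 4) % 4
    return struct.pack("=HH", length, rta_type) + payload + b"\0" * pad


def route_msg(msg_type, flags, seq, prefix, prefix_len, gateway=None):
    """Build nlmsghdr + rtmsg + attributes for one IPv4 route (hypothetical helper)."""
    # struct rtmsg: family, dst_len, src_len, tos, table, protocol, scope, type, flags
    body = struct.pack("=BBBBBBBBI", socket.AF_INET, prefix_len, 0, 0,
                       RT_TABLE_MAIN, RTPROT_BIRD, RT_SCOPE_UNIVERSE,
                       RTN_UNICAST, 0)
    body += rtattr(RTA_DST, socket.inet_aton(prefix))
    if gateway is not None:
        body += rtattr(RTA_GATEWAY, socket.inet_aton(gateway))
    # struct nlmsghdr: len, type, flags, seq, pid (16 bytes)
    header = struct.pack("=IHHII", 16 + len(body), msg_type, flags, seq, 0)
    return header + body


def flap_update(seq, prefix, prefix_len, new_gateway):
    """The delete-then-add pair for one prefix whose next hop changed."""
    return [
        route_msg(RTM_DELROUTE, NLM_F_REQUEST, seq, prefix, prefix_len),
        route_msg(RTM_NEWROUTE, NLM_F_REQUEST | NLM_F_CREATE, seq + 1,
                  prefix, prefix_len, new_gateway),
    ]


msgs = flap_update(1, "192.0.2.0", 24, "198.51.100.1")
print(len(msgs), [len(m) for m in msgs])
```

With a full table that is two such messages per affected prefix, so on the order of a million messages for a flap that touches most of the table.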

I was wondering if anyone had experimented with implementing any of the following:

1. a corking mechanism (i.e. stop balancing the trie until a signal is sent to uncork)

2. fib_trie garbage collection (i.e. only rebalance the trie once per time interval)

3. "double buffering" (i.e. for a given operation such as protocol flap, memcpy the trie, perform operations, then update the root node pointer to the new optimised trie)
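To illustrate what I mean by corking (option 1), here is a toy model only. ToyFib is a hypothetical stand-in backed by a dict, not the kernel's fib_trie, and the "rebalance" is just a counter standing in for the costly step. The point is simply that a flap of N prefixes costs 2N rebalances uncorked and one rebalance corked.

```python
from contextlib import contextmanager


class ToyFib:
    """Toy stand-in for a routing table; NOT the kernel's fib_trie."""

    def __init__(self):
        self.routes = {}
        self.rebalances = 0   # how many times the "costly" step ran
        self._corked = False
        self._dirty = False

    def _rebalance(self):
        # Placeholder for the expensive restructuring step.
        if self._corked:
            self._dirty = True    # remember that work is owed
        else:
            self.rebalances += 1

    def add(self, prefix, nexthop):
        self.routes[prefix] = nexthop
        self._rebalance()

    def delete(self, prefix):
        self.routes.pop(prefix, None)
        self._rebalance()

    @contextmanager
    def corked(self):
        """Defer rebalancing until the block exits (the 'uncork' signal)."""
        self._corked = True
        try:
            yield self
        finally:
            self._corked = False
            if self._dirty:
                self._dirty = False
                self.rebalances += 1


def flap(fib, prefixes, nexthop):
    """Model a protocol flap as delete-then-add for every prefix."""
    for p in prefixes:
        fib.delete(p)
        fib.add(p, nexthop)


prefixes = [f"10.0.{i}.0/24" for i in range(100)]

uncorked = ToyFib()
flap(uncorked, prefixes, "gw1")

corked = ToyFib()
with corked.corked():
    flap(corked, prefixes, "gw1")

print(uncorked.rebalances, corked.rebalances)
```

Option 3 would look similar from the outside: do all the work on a private copy, then publish it in one step.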

Any or all of these ideas may be horrific; I'm just interested in whether anyone is running full tables in Linux, filled by (e.g.) BIRD, and has encountered this issue.

Unfortunately there doesn't appear to be an RTM_CHANGE or similar in Linux, so the DELROUTE will seemingly cause the trie to be either pruned or re-branched, followed by the NEWROUTE causing a full rebalance run -- whereas a CHANGE would (or at least could) hopefully just overwrite the value.

Many thanks in advance,

Matthew Walster