Shutting down BGP neighbor causes high CPU and many session flaps.
Hello,

We use BIRD 1.4.2 as a route server with multiple RIBs and ~100 active BGP sessions. Over one of these sessions we receive ~360k prefixes and re-announce them to all other sessions.

By my calculations, the total number of prefixes across all RIBs is about 3,600,000, and until now everything was OK.

But today we experienced the following issue:

When we stopped the session over which we receive the ~360k prefixes, the BIRD daemon went to 100% CPU usage and stayed there for about 5-6 minutes. This caused a lot of (but not all) sessions to start flapping.

Here is the output taken from our log during the event:

Sep 18 10:40:51 rs2 bird: R0_69: Received: Hold timer expired
Sep 18 10:40:51 rs2 bird: R0_69: BGP session closed
Sep 18 10:40:52 rs2 bird: R0_69: Down
Sep 18 10:41:11 rs2 bird: R0_69: Startup delayed by 60 seconds
Sep 18 10:41:45 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 13073) rejected
Sep 18 10:41:59 rs2 bird: R0_69: Started
Sep 18 10:42:36 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 24222) accepted
Sep 18 10:42:36 rs2 bird: R0_69: BGP session established

The above lines were repeated for all other affected sessions. After the CPU peak of about 5 minutes, all affected sessions came back up.

Our hardware is 2 x Intel(R) Xeon(R) CPU E5310 at 1.6 GHz with 4 cores. I know that BIRD can still use only one CPU core, but are there any plans to use more?

We would greatly appreciate any suggestions or shared experience on how to prevent such an event in the future.

Thanks in advance!

Best,

-- 
Javor Kliachev
IP Engineer
Neterra Ltd.
Telephone: +359 2 975 16 16
Fax: +359 2 975 34 36
www.neterra.net <http://www.neterra.net>
On Thu, Sep 18, 2014 at 03:27:21PM +0300, Javor Kliachev wrote:
> Hello,
>
> We use BIRD 1.4.2 as a route server with multiple RIBs and ~100 active BGP sessions. Over one of these sessions we receive ~360k prefixes and re-announce them to all other sessions.
>
> By my calculations, the total number of prefixes across all RIBs is about 3,600,000, and until now everything was OK.
>
> But today we experienced the following issue:
>
> When we stopped the session over which we receive the ~360k prefixes, the BIRD daemon went to 100% CPU usage and stayed there for about 5-6 minutes. This caused a lot of (but not all) sessions to start flapping.
>
> Here is the output taken from our log during the event:
>
> Sep 18 10:40:51 rs2 bird: R0_69: Received: Hold timer expired
> Sep 18 10:40:51 rs2 bird: R0_69: BGP session closed
> Sep 18 10:40:52 rs2 bird: R0_69: Down
> Sep 18 10:41:11 rs2 bird: R0_69: Startup delayed by 60 seconds
> Sep 18 10:41:45 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 13073) rejected
> Sep 18 10:41:59 rs2 bird: R0_69: Started
> Sep 18 10:42:36 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 24222) accepted
> Sep 18 10:42:36 rs2 bird: R0_69: BGP session established
>
> The above lines were repeated for all other affected sessions.
Hello

Thanks for the bug report. Could you send me the config file, the whole log, and the exact times when BIRD went to 100% CPU and when it came back down? Even under permanent 100% CPU load, BIRD shouldn't miss the timers for sending keepalive packets.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
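
For anyone collecting the timing information requested above, here is a minimal Python sketch that pulls per-session outage windows out of syslog lines shaped like the excerpt in the original report. It is only an illustration: the names `LOG_LINE` and `session_outages` are made up for this example, the line layout is assumed to match the excerpt exactly, and the year is assumed to be 2014 because syslog timestamps carry none.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt in the report:
#   "Sep 18 10:40:51 rs2 bird: R0_69: Received: Hold timer expired"
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ bird: "
    r"(?P<proto>\S+): (?P<msg>.*)$"
)


def session_outages(lines, year=2014):
    """Return (protocol, down_at, up_at, seconds) for each completed flap.

    A flap is counted from "Received: Hold timer expired" to the next
    "BGP session established" on the same protocol name.
    """
    down_since = {}
    outages = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # not a BIRD protocol message in the assumed format
        # Syslog omits the year, so it is supplied by the caller.
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        proto, msg = m["proto"], m["msg"]
        if msg.startswith("Received: Hold timer expired"):
            down_since.setdefault(proto, when)
        elif msg == "BGP session established" and proto in down_since:
            start = down_since.pop(proto)
            outages.append((proto, start, when, (when - start).total_seconds()))
    return outages


SAMPLE = """\
Sep 18 10:40:51 rs2 bird: R0_69: Received: Hold timer expired
Sep 18 10:40:51 rs2 bird: R0_69: BGP session closed
Sep 18 10:40:52 rs2 bird: R0_69: Down
Sep 18 10:41:11 rs2 bird: R0_69: Startup delayed by 60 seconds
Sep 18 10:41:45 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 13073) rejected
Sep 18 10:41:59 rs2 bird: R0_69: Started
Sep 18 10:42:36 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 24222) accepted
Sep 18 10:42:36 rs2 bird: R0_69: BGP session established
""".splitlines()

for proto, start, end, secs in session_outages(SAMPLE):
    print(f"{proto} down {start:%H:%M:%S} -> {end:%H:%M:%S} ({secs:.0f} s)")
# prints: R0_69 down 10:40:51 -> 10:42:36 (105 s)
```

Run over the whole log, the earliest "down" and latest "up" timestamps give a rough window of the event that can be compared with the period of 100% CPU usage.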