On Mon, May 30, 2011 at 09:23:12AM +0200, Arjan Filius wrote:
Hello,
I upgraded from BIRD 1.2.4 to BIRD 1.3.1 last weekend, but had to revert because of complaints about high BGP traffic to bgpmon.net and 100% CPU load on a setup of two BIRD BGP routers connected via iBGP.
I did some research and noticed very high BGP traffic on the iBGP sessions (about 10,000 BGP packets in 1.7 seconds).
Looking at the statistics graphs after the rollback, I noticed irregular traffic on the trans-ip links as well.
In the log (mostly on the first BGP router) I noticed quite a few messages (33414 in ~12h) like:
29-05-2011 10:57:49 <WARN> Next hop address X.X.X.242 resolvable through recursive route for X.X.X.0/24
29-05-2011 10:57:49 <WARN> Next hop address Y.Y.35.194 resolvable through recursive route for Y.Y.32.0/19
29-05-2011 10:57:52 <WARN> Next hop address X.X.X.241 resolvable through recursive route for X.X.X.0/24
29-05-2011 10:57:52 <WARN> Next hop address X.X.X.242 resolvable through recursive route for X.X.X.0/24
29-05-2011 10:57:55 <WARN> Next hop address Y.Y.35.194 resolvable through recursive route for Y.Y.32.0/19
29-05-2011 10:57:55 <WARN> Next hop address X.X.X.241 resolvable through recursive route for X.X.X.0/24
29-05-2011 10:57:55 <WARN> Next hop address X.X.X.242 resolvable through recursive route for X.X.X.0/24
These are the subnets of the upstream trans-ip links of its neighbor router.
On the second (neighbor) BGP router, I noticed only 15 such recursive-route messages.
I had once added those upstream trans-ip interface IPs manually to bird.conf as /32 routes in a "protocol static" definition. I thought this might be the problem and disabled those static trans-ip interface routes, except that I forgot one. CPU load was lower afterwards, but the problem persisted, so I reverted to 1.2.4. Only later did I notice the one static route I had missed, but there was no time left to upgrade and test again.
Generally, the IP addresses from the BGP NEXT_HOP attribute (i.e. the ones mentioned in the 'Next hop address X.X.X.242 resolvable ..' messages) should be resolvable through a route that is not from BGP. So, if you do not use OSPF, adding static /32 routes for these IPs should do the trick. I don't know why that caused problems in your setup (perhaps the routes have a different NEXT_HOP attribute?). Other possibilities (like using 'gateway direct' to switch to the old behavior) were discussed here: http://marc.info/?l=bird-users&m=130173865608861&w=2
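As a rough sketch only, the two options above might look like this in bird.conf. All addresses, AS numbers and protocol names are placeholders and are not taken from your setup: 198.51.100.242 stands in for one of the next hop addresses from the warnings, and 192.0.2.2 for the directly connected iBGP neighbor. You would normally pick one of the two options, not both.

```
# Hypothetical bird.conf fragment; every address, AS number and
# protocol name below is a placeholder.

# Option 1: make the BGP next hop resolvable through a non-BGP route.
protocol static static_nexthops {
	# /32 route for the next hop, reachable via the iBGP neighbor.
	route 198.51.100.242/32 via 192.0.2.2;
}

# Option 2: switch the iBGP session back to the old (direct) next hop
# handling instead of the recursive lookup.
protocol bgp ibgp_peer {
	local as 64512;
	neighbor 192.0.2.2 as 64512;
	gateway direct;
	# import/export filters and other options omitted
}
```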
I created a 10,000-packet tcpdump capture (full size) of an iBGP session, in case someone would like to look at it.
For this kind of problem tcpdump is not really useful; a BIRD log with 'debug all all' would be more useful.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."