BGP Keepalive timer wedging

Chris Caputo ccaputo at alt.net
Wed Aug 20 17:00:04 CEST 2014


On Wed, 20 Aug 2014, Ondrej Zajicek wrote:
> On Wed, Aug 20, 2014 at 01:02:27AM +0000, Chris Caputo wrote:
> > At the Seattle IX we are using BIRD 1.4.4 for our native (non-VM) route 
> > servers.
> > 
> > With one particular IPv4 peer, on two different route servers, I am seeing 
> > the "Keepalive timer" count down to zero and then become wedged/stalled.  
> > Tcpdump fails to show a keepalive message being sent, while it does show 
> > them being received from the peer.
> ...
> > with Hold timer getting updated over time, but the Keepalive timer doesn't 
> > change after it has its initial countdown to zero.  The peer eventually 
> > signals "ex: Received: Hold timer expired" once it goes 180 seconds 
> > without a BGP update, since it also hasn't gotten any keepalive messages.
> > 
> > I've looked at the code and haven't found a problem.  The other 64 
> > similarly configured peers on the route server are working fine.
> > 
> > Has anyone seen this or have any suggestions?
> 
> Hi
> 
> I would guess that the problem is in the TCP connection to the peer - BGP
> packets are sent but not acknowledged, the TX queue becomes full, and the
> TX hook is no longer called (the Keepalive timer is restarted in the TX
> hook when the previously scheduled Keepalive is sent). You should check
> whether other packets are propagated (e.g. updates from both sides), esp.
> when the connection is already in keepalive 0/60 state.
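
The mechanism described in the quoted text can be illustrated with a toy model. This is only a sketch of the idea, not BIRD's code: the class ToySession, its fields, and the byte-counting "send buffer" are hypothetical names invented for illustration. The one real-world figure is the 19-byte size of a BGP KEEPALIVE message (header only). The timer is re-armed only from the TX hook, so if the send buffer never drains the timer stays at 0 of 60.

```python
KEEPALIVE_SIZE = 19  # a BGP KEEPALIVE is just the 19-byte message header


class ToySession:
    """Toy model (not BIRD code): keepalive timer re-armed only in the TX hook."""

    def __init__(self, keepalive_interval=60, sndbuf_free=1000):
        self.keepalive_interval = keepalive_interval
        self.keepalive_timer = keepalive_interval  # seconds until next keepalive
        self.sndbuf_free = sndbuf_free             # bytes the socket still accepts
        self.pending = False                       # keepalive waiting to be written
        self.sent = 0                              # keepalives handed to the socket

    def tick(self):
        """Advance one second; a timer already at zero stays at zero."""
        if self.keepalive_timer == 0:
            return
        self.keepalive_timer -= 1
        if self.keepalive_timer == 0:
            self.pending = True
            self.try_send()

    def try_send(self):
        """Write the pending keepalive if the send buffer has room."""
        if not self.pending or self.sndbuf_free < KEEPALIVE_SIZE:
            return  # buffer full: nothing is written, so no TX hook runs
        self.sndbuf_free -= KEEPALIVE_SIZE
        self.pending = False
        self.sent += 1
        self.tx_hook()

    def tx_hook(self):
        """Runs only after a successful write; the only place the timer restarts."""
        self.keepalive_timer = self.keepalive_interval

    def peer_acked(self, nbytes):
        """The peer acknowledged data, freeing room in the send buffer."""
        self.sndbuf_free += nbytes
        self.try_send()


if __name__ == "__main__":
    stuck = ToySession(sndbuf_free=0)  # send queue already full
    for _ in range(200):
        stuck.tick()
    print(stuck.keepalive_timer, stuck.sent)
```

With a full send buffer the model sits at timer 0 with nothing sent, which is the "keepalive 0/60" symptom; calling peer_acked() with enough bytes lets the pending keepalive go out and restarts the timer.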

Ondrej,

You are correct:

  Proto Recv-Q Send-Q Local Address           Foreign Address         State
  tcp        0  42340 206.81.80.2:179         206.81.80.xx:35237      ESTABLISHED
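
A check for this condition could be scripted roughly as follows. This is a minimal sketch that assumes socket listings in the six-column layout shown above; the function names and the default threshold are hypothetical, and a briefly non-zero Send-Q is normal, so only a value that stays high across repeated samples points to a stalled peer.

```python
def parse_tcp_line(line):
    """Parse one socket line in the six-column layout above; None if it is not one."""
    fields = line.split()
    if len(fields) < 6 or not fields[0].startswith("tcp"):
        return None  # header line or unrelated output
    return {
        "recv_q": int(fields[1]),
        "send_q": int(fields[2]),
        "local": fields[3],
        "foreign": fields[4],
        "state": fields[5],
    }


def backed_up_bgp_sockets(lines, threshold=0):
    """Return established BGP (port 179) sockets whose Send-Q exceeds threshold."""
    hits = []
    for line in lines:
        sock = parse_tcp_line(line)
        if sock is None or sock["state"] != "ESTABLISHED":
            continue
        is_bgp = sock["local"].endswith(":179") or sock["foreign"].endswith(":179")
        if is_bgp and sock["send_q"] > threshold:
            hits.append(sock)
    return hits


if __name__ == "__main__":
    sample = [
        "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
        "tcp        0  42340 206.81.80.2:179         206.81.80.xx:35237      ESTABLISHED",
    ]
    for sock in backed_up_bgp_sockets(sample):
        print(sock["foreign"], sock["send_q"])
```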

I should have caught that.

Thank you,
Chris


