BGP Connection reset on fast timers

saksham saksham.manchanda at secure64.com
Fri Jun 15 16:16:48 CEST 2018


Thanks for the reply. I have been able to reproduce this on KVM too.

This is the output from running in debug mode. The highlighted part is
where we stop seeing RXs: the last one is at 15:06:12, and the hold
timeout fires three seconds later, at 15:06:15. I've added some
additional debug output to the I/O loop and am running a simultaneous
packet capture. I'll report back when I have more.


Jun 13 15:06:06 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:06 localhost bird: BGP: kicking TX
Jun 13 15:06:06 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:06 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:07 localhost bird: BGP: Keepalive timer
Jun 13 15:06:07 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:07 localhost bird: BGP: kicking TX
Jun 13 15:06:07 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:07 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:08 localhost bird: BGP: Keepalive timer
Jun 13 15:06:08 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:08 localhost bird: BGP: kicking TX
Jun 13 15:06:08 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:08 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:09 localhost bird: BGP: Keepalive timer
Jun 13 15:06:09 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:09 localhost bird: BGP: kicking TX
Jun 13 15:06:09 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:09 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:10 localhost bird: BGP: Keepalive timer
Jun 13 15:06:10 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:10 localhost bird: BGP: kicking TX
Jun 13 15:06:10 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:10 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:11 localhost bird: BGP: Keepalive timer
Jun 13 15:06:11 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:11 localhost bird: BGP: kicking TX
Jun 13 15:06:11 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:11 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:12 localhost bird: BGP: Keepalive timer
Jun 13 15:06:12 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:12 localhost bird: BGP: kicking TX
Jun 13 15:06:12 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:12 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:13 localhost bird: BGP: Keepalive timer
Jun 13 15:06:13 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:13 localhost bird: BGP: kicking TX
Jun 13 15:06:14 localhost bird: BGP: Keepalive timer
Jun 13 15:06:14 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:14 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Keepalive timer
Jun 13 15:06:15 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:15 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Hold timeout
Jun 13 15:06:15 localhost bird: BGP: Scheduling packet type 3
Jun 13 15:06:15 localhost bird: BGP: Updating startup delay
Jun 13 15:06:15 localhost bird: bgp1: Error: Hold timer expired
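
Since the highlighting does not survive in plain text, here is a minimal
Python sketch that finds the same spot in a debug log like the one above.
It is only an illustration: the function name `summarize`, the
`ASSUMED_YEAR` constant (syslog lines carry no year) and the regular
expression are my own assumptions, based purely on the line format shown
in this mail. It relies on packet type 4 being KEEPALIVE, as the log
suggests.

```python
import re
from datetime import datetime

# Assumption: syslog stamps have no year, so one is supplied for parsing.
ASSUMED_YEAR = 2018

# Matches lines such as:
#   Jun 13 15:06:12 localhost bird: BGP: RX hook: Got 19 bytes
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ bird: (.*)$")

# Short excerpt of the log above, around the point where RX stops.
SAMPLE = """\
Jun 13 15:06:12 localhost bird: BGP: Keepalive timer
Jun 13 15:06:12 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:12 localhost bird: BGP: kicking TX
Jun 13 15:06:12 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:12 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:13 localhost bird: BGP: Keepalive timer
Jun 13 15:06:13 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:13 localhost bird: BGP: kicking TX
Jun 13 15:06:14 localhost bird: BGP: Keepalive timer
Jun 13 15:06:14 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:14 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Keepalive timer
Jun 13 15:06:15 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:15 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Hold timeout
Jun 13 15:06:15 localhost bird: BGP: Scheduling packet type 3
"""


def summarize(log_text):
    """Return (last_rx, hold_timeout, keepalives_scheduled_after_last_rx)."""
    last_rx = None
    hold_timeout = None
    sent_after_rx = 0
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a bird syslog line
        stamp = datetime.strptime(
            f"{ASSUMED_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S"
        )
        msg = m.group(2)
        if "RX hook" in msg:
            # A packet arrived: remember when, and restart the TX count.
            last_rx = stamp
            sent_after_rx = 0
        elif msg.endswith("Scheduling packet type 4"):
            sent_after_rx += 1
        elif msg.endswith("Hold timeout"):
            hold_timeout = stamp
            break
    return last_rx, hold_timeout, sent_after_rx


last_rx, hold_timeout, sent = summarize(SAMPLE)
print("last RX:", last_rx.time())
print("hold timeout:", hold_timeout.time())
print("keepalives scheduled after last RX:", sent)
print("seconds without RX:", (hold_timeout - last_rx).total_seconds())
```

On the excerpt this reports the last RX at 15:06:12, the hold timeout at
15:06:15, and three keepalives scheduled in between, i.e. we keep
transmitting for the whole three seconds while nothing is received.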




On 06/14/2018 11:10 PM, dave seddon wrote:
> Packet 35 shows .13, which is the BIRD running on VMware (sorry about
> that), and it clearly thinks the hold time expired:
>
> Major error Code: Hold Timer Expired (4)
> Minor error Code (Hold Timer Expired): 0
>
> Might be worth trying to run bird debugging to see what else it says.
> Have you considered BFD?
> Maybe try running different virtualization (e.g. KVM), or no virtualization.
>
> On Tue, Jun 12, 2018 at 3:42 AM, Olivier Benghozi
> <olivier.benghozi at wifirst.fr <mailto:olivier.benghozi at wifirst.fr>> wrote:
>
>     Just a comment:
>
>     here we use 5/15 on some 10GE links between
>     Redback/Ericsson/SmartEdge and Cisco routers (so, unrelated to
>     BIRD and Linux) with success (never flaps if the link is OK).
>     These links are used to receive/transmit L2TP tunnels traffic.
>
>     The use case was:
>     1) there are some intermediate switches on the links (so a cut
>     cannot always be quickly detected)
>     2) L2TP timers are aggressive and it's relevant to switch to
>     another path quickly enough in order to avoid some L2TP tunnel
>     disconnections, which in turn would disconnect several tens of
>     thousands of PPP sessions and users
>     3) BFD wasn't an option (between two different operators)
>
>
>     Olivier
>
>>     On 12 June 2018 at 11:09, Maria Jan Matějka <jan.matejka at nic.cz
>>     <mailto:jan.matejka at nic.cz>> wrote:
>>
>>     If I remember it correctly, there was somebody who used a 5/15
>>     setup and still had to take a lot of care to keep the links up.
>>     By the way, is there any good reason to have such short timeouts?
>
>
>
>
> -- 
> Regards,
> Dave Seddon
> +1 415 310 4086
