Thanks for the reply. I have been able to reproduce this on KVM
too.
Below is the output from running in debug mode. We stop seeing RX
after the keepalive exchange at 15:06:12: from 15:06:13 onwards there
are no more "RX hook" lines. I've added some additional debug output
to the I/O loop and am running a simultaneous packet capture. I'll
report back when I have more.
Jun 13 15:06:06 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:06 localhost bird: BGP: kicking TX
Jun 13 15:06:06 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:06 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:07 localhost bird: BGP: Keepalive timer
Jun 13 15:06:07 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:07 localhost bird: BGP: kicking TX
Jun 13 15:06:07 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:07 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:08 localhost bird: BGP: Keepalive timer
Jun 13 15:06:08 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:08 localhost bird: BGP: kicking TX
Jun 13 15:06:08 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:08 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:09 localhost bird: BGP: Keepalive timer
Jun 13 15:06:09 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:09 localhost bird: BGP: kicking TX
Jun 13 15:06:09 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:09 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:10 localhost bird: BGP: Keepalive timer
Jun 13 15:06:10 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:10 localhost bird: BGP: kicking TX
Jun 13 15:06:10 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:10 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:11 localhost bird: BGP: Keepalive timer
Jun 13 15:06:11 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:11 localhost bird: BGP: kicking TX
Jun 13 15:06:11 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:11 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:12 localhost bird: BGP: Keepalive timer
Jun 13 15:06:12 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:12 localhost bird: BGP: kicking TX
Jun 13 15:06:12 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:12 localhost bird: BGP: Got packet 04 (19 bytes)
Jun 13 15:06:13 localhost bird: BGP: Keepalive timer
Jun 13 15:06:13 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:13 localhost bird: BGP: kicking TX
Jun 13 15:06:14 localhost bird: BGP: Keepalive timer
Jun 13 15:06:14 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:14 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Keepalive timer
Jun 13 15:06:15 localhost bird: BGP: Scheduling packet type 4
Jun 13 15:06:15 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Hold timeout
Jun 13 15:06:15 localhost bird: BGP: Scheduling packet type 3
Jun 13 15:06:15 localhost bird: BGP: Updating startup delay
Jun 13 15:06:15 localhost bird: bgp1: Error: Hold timer expired
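For anyone who wants to check a longer log the same way, here is a minimal sketch that finds the last "RX hook" line before a "Hold timeout" and reports the gap between them. It assumes the syslog line format shown above; the function names and the fixed year (syslog timestamps carry none) are my own choices for illustration, not anything BIRD provides.

```python
import re
from datetime import datetime

# Matches lines like: "Jun 13 15:06:12 localhost bird: BGP: RX hook: Got 19 bytes"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+bird:\s+(.*)$")


def parse(lines, year=2018):
    """Yield (timestamp, message) for every line in the expected syslog format.

    The year is an assumption: it is only needed to build a full datetime.
    """
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            yield ts, m.group(2)


def rx_gap(lines):
    """Return (last_rx, hold_timeout, gap_seconds), or None if no such pair exists."""
    last_rx = None
    for ts, msg in parse(lines):
        if msg.startswith("BGP: RX hook"):
            last_rx = ts  # remember the most recent receive event
        elif msg.startswith("BGP: Hold timeout"):
            if last_rx is None:
                return None
            return last_rx, ts, (ts - last_rx).total_seconds()
    return None


SAMPLE = """\
Jun 13 15:06:12 localhost bird: BGP: RX hook: Got 19 bytes
Jun 13 15:06:13 localhost bird: BGP: Keepalive timer
Jun 13 15:06:13 localhost bird: BGP: kicking TX
Jun 13 15:06:15 localhost bird: BGP: Hold timeout
""".splitlines()

if __name__ == "__main__":
    print(rx_gap(SAMPLE))
```

On the excerpt above the last RX is at 15:06:12 and the hold timeout fires at 15:06:15, so the gap is 3 seconds.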
Packet 35 is from .13, which is the BIRD instance running on VMware (sorry about that), and it clearly thinks the hold timer expired:
Major error Code: Hold Timer Expired (4)
Minor error Code (Hold Timer Expired): 0
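To relate the capture to the debug log: per RFC 4271, a BGP message starts with a 19-byte header (16-byte marker, 2-byte length, 1-byte type), type 4 is KEEPALIVE and type 3 is NOTIFICATION, and NOTIFICATION error code 4 is Hold Timer Expired. A minimal sketch of a decoder follows, in case it helps when reading raw bytes from a capture. The function and table names are made up for illustration and the sample messages are hand-built, not taken from the actual capture.

```python
import struct

# Message types and NOTIFICATION error codes from RFC 4271.
BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
NOTIFY_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type


def decode_bgp(msg: bytes) -> dict:
    """Decode the fixed BGP header and, for NOTIFICATION, the error code and subcode."""
    if len(msg) < HEADER_LEN:
        raise ValueError("BGP message shorter than the 19-byte header")
    _marker, length, mtype = struct.unpack("!16sHB", msg[:HEADER_LEN])
    info = {"length": length, "type": BGP_TYPES.get(mtype, f"unknown({mtype})")}
    if mtype == 3 and len(msg) >= HEADER_LEN + 2:
        code, subcode = msg[HEADER_LEN], msg[HEADER_LEN + 1]
        info["error"] = NOTIFY_CODES.get(code, f"unknown({code})")
        info["subcode"] = subcode
    return info


# Hand-built examples (hypothetical bytes, not from the real capture).
KEEPALIVE = b"\xff" * 16 + struct.pack("!HB", 19, 4)
HOLD_EXPIRED = b"\xff" * 16 + struct.pack("!HBBB", 21, 3, 4, 0)

if __name__ == "__main__":
    print(decode_bgp(KEEPALIVE))
    print(decode_bgp(HOLD_EXPIRED))
```

This matches what the log shows: the 19-byte "packet 04" lines are bare keepalives (header only), and "Scheduling packet type 3" right after the hold timeout is the NOTIFICATION seen in packet 35.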
Might be worth running BIRD with debugging enabled to see what else it says. Have you considered BFD? Maybe try a different virtualization platform (e.g. KVM), or no virtualization.
On Tue, Jun 12, 2018 at 3:42 AM, Olivier Benghozi <olivier.benghozi@wifirst.fr> wrote:
Just a comment:
Here we use 5/15 on some 10GE links between Redback/Ericsson/SmartEdge and Cisco routers (so, unrelated to BIRD and Linux) with success (it never flaps if the link is OK). These links carry L2TP tunnel traffic.
The use case was:
1) there are some intermediate switches on the links (so a cut cannot always be detected quickly)
2) L2TP timers are aggressive, and it's important to switch to another path quickly enough to avoid L2TP tunnel disconnections, which in turn would disconnect several tens of thousands of PPP sessions and users
3) BFD wasn't an option (between two different operators)
Olivier
On 12 June 2018 at 11:09, Maria Jan Matějka <jan.matejka@nic.cz> wrote:
If I remember correctly, there was somebody who used a 5/15 setup and still had to take a lot of care to keep the links up. By the way, is there any good reason to have such short timeouts?
--
Regards,
Dave Seddon
+1 415 310 4086