BGP keepalive packets delayed during reconfigure

Maria Matejka maria.matejka at nic.cz
Fri Mar 5 14:59:43 CET 2021


Hello!

This is simply a result of a large configuration file. BIRD took more 
than 16 minutes to parse and apply new config. As BIRD is 
single-threaded, nothing is done while parsing the new config.

This is going to be fixed by using more CPU threads. Some preview of a 
version doing exactly this with configuration may be found in GIT in 
commit c78f88c91c7b75f530d70d503fa011c20f16f278.

Use with caution, we have not finished all tests yet.

Maria

On 3/5/21 2:41 PM, Damian Zaremba wrote:
> Hi,
> 
> I recently observed a case where bird 2.0.7 stopped emitting keepalive 
> packets to configured neighbors during a protocol reconfigure.
> 
> To replicate this condition I have constructed a single BGP session that 
> is fed a large number of filtered routes, the configuration structure 
> resembles;
> 
> protocol static -> table source -> protocol pipe [import filters] -> 
> table neighbor -> protocol bgp
> 
> 
> Changing a route attribute in the source table and calling `reconfigure` 
> results in the following observations;
> 
> 1. Bird logs on the sender side show a slow reconfigure;
> 
> 2021-03-05 10:28:19.872 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:28:34.886 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:28:53.772 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:28:56.259 <INFO> Reconfiguring
> 2021-03-05 10:45:13.606 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:45:13.611 <WARN> I/O loop cycle took 966448 ms for 1016 
> events
> 2021-03-05 10:45:13.612 <INFO> Reconfigured
> 2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
> 2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
> 2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
> 2021-03-05 10:45:13.612 <RMT> bgp1: Received: Hold timer expired
> 2021-03-05 10:46:06.106 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:46:24.541 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:46:42.282 <TRACE> bgp1: Sending KEEPALIVE
> 
> 
> 2. Packet capture shows the client receiving no keepalives and then 
> multiple close together;
> 
> $ cat bgp.pcap | pbgpp --filter-message-type=KEEPALIVE --formatter=LINE -
> 1614940133.773725    KEEPALIVE    None    None    None
> 1614940138.282074    KEEPALIVE    None    None    None
> 1614940154.308565    KEEPALIVE    None    None    None
> 1614940174.343514    KEEPALIVE    None    None    None
> 1614940190.374193    KEEPALIVE    None    None    None
> 1614941113.607580    KEEPALIVE    None    None    None
> 1614941166.106192    KEEPALIVE    None    None    None
> 1614941166.106316    KEEPALIVE    None    None    None
> 1614941183.134392    KEEPALIVE    None    None    None
> 1614941184.541810    KEEPALIVE    None    None    None
> 1614941200.157543    KEEPALIVE    None    None    None
> 1614941202.282814    KEEPALIVE    None    None    None
> 1614941214.180433    KEEPALIVE    None    None    None
> 1614941217.732128    KEEPALIVE    None    None    None
> 1614941234.209164    KEEPALIVE    None    None    None
> 1614941236.416377    KEEPALIVE    None    None    None
> 1614941250.233110    KEEPALIVE    None    None    None
> 1614941253.668823    KEEPALIVE    None    None    None
> 1614941268.260864    KEEPALIVE    None    None    None
> 1614941269.668392    KEEPALIVE    None    None    None
> 1614941285.289222    KEEPALIVE    None    None    None
> 1614941287.120920    KEEPALIVE    None    None    None
> 1614941301.314094    KEEPALIVE    None    None    None
> 1614941302.270840    KEEPALIVE    None    None    None
> 1614941319.342323    KEEPALIVE    None    None    None
> 1614941321.92028    KEEPALIVE    None    None    None
> 1614941335.366858    KEEPALIVE    None    None    None
> 1614941337.417243    KEEPALIVE    None    None    None
> 1614941352.392059    KEEPALIVE    None    None    None
> 1614941356.43815    KEEPALIVE    None    None    None
> 1614941369.420711    KEEPALIVE    None    None    None
> 1614941373.705122    KEEPALIVE    None    None    None
> 1614941386.445406    KEEPALIVE    None    None    None
> 1614941391.885447    KEEPALIVE    None    None    None
> 1614941403.473187    KEEPALIVE    None    None    None
> 1614941407.172799    KEEPALIVE    None    None    None
> 1614941419.498776    KEEPALIVE    None    None    None
> 
> The 6th packet is 15min after the 5th packet then they spread back out 
> to the intervals seen in the steady state.
> 
> 3. Client trips off on timers until we get to a converged state on the 
> sender;
> 
> 2021-03-05 10:29:53 <RMT> bgp1: Error: Hold timer expired
> 2021-03-05 10:34:05 <RMT> bgp1: Error: Hold timer expired
> 2021-03-05 10:39:38 <RMT> bgp1: Error: Hold timer expired
> 2021-03-05 10:43:48 <RMT> bgp1: Error: Hold timer expired
> 
> I suspect this is down to the scheduling of BGP packets more generally 
> as we also observed the case where a route addition into a bgp protocol 
> (as reported by trace) was not seen as an UPDATE by the client for 
> multiple minutes during a reconfigure.
> 
> - Damian
> 


More information about the Bird-users mailing list