Static protocol reconfiguration results in sub-optimal route updates
Hi,

I recently observed a case where bird 2.0.7 emits sub-optimal route updates during a reconfiguration.

Using the sample configuration of e.g.

ipv4 table seed;
ipv4 table announce;

protocol static {
    route 203.0.113.0/24 unreachable;
    ipv4 {
        table seed;
    };
}

protocol pipe {
    table announce;
    peer table seed;
    import filter {
        bgp_large_community.add((65000, 1, 1));
        accept;
    };
}

protocol bgp {
    debug all;
    ipv4 {
        table announce;
        export all;
    };
    local 10.240.0.226 as 65000;
    neighbor 10.240.1.59 as 65000;
}

When the pipe configuration is updated to e.g.

    import filter {
        bgp_large_community.add((65000, 1, 2));
        accept;
    };

Bird replaces the route into the BGP protocol;

2021-03-05 11:57:36.168 <INFO> Reconfiguring
2021-03-05 11:57:36.168 <INFO> Reloading channel pipe1.sec
2021-03-05 11:57:36.168 <TRACE> bgp1: Reconfigured
2021-03-05 11:57:36.168 <INFO> Reconfigured
2021-03-05 11:57:36.168 <TRACE> bgp1.ipv4 < replaced 203.0.113.0/24 unreachable
2021-03-05 11:57:36.168 <TRACE> bgp1: Sending UPDATE

And emits a single update as expected;

$ cat bgp.pcap | pbgpp --filter-message-type=UPDATE -
[BGPMessage UPDATE] - 63 Bytes
|- MAC: 42:01:0a:f0:00:01 -> 42:01:0a:f0:01:3b
|- IP: 10.240.0.226:32871 -> 10.240.1.59:179
|- Timestamp: 2021-03-05 12:04:50.255913 (1614945890.255913)
|
|- Update Message Sub-Type: ANNOUNCE
|- Withdrawn Routes Length: 0 Bytes
|- Total Path Attribute Length: 36 Bytes
|- Prefix (NLRI):
|--- 203.0.113.0/24
|- Path Attributes:
|--- ORIGIN: IGP
|- Path Attributes:
|--- AS_PATH:
|- Path Attributes:
|--- NEXT_HOP: 10.240.0.226
|- Path Attributes:
|--- LOCAL_PREF:
|- Path Attributes:
|--- LARGE_COMMUNITIES: 65000:1:2

However if we directly feed the table from a static protocol e.g.

ipv4 table announce;

protocol static {
    route 203.0.113.0/24 unreachable;
    ipv4 {
        table announce;
        import filter {
            bgp_large_community.add((65000, 1, 1));
            accept;
        };
    };
}

protocol bgp {
    debug all;
    ipv4 {
        table announce;
        export all;
    };
    local 10.240.0.226 as 65000;
    neighbor 10.240.1.59 as 65000;
}

The same filter update results in a protocol restart e.g.

2021-03-05 12:10:39.042 <INFO> Reconfiguring
2021-03-05 12:10:39.042 <INFO> Cannot reconfigure channel static1.ipv4
2021-03-05 12:10:39.042 <INFO> Restarting protocol static1
2021-03-05 12:10:39.042 <TRACE> bgp1: Reconfigured
2021-03-05 12:10:39.042 <TRACE> bgp1.ipv4 < removed 203.0.113.0/24 unreachable
2021-03-05 12:10:39.042 <TRACE> bgp1: Sending UPDATE
2021-03-05 12:10:39.042 <TRACE> bgp1.ipv4 < added 203.0.113.0/24 unreachable
2021-03-05 12:10:39.042 <INFO> Reconfigured
2021-03-05 12:10:39.042 <TRACE> bgp1: Sending UPDATE

And our client receives 2 distinct UPDATE packets, the first being an explicit withdrawal, followed by an announcement;

[BGPMessage UPDATE] - 27 Bytes
|- MAC: 42:01:0a:f0:00:01 -> 42:01:0a:f0:01:3b
|- IP: 10.240.0.226:48185 -> 10.240.1.59:179
|- Timestamp: 2021-03-05 12:10:39.43871 (1614946239.43871)
|
|- Update Message Sub-Type: WITHDRAWAL
|- Withdrawn Routes Length: 4 Bytes
|- Total Path Attribute Length: 0 Bytes
|- Withdrawn Routes:
|--- 203.0.113.0/24

[BGPMessage UPDATE] - 63 Bytes
|- MAC: 42:01:0a:f0:00:01 -> 42:01:0a:f0:01:3b
|- IP: 10.240.0.226:48185 -> 10.240.1.59:179
|- Timestamp: 2021-03-05 12:10:39.44893 (1614946239.44893)
|
|- Update Message Sub-Type: ANNOUNCE
|- Withdrawn Routes Length: 0 Bytes
|- Total Path Attribute Length: 36 Bytes
|- Prefix (NLRI):
|--- 203.0.113.0/24
|- Path Attributes:
|--- ORIGIN: IGP
|- Path Attributes:
|--- AS_PATH:
|- Path Attributes:
|--- NEXT_HOP: 10.240.0.226
|- Path Attributes:
|--- LOCAL_PREF:
|- Path Attributes:
|--- LARGE_COMMUNITIES: 65000:1:2

When this happens during a long converge, the clients can experience a significant gap between the withdrawal and announcement causing sub-optimal state on the clients; we have observed this to be more than 10min in edge cases.

Under Bird 1.6.5 we see the same restart & route remove/add in the log however we see only 1 UPDATE emitted reflecting the attribute change e.g.

2021-03-05 12:34:40 <INFO> Reconfiguring
2021-03-05 12:34:40 <INFO> Reloading protocol static1
2021-03-05 12:34:40 <INFO> Restarting protocol static1
2021-03-05 12:34:40 <TRACE> bgp1: Reconfigured
2021-03-05 12:34:40 <INFO> Reconfigured
2021-03-05 12:34:40 <TRACE> bgp1 < removed 203.0.113.0/24 unreachable
2021-03-05 12:34:40 <TRACE> bgp1 < added 203.0.113.0/24 unreachable
2021-03-05 12:34:40 <TRACE> bgp1: Sending UPDATE

$ cat bgp.pcap | pbgpp --filter-message-type=UPDATE -
[BGPMessage UPDATE] - 63 Bytes
|- MAC: 42:01:0a:f0:00:01 -> 42:01:0a:f0:01:3b
|- IP: 10.240.0.226:52101 -> 10.240.1.59:179
|- Timestamp: 2021-03-05 12:34:41.414028 (1614947681.414028)
|
|- Update Message Sub-Type: ANNOUNCE
|- Withdrawn Routes Length: 0 Bytes
|- Total Path Attribute Length: 36 Bytes
|- Prefix (NLRI):
|--- 203.0.113.0/24
|- Path Attributes:
|--- ORIGIN: IGP
|- Path Attributes:
|--- AS_PATH:
|- Path Attributes:
|--- NEXT_HOP: 10.240.0.226
|- Path Attributes:
|--- LOCAL_PREF:
|- Path Attributes:
|--- LARGE_COMMUNITIES: 65000:1:2

- Damian
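
For anyone wanting to check their own logs for the same pattern: in the 2.0.7 static case a "Sending UPDATE" trace line falls between the "removed" and "added" lines for the prefix, whereas in the 1.6.5 log both route events precede the single UPDATE. The following is a minimal sketch in Python that flags that ordering. The regular expressions are inferred only from the trace excerpts in this thread, and `find_split_updates` is a hypothetical helper, not part of BIRD or pbgpp; the trace format may differ in other versions or logging setups.

```python
import re

# Patterns inferred from the log excerpts in this thread (an assumption,
# not a documented BIRD log grammar), e.g.:
#   2021-03-05 12:10:39.042 <TRACE> bgp1.ipv4 < removed 203.0.113.0/24 unreachable
#   2021-03-05 12:10:39.042 <TRACE> bgp1: Sending UPDATE
ROUTE_EVENT = re.compile(r"<TRACE> (\S+) < (added|removed|replaced) (\S+)")
SENDING_UPDATE = re.compile(r"<TRACE> (\S+): Sending UPDATE")


def find_split_updates(log_lines):
    """Return (protocol, prefix) pairs where an UPDATE was sent between
    the 'removed' and the 'added' trace line for the same prefix."""
    pending = set()    # removed, no UPDATE sent since the removal
    withdrawn = set()  # removed, and an UPDATE has been sent since
    split = []
    for line in log_lines:
        event = ROUTE_EVENT.search(line)
        if event:
            channel, action, prefix = event.groups()
            # "bgp1.ipv4" (2.x) and "bgp1" (1.6.x) both map to protocol "bgp1".
            key = (channel.split(".")[0], prefix)
            if action == "removed":
                pending.add(key)
            elif action == "added":
                if key in withdrawn:
                    split.append(key)
                    withdrawn.discard(key)
                pending.discard(key)
            # "replaced" is a single in-place change; nothing to track.
            continue
        sent = SENDING_UPDATE.search(line)
        if sent:
            proto = sent.group(1)
            flushed = {key for key in pending if key[0] == proto}
            withdrawn |= flushed
            pending -= flushed
    return split


# The 2.0.7 static-protocol log from the report above.
LOG_2_0_7_STATIC = """\
2021-03-05 12:10:39.042 <INFO> Reconfiguring
2021-03-05 12:10:39.042 <INFO> Cannot reconfigure channel static1.ipv4
2021-03-05 12:10:39.042 <INFO> Restarting protocol static1
2021-03-05 12:10:39.042 <TRACE> bgp1: Reconfigured
2021-03-05 12:10:39.042 <TRACE> bgp1.ipv4 < removed 203.0.113.0/24 unreachable
2021-03-05 12:10:39.042 <TRACE> bgp1: Sending UPDATE
2021-03-05 12:10:39.042 <TRACE> bgp1.ipv4 < added 203.0.113.0/24 unreachable
2021-03-05 12:10:39.042 <INFO> Reconfigured
2021-03-05 12:10:39.042 <TRACE> bgp1: Sending UPDATE
""".splitlines()

print(find_split_updates(LOG_2_0_7_STATIC))  # [('bgp1', '203.0.113.0/24')]
```

Run against the pipe log or the 1.6.5 log from the report, the same function returns an empty list. This only inspects the order of trace lines; it does not confirm what went on the wire, which still needs a capture as in the pbgpp output above.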
Hello!

Well, this seems to be a forgotten part of the static protocol. I see the bug, there is a missing reload hook in the appropriate channel. Queuing this to be fixed soon.

Thank you for the report!
Maria

On 3/5/21 2:43 PM, Damian Zaremba wrote:
participants (2)
- Damian Zaremba
- Maria Matejka