route withdrawals are not sent upon protocol removal
Hello,
bird (all versions) does not send BGP route withdrawals when a protocol is removed, only when the filter changes or the protocol is disabled. Is this intentional? If not, could it be a configurable option to send route withdrawals upon protocols being removed?
I am having an issue where the peers keep stale routes that were removed from bird but were not withdrawn via update messages.
Regards,
Thomás
On Thu, May 31, 2018 at 03:42:12PM +0100, Thomás S. Bregolin wrote:
Hello,
bird (all versions) does not send BGP route withdrawals when a protocol is removed, only when the filter changes or the protocol is disabled. Is this intentional? If not, could it be a configurable option to send route withdrawals upon protocols being removed?
Hello
Do you mean route withdrawals for routes received from that removed protocol? Sent to other peers? This should not be an issue: when a protocol is removed or disabled, all its routes are removed and withdrawals should be sent to other protocols.
I am having an issue where the peers keep stale routes that were removed from bird but were not withdrawn via update messages.
Do you see these stale routes on routers directly connected with the (now removed) session, or on their neighbors?
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On Mon, Jun 4, 2018 at 11:45 AM, Ondrej Zajicek <santiago@crfreenet.org> wrote:
On Thu, May 31, 2018 at 03:42:12PM +0100, Thomás S. Bregolin wrote:
Hello,
Hi Ondrej,
Thank you for your response.
Do you mean route withdrawals for routes received from that removed protocol? Send to other peers? This should not be an issue - when a protocol is removed or disabled, all its routes are removed and withdrawals should be sent to other protocols.
Yes, I mean withdrawals for routes received by the router from the peer running bird. The peer is directly connected to the router via a BGP session.
I am removing some static protocols and running "birdc configure soft", but no route withdrawals are sent for the removed prefixes. Even if I then run "birdc reload out all", the withdrawals are still not sent, since bird had already forgotten the prefixes when I did the "configure soft":
bird[28196]: Removing protocol e_010_cidrs
bird[28196]: Reconfigured
If I disable the protocol with "birdc disable e_010_cidrs" without removing it from the config files, I see the same "Removing protocol e_010_cidrs" message in the logs and the withdrawals are sent as expected. However, I expect bird to also send withdrawals when a protocol is removed from the config without being explicitly disabled.
Do you also think this is a bug?
Best regards,
Thomás
I will give this a +1. That should be implemented in bird: if you remove a protocol, it should send withdrawals to the peer before unconfiguring the service.
Hello!
I don't know whether I correctly understand what you are trying to do. Anyway, please look at the attached script (for Linux). It creates two virtual routers, CA and CB, sets up a BGP link between them, and announces a route defined in protocol static {} on CA. Then it reconfigures CA to remove the static protocol, and it correctly shows that the route is withdrawn, at least for me with v1.6.4.
Note that the script temporarily creates two network namespaces (ca and cb) and leaves behind two configs and two log files.
Could you please try the attached script or try to create some reproducer for me to see the bug clearly?
Thanks!
Maria
On Tue, Jun 5, 2018 at 2:13 PM, Jan Maria Matejka <jan.matejka@nic.cz> wrote:
Hello!
Hello Jan,
Thank you for looking into this.
I don't know whether I correctly understand what you are trying to do. Anyway, please look at the attached script (for Linux). It creates two virtual routers CA and CB, a BGP link between them and sends a route defined in protocol static {} on CA.
I am trying to remove a whole "protocol static {}" block from the configurations and have the withdrawal be sent to the peers upon "configure soft" without having to restart the BGP session.
Then it reconfigures CA to remove the static protocol and it correctly shows that the route is withdrawn, at least for me at v1.6.4.
The routes are removed from bird, but a packet capture shows that no UPDATE message with the route withdrawals is actually sent to the peers, so the route lingers in the peers' routing tables.
Could you please try the attached script or try to create some reproducer for me to see the bug clearly?
I will give the test-withdraw script a go and reply with more information.
Thank you,
Thomás
Hello,
On Tue, Jun 5, 2018 at 3:43 PM, Thomás S. Bregolin <thoms3rd@gmail.com> wrote:
On Tue, Jun 5, 2018 at 2:13 PM, Jan Maria Matejka <jan.matejka@nic.cz> wrote:
Could you please try the attached script or try to create some reproducer for me to see the bug clearly?
I will give the test-withdraw script a go and reply with more information.
I've attached a modified version of the test-withdraw script showing the issue. It seems my problem is related to the "start delay time" option. When I set it to 1, sometimes the withdrawal is sent and sometimes it isn't. However, when it *isn't* sent, it is *never* sent, no matter how long I wait. The problem is solved by setting the delay to a higher value, or by using the default 5 seconds.
I am no longer sure if this is a bug or not. Please let me know if you would like more information about this.
Regards,
Thomás
On Wed, Jun 6, 2018 at 1:33 PM, Thomás S. Bregolin <thoms3rd@gmail.com> wrote: [...]
P.S.: removing the time delay solves the issue in the test script, but not in my production environment. Seems I have some more debugging to do.
On Wed, Jun 6, 2018 at 1:59 PM, Thomás S. Bregolin <thoms3rd@gmail.com> wrote: [...]
Some debug logs with a "start delay time" of 20 seconds:
Jun 06 13:11:47 m03 bird[28196]: Reconfiguring
Jun 06 13:11:47 m03 bird[28196]: Removing protocol e_route_20
Jun 06 13:11:47 m03 bird[28196]: kernel1: Reconfigured
Jun 06 13:11:47 m03 bird[28196]: device1: Reconfigured
Jun 06 13:11:47 m03 bird[28196]: bgpint: Reconfigured
Jun 06 13:11:47 m03 bird[28196]: Reconfigured
You can see the protocol is removed but the prefixes are not withdrawn; there are no "bird[28196]: cf_bgplan < removed ... via ... on ..." messages.
- Thomás
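For anyone repeating this check on a longer log, the comparison above can be scripted. This is only a rough sketch in Python: the two patterns are assumptions based purely on the log lines quoted in this thread ("Removing protocol NAME" and "PROTO < removed ..."), and `summarize` is a made-up helper name, not part of BIRD or birdc.

```python
import re

# Assumed line shapes, taken only from the excerpts quoted in this thread:
#   "... bird[PID]: Removing protocol NAME"
#   "... bird[PID]: PROTO < removed ..."
REMOVING = re.compile(r"bird\[\d+\]: Removing protocol (\S+)")
WITHDRAWN = re.compile(r"bird\[\d+\]: (\S+) < removed ")


def summarize(log_lines):
    """Return (names of removed protocols, number of '< removed' lines)."""
    removed = []
    withdrawals = 0
    for line in log_lines:
        match = REMOVING.search(line)
        if match:
            removed.append(match.group(1))
        elif WITHDRAWN.search(line):
            withdrawals += 1
    return removed, withdrawals


sample = [
    "Jun 06 13:11:47 m03 bird[28196]: Reconfiguring",
    "Jun 06 13:11:47 m03 bird[28196]: Removing protocol e_route_20",
    "Jun 06 13:11:47 m03 bird[28196]: kernel1: Reconfigured",
    "Jun 06 13:11:47 m03 bird[28196]: device1: Reconfigured",
    "Jun 06 13:11:47 m03 bird[28196]: bgpint: Reconfigured",
    "Jun 06 13:11:47 m03 bird[28196]: Reconfigured",
]

removed, withdrawals = summarize(sample)
print(removed, withdrawals)
```

A protocol showing up in the first list while the counter stays at zero matches the symptom described above. Whether the "< removed" lines appear at all depends on the debug settings of the BGP protocol, so a zero count is only meaningful with route debugging enabled.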
On Wed, Jun 06, 2018 at 01:33:53PM +0100, Thomás S. Bregolin wrote:
I've attached a modified version of the test-withdraw script showing the issue. It seems my problem is related to the "start delay time" option. When I set it to 1, sometimes the withdrawal is sent, and sometimes it isn't. However, when it *isn't* sent, it is *never* sent, no matter how long I wait. The problem is solved by setting the timeout to a higher value, or using the default 5 seconds.
Hello
Note that in the modified version no protocols are removed; both testproto and testproto2 are still configured and running. Only the BGP export filter was changed, and because 'configure soft' was used, it was not reloaded. That is expected.
But it is likely that if both changes happen simultaneously, i.e. a static protocol is removed and the BGP export filter is changed to no longer allow routes from that protocol, then withdrawals are not sent. This is likely a bug; it can be avoided by not doing the protocol removal and the export filter change at the same time.
--
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
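The interaction described here can be illustrated with a small toy model in Python. It is a sketch under a stated assumption, not BIRD's actual code: it assumes that the decision to send a withdrawal is made by running the current export filter on the route being removed, rather than by remembering what was announced. All names (`ToySpeaker`, `run`, the filter functions, the prefix) are invented for the illustration.

```python
class ToySpeaker:
    """Toy model of a BGP speaker; not BIRD's data structures or logic."""

    def __init__(self, export_filter):
        self.table = {}                  # prefix -> source protocol name
        self.export_filter = export_filter
        self.peer_rib = set()            # prefixes the peer currently holds

    def add_route(self, prefix, proto):
        self.table[prefix] = proto
        if self.export_filter(proto):
            self.peer_rib.add(prefix)    # stands for an UPDATE announcing it

    def remove_route(self, prefix):
        proto = self.table.pop(prefix)
        # Assumption: the *current* filter decides whether to withdraw.
        if self.export_filter(proto):
            self.peer_rib.discard(prefix)  # stands for an UPDATE withdrawing it

    def configure_soft(self, new_filter):
        # The filter is swapped, but nothing is re-exported.
        self.export_filter = new_filter

    def reload_out(self):
        # Only routes still present in the table can be re-evaluated.
        for prefix, proto in self.table.items():
            if self.export_filter(proto):
                self.peer_rib.add(prefix)
            else:
                self.peer_rib.discard(prefix)


def allow_all(proto):
    return True


def reject_static1(proto):
    return proto != "static1"


def run(change_filter):
    speaker = ToySpeaker(allow_all)
    speaker.add_route("192.0.2.0/24", "static1")
    if change_filter:
        speaker.configure_soft(reject_static1)
    speaker.remove_route("192.0.2.0/24")
    speaker.reload_out()
    return speaker.peer_rib


print(run(change_filter=False))  # set()
print(run(change_filter=True))   # {'192.0.2.0/24'}
```

In the second run the peer keeps the prefix forever: the removal is judged by a filter that now rejects the route, and the later reload has nothing left in the table to re-evaluate.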
On Wed, Jun 6, 2018 at 2:25 PM, Ondrej Zajicek <santiago@crfreenet.org> wrote: [...]
Oh, interesting! Indeed, if I remove just the filter or both the filter and route, the withdrawals are not sent. However, if I remove only the route, the withdrawal is sent.
Indeed, as you say, it sounds like a bug. For two reasons:
- BIRD should never remove a protocol without sending a withdrawal (unless that is wanted as a feature or configuration option);
- There is no reason why removing the filter at the same time as the route should change the behaviour.
Thank you once again for looking into this. Please let me know if I should write up a proper bug report and send it here.
Thomás
On Wed, Jun 06, 2018 at 03:17:37PM +0100, Thomás S. Bregolin wrote:
Oh, interesting! Indeed, if I remove just the filter or both the filter and route, the withdrawals are not sent. However, if I remove only the route, the withdrawal is sent.
If you remove just the filter, then the withdrawal is not sent, because you used 'configure soft' and not 'configure'. That is expected and will be fixed when you use the 'reload' command.
The problem arises only when a route is removed during that interval, as then during 'reload' BIRD no longer knows about it and therefore does not send a withdrawal.
Also note that when both sides have BGP with 'enhanced route refresh', this should not be a problem. In that case 'reload out' is delimited by Begin-RR and End-RR messages, and the receiver should remove all routes that were not received during that interval.
As BIRD has had this feature for a long time, I am not sure why it does not happen in your case. Do you have something other than BIRD on the other side? What is reported in 'Neighbor caps' in 'show protocols all' for the BGP session?
Indeed, as you say, it sounds like a bug. For two reasons:
- BIRD should never remove a protocol without sending a withdrawal (unless that is wanted as a feature or configuration option);
- There is no reason why removing the filter at the same time as the route should change the behaviour;
We could probably fix that at the cost of sending some spurious withdrawals in some cases.
--
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
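The receiver-side behaviour mentioned above for 'enhanced route refresh' (drop everything that was not re-received between Begin-RR and End-RR) amounts to a mark-and-sweep. A minimal Python sketch follows; `ToyReceiver` and its method names are invented for the illustration and do not correspond to any real router's implementation.

```python
class ToyReceiver:
    """Receiver-side sketch of enhanced route refresh as mark and sweep."""

    def __init__(self):
        self.rib = {}  # prefix -> True if currently marked stale

    def update(self, prefix):
        self.rib[prefix] = False       # a (re-)announcement clears the mark

    def withdraw(self, prefix):
        self.rib.pop(prefix, None)

    def begin_rr(self):
        for prefix in self.rib:
            self.rib[prefix] = True    # mark every known route as stale

    def end_rr(self):
        # Sweep: keep only routes that were announced again in between.
        self.rib = {p: stale for p, stale in self.rib.items() if not stale}


peer = ToyReceiver()
peer.update("192.0.2.0/24")
peer.update("198.51.100.0/24")

# The sender has lost track of 192.0.2.0/24 and never withdraws it.
peer.begin_rr()
peer.update("198.51.100.0/24")
peer.end_rr()

print(sorted(peer.rib))  # ['198.51.100.0/24']
```

With this in place the stale prefix disappears at End-RR even though no explicit withdrawal was ever sent, which is why the issue should only be visible when the neighbor lacks the capability.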
On Wed, Jun 6, 2018 at 3:39 PM, Ondrej Zajicek <santiago@crfreenet.org> wrote: [...]
I see. It's not BIRD on the other side. In this instance I'm using a QFX with JUNOS 15. But I believe I've seen the issue with Cisco and Arista routers, too. Will have to confirm, though.
I have this in neighbor caps:
Neighbor caps: refresh restart-aware AS4
I'll check about the enhanced route refresh.
- Thomás
On Wed, Jun 06, 2018 at 03:49:57PM +0100, Thomás S. Bregolin wrote:
I see. It's not BIRD on the other side. In this instance I'm using a QFX with JUNOS 15. But I believe I've seen the issue with Cisco and Arista routers, too. Will have to confirm, though.
I have this in neighbor caps:
Neighbor caps: refresh restart-aware AS4
I'll check about the enhanced route refresh.
With these caps there is no enhanced route refresh support; that would look like:
Neighbor caps: refresh enhanced-refresh ...
--
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
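The check described above can be done mechanically on saved 'show protocols all' output. A small Python sketch, assuming only the "Neighbor caps: ..." line format quoted in this thread; the function names are made up and the rest of the output is ignored.

```python
def neighbor_caps(show_protocols_output):
    """Return the capability tokens from the first 'Neighbor caps' line."""
    for line in show_protocols_output.splitlines():
        label, sep, value = line.strip().partition(":")
        if sep and label == "Neighbor caps":
            return set(value.split())
    return set()


def has_enhanced_refresh(show_protocols_output):
    """True if the neighbor advertised the enhanced-refresh capability."""
    return "enhanced-refresh" in neighbor_caps(show_protocols_output)


print(has_enhanced_refresh("    Neighbor caps:    refresh restart-aware AS4"))      # False
print(has_enhanced_refresh("    Neighbor caps:    refresh enhanced-refresh AS4"))   # True
```

The first input is the capability line reported earlier in the thread for the JUNOS peer; the second is a hypothetical line of the shape a neighbor with the capability would produce.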
Hello Ondrej,
I've been going through the BIRD source recently, and as I sat down to take a stab at a patch for this withdrawal thing I bumped into this while going through nest/rt-table.c:
https://gitlab.labs.nic.cz/labs/bird/commit/cbfdf6ed057b993d7e107b4c39b8a5b8...
https://gitlab.labs.nic.cz/labs/bird/commit/470efcb98cb33de2d5636679eb0f72c8...
So just wanted to thank you for working on this. I'll be doing some testing with the patches. Do you think it'll make it into 1.6.5 in the near future?
Greetings,
Thomás
On Sat, Jul 07, 2018 at 07:08:06PM +0100, Thomás S. Bregolin wrote:
So just wanted to thank you for working on this. I'll be doing some testing with the patches. Do you think it'll make it into 1.6.5 in the near future?
Hello
Yes, these two patches should fix the issue completely. They will make it into 1.6.5 (and their variants into 2.0.3), which I hope will be released in several weeks.
--
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
participants (4)
- Jan Maria Matejka
- Michael Rack
- Ondrej Zajicek
- Thomás S. Bregolin