BGP graceful restart for software update only?
In my setup, an instance of BIRD runs all the time, except when it needs to be restarted for a software update.

For that update scenario, I'd like BGP graceful restart to apply, so that the stop-update-restart process does not cause the routes advertised by this BIRD to be withdrawn from the rest of the BGP network.

For all other scenarios, however, I don't want any graceful restart. Specifically, if there's a break in connectivity to a BGP peer, I want to detect that as quickly as possible (with BFD), to remove the routes learned from that peer locally, and for that peer to remove the routes learned from me, all immediately.

Is there some combination of configuration and procedure that can meet both of those requirements?

Many thanks,
Neil
❦ 17 October 2019 11:29 +01, Neil Jerram <neil@tigera.io>:
> In my setup, an instance of BIRD runs all the time, except for when it needs to be restarted for a software update.
> For that update scenario, I'd like BGP graceful restart to apply, so that the stop-update-restart process does not cause the routes advertised by this BIRD to be withdrawn from the rest of the BGP network.
> For all other scenarios, however, I don't want any graceful restart. Specifically, if there's a break in connectivity to a BGP peer, I want to detect that as quickly as possible (with BFD), locally to remove the routes learned from that peer, and for that peer to remove routes learned from me, all immediately.
> Is there some combination of configuration and procedure that can provide both of those desires?
You should look at the long-lived graceful restart (LLGR) alternative. It lets you do software upgrades without impact, and without keeping routes around when a BGP session is cut unexpectedly, as long as alternative routes are available.

--
Terminate input by end-of-file or marker, not by count.
- The Elements of Programming Style (Kernighan & Plauger)
Hi Vincent,

On Thu, Oct 17, 2019 at 1:05 PM Vincent Bernat <bernat@luffy.cx> wrote:
> You should look at the long lived graceful restart alternative. It will enable you to do software upgrades without impact without keeping routes around when a BGP session is cut unexpectedly, as long as you have alternative routes available.
I have indeed been looking at LLGR, and I'm very grateful for your blog post about it.

I'm not sure I'm persuaded by your argument, though, that LLGR is desirable because BFD could generate a false negative. Wouldn't it be better to eliminate those false negatives by allowing BFD to run more slowly and/or with a larger multiplier?

IIUC, the benefit of GR, for a planned software update, is that it can completely avoid any route flapping in the connected BGP peers, in terms of both the data path and the BGP control plane. With LLGR in that scenario, I believe there will be BGP control plane traffic, and other BIRDs updating their local kernel with least-preferred routes. I suppose it is still true that there is no data path flapping, but there *has* been a lot of control plane churn, which traditional GR would have avoided.

Is that understanding all correct?

Best wishes,
Neil
❦ 17 October 2019 16:47 +01, Neil Jerram <neil@tigera.io>:
> I'm not sure I'm persuaded by your argument, though, that LLGR is desirable because BFD could generate a false negative. Wouldn't it be better to eliminate those false negatives by allowing BFD to run more slowly and/or with a larger multiplier?
If you are using BFD, a slower BFD means a slower correction if a link goes down. If you use 1s*3, you may have a 3-second impact when someone unplugs a cable (on non-P2P BGP sessions). This may or may not be acceptable.

In my context, I was selling the design against P2P LACP links where the detection was configured around 100ms on the Linux side (Linux polls the NIC driver at a configurable interval to know if a link is down). So, 3s vs 100ms would be hard to sell. 450ms (150ms * 3) vs 100ms was easier. It's a bit apples and oranges, but the context was L2 design vs L3 design over distinct L2 fabrics.

But BFD has nothing to do with your request. You can use LLGR without BFD.
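To make the timing comparison above concrete, here is a minimal sketch of the arithmetic. It assumes the usual simplification that BFD declares a neighbor down after `multiplier` consecutive intervals without a packet; the function name and the `link_poll_ms` variable are illustrative, not taken from BIRD.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate time before BFD declares the neighbor down.

    Simplified model: `multiplier` consecutive intervals without a packet.
    """
    return interval_ms * multiplier


# The two BFD settings discussed above.
slow = bfd_detection_time_ms(1000, 3)  # 1s * 3
fast = bfd_detection_time_ms(150, 3)   # 150ms * 3

# Link-down detection by polling the NIC driver, as quoted for the LACP design.
link_poll_ms = 100

for label, value in (("BFD 1s*3", slow), ("BFD 150ms*3", fast)):
    print(f"{label}: {value} ms, i.e. {value / link_poll_ms:.1f}x the link polling interval")
```

The point of the comparison is only the ratio: a larger interval or multiplier makes BFD less likely to trip spuriously, but every real failure is then detected that much later.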
> IIUC, the benefit of GR, for a planned software update, is that it can completely avoid any route flapping in the connected BGP peers - in terms both of the data path and the BGP control plane. With LLGR in that scenario, I believe there will be BGP control plane traffic, and other BIRDs updating their local kernel with least-preferred routes. I suppose it is still true that there is no data path flapping, but there *has* been a lot of control plane churn, which traditional GR would have avoided. Is that understanding all correct?
Yes, I think you are right. Depending on what is more important for you, I still think LLGR is an improvement over GR. While you may have some churn at the control plane level with LLGR due to the deprioritization of some routes, you do not keep non-working routes during an outage (as long as the routes are present on another peer).

I don't think it is possible to have GR only for upgrades. It would need to reverse the logic: sending a BGP cease message "upgrade in progress" to let the other end know they should keep the routes. Maybe something could be done with a community similar to the graceful session shutdown but with a timer associated.

--
Writing is turning one's worst moments into money.
-- J.P. Donleavy
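The trade-off described above can be illustrated with a small toy model of what a peer does with the routes it learned from a neighbor whose session has just dropped. This is a sketch under simplifying assumptions, not BIRD's implementation: the `Route` class, the `mode` strings, and the "first candidate wins" stand-in for normal best-path selection are all hypothetical. The only behavior it tries to capture is that plain GR keeps stale routes at their normal preference, while LLGR keeps them but makes them least preferred.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    peer: str
    stale: bool = False       # retained by plain GR, preference unchanged
    llgr_stale: bool = False  # retained by LLGR, treated as least preferred


def on_session_loss(routes: List[Route], peer: str, mode: str) -> List[Route]:
    """Return the table after the session to `peer` drops.

    mode is "none" (no GR), "gr" (plain graceful restart) or "llgr".
    """
    result = []
    for r in routes:
        if r.peer != peer:
            result.append(r)
        elif mode == "gr":
            result.append(Route(r.prefix, r.peer, stale=True))
        elif mode == "llgr":
            result.append(Route(r.prefix, r.peer, llgr_stale=True))
        # mode == "none": the route is simply dropped
    return result


def best(routes: List[Route], prefix: str) -> Optional[Route]:
    """Pick the best route: LLGR-stale routes lose to any other candidate.

    Among equally preferred candidates the first one wins, which stands in
    for the normal BGP decision process.
    """
    candidates = [r for r in routes if r.prefix == prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.llgr_stale)


table = [Route("10.0.0.0/24", "A"), Route("10.0.0.0/24", "B")]

# Plain GR: the possibly dead route via A stays selected.
print(best(on_session_loss(table, "A", "gr"), "10.0.0.0/24").peer)    # A
# LLGR: the route via A is kept but deprioritized, so B takes over.
print(best(on_session_loss(table, "A", "llgr"), "10.0.0.0/24").peer)  # B
```

In this model the switch from A to B under LLGR is the control plane churn discussed earlier in the thread, and the case with no alternative peer is where LLGR still keeps the stale route in use.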
On Fri, Oct 18, 2019 at 09:17:35AM +0200, Vincent Bernat wrote:
> I don't think it is possible to have GR only for upgrades. It would need to reverse the logic: sending a BGP cease message "upgrade in progress" to let the other end know they should keep the routes. Maybe something could be done with a community similar to the graceful session shutdown but with a timer associated.
Hi,

Currently there is RFC 8538, which adds soft-reset notification messages triggering BGP graceful restart, but I don't think it allows disabling unplanned GR. So that would likely require another draft.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
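A rough sketch of the limitation mentioned above, written as a decision function for the receiving peer. It is a simplified reading of RFC 4724 and RFC 8538, not a description of BIRD's code: the function name and parameters are hypothetical, and hold-timer expiry, BFD and timers are deliberately left out. The constants are the Cease error code (6) and the Hard Reset subcode (9) defined by RFC 8538.

```python
from typing import Optional, Tuple

CEASE = 6       # NOTIFICATION error code "Cease"
HARD_RESET = 9  # Cease subcode introduced by RFC 8538


def peer_retains_routes(
    gr_negotiated: bool,
    n_bit_negotiated: bool,
    notification: Optional[Tuple[int, int]],
) -> bool:
    """Decide whether the peer keeps our routes after the session ends.

    `notification` is the (code, subcode) of a NOTIFICATION that ended the
    session, or None if the session died without one (crash, link loss).
    """
    if not gr_negotiated:
        return False
    if notification is None:
        # Classic RFC 4724 trigger: the session vanished, routes are retained.
        return True
    if not n_bit_negotiated:
        # Without RFC 8538 support, a NOTIFICATION ends the session without GR.
        return False
    # With RFC 8538, only a Hard Reset asks the peer to skip graceful restart.
    return notification != (CEASE, HARD_RESET)


# Unplanned failure: routes are retained whether or not RFC 8538 is in use.
print(peer_retains_routes(True, True, None))
# Deliberate hard reset: the peer drops the routes immediately.
print(peer_retains_routes(True, True, (CEASE, HARD_RESET)))
```

In this reading, the NOTIFICATION lets a speaker opt out of GR for a deliberate teardown, but the unplanned case (no NOTIFICATION at all) still triggers GR, which is the opposite of what was asked for at the start of the thread.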
participants (3)
- Neil Jerram
- Ondrej Zajicek
- Vincent Bernat