Hey!

I am trying to make BGP graceful restart work. First, I noticed that BGP graceful restart only works if BIRD doesn't close the BGP session cleanly. Otherwise, an administrative shutdown is sent and the other end (also BIRD) removes all routes and doesn't consider this a graceful restart.

2016-12-02 10:09:24 <RMT> R1: Received: Administrative shutdown
2016-12-02 10:09:24 <TRACE> R1: BGP session closed
2016-12-02 10:09:24 <TRACE> R1: State changed to stop
2016-12-02 10:09:24 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Is that expected behavior?

The second problem I ran into is when using BFD. If I kill -9 bird, BFD quickly detects the problem and shuts down the BGP session. This is not considered a graceful restart either.

2016-12-02 10:52:50 <TRACE> R1: Neighbor graceful restart detected
2016-12-02 10:52:50 <TRACE> R1: State changed to start
2016-12-02 10:52:50 <TRACE> R1: BGP session closed
2016-12-02 10:52:50 <TRACE> R1: Connect delayed by 5 seconds
2016-12-02 10:52:51 <TRACE> R1: BFD session down
2016-12-02 10:52:51 <TRACE> R1: State changed to stop
2016-12-02 10:52:51 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Therefore, BFD seems incompatible with graceful restart. The Juniper implementation has some provisions to make BFD and BGP graceful restart work together:
So that BFD can maintain its BFD protocol sessions across a BGP graceful restart, BGP requests that BFD set the C bit to 1 in transmitted BFD packets. When the C bit is set to 1, BFD can maintain its session in the forwarding plane in spite of disruptions in the control plane. Setting the bit to 1 gives BGP neighbors acting as a graceful restart helper the most accurate information about whether the forwarding plane is up.
When BGP is acting as a graceful restart helper and the BFD session to the BGP peer is lost, one of the following actions takes place:

- If the C bit received in the BFD packets was 1, BGP immediately flushes all routes, determining that the forwarding plane on the BGP peer has gone down.
- If the C bit received in the BFD packets was 0, BGP marks all routes as stale but does not flush them because the forwarding plane on the BGP peer might be working and only the control plane has gone down.
Unrelated to BGP restart but related to BFD: if one BGP peer has a temporary network issue, BFD quickly closes the session, and a startup delay is then imposed on the session. When the network outage is resolved and one peer tries to reconnect, the connection is rejected because of this startup delay:

2016-12-02 11:03:55 <TRACE> R1: State changed to start
2016-12-02 11:03:55 <TRACE> R1: Startup delayed by 60 seconds due to errors
2016-12-02 11:04:02 <TRACE> R1: Incoming connection from 192.0.2.1 (port 49205) rejected
2016-12-02 11:04:07 <TRACE> R1: Incoming connection from 192.0.2.1 (port 36449) rejected

The delay can be configured to a lower value, but is this the expected behavior? The current code is:

    acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
          (p->start_state >= BSS_CONNECT) && (!p->incoming_conn.sk);

Could this be changed to the following?

    acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
          (p->start_state >= BSS_DELAY) && (!p->incoming_conn.sk);

I have put a more detailed summary of my investigations here:
https://github.com/vincentbernat/network-lab/tree/caceb38e8543ec22a7693611bb...

--
Use uniform input formats.
            - The Elements of Programming Style (Kernighan & Plauger)
❦ 2 December 2016 11:42 +0100, Vincent Bernat <bernat@luffy.cx>:
Therefore, BFD seems incompatible with graceful restart. The Juniper implementation has some provisions to make BFD and BGP graceful restart work together: [...]
After trying the Juniper implementation, I discovered that its behavior is consistent with BIRD: if BFD fails, the session is destroyed and no graceful restart is possible. The difference is that on JunOS, there is a separate bfdd process, so you can gracefully restart the BGP process while keeping the BFD sessions up. That is not possible with BIRD.

Restarting BIRD "fast" is not enough. The discriminator changes and therefore, the BFD session is detected as down. I suppose JunOS could consider this a graceful restart, but that's not the case.

So far, my only solution would be to move BFD into its own daemon.

--
Don't stop at one bug.
            - The Elements of Programming Style (Kernighan & Plauger)
On Fri, Dec 02, 2016 at 11:42:25AM +0100, Vincent Bernat wrote:
Hey!
I am trying to make BGP graceful restart work. First, I noticed that BGP graceful restart only works if BIRD doesn't close the BGP session cleanly. Otherwise, an administrative shutdown is sent and the other end (also BIRD) removes all routes and doesn't consider this a graceful restart.

2016-12-02 10:09:24 <RMT> R1: Received: Administrative shutdown
2016-12-02 10:09:24 <TRACE> R1: BGP session closed
2016-12-02 10:09:24 <TRACE> R1: State changed to stop
2016-12-02 10:09:24 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Is that expected behavior?
Hi

There are three different cases:

1) regular (administrative) shutdown/restart
2) planned graceful restart (e.g. software version update)
3) unplanned graceful restart (e.g. software crash and respawn)

The regular shutdown command does (1), so a regular BGP session shutdown is expected. Case (3) should work without much trouble. But there is no explicit support for case (2); you have to use kill -9, as we are missing a command that explicitly activates graceful restart.
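The difference between these cases, as seen from the helper side, comes down to how the session ends. The following is a minimal sketch in C, not BIRD code: the names `close_kind` and `helper_keeps_routes` are hypothetical, and the rule it encodes is a simplified reading of RFC 4724, where routes are retained only if the peer advertised the graceful restart capability and the session ended without a NOTIFICATION.

```c
#include <stdbool.h>

/* Hypothetical model, not BIRD code: how the BGP session ended,
 * as observed by the side acting as graceful restart helper. */
enum close_kind {
  CLOSE_NOTIFICATION,  /* case (1): a NOTIFICATION such as Administrative shutdown */
  CLOSE_TCP_ONLY       /* cases (2) and (3): TCP connection lost, no NOTIFICATION */
};

/* Returns true when the helper should keep the peer's routes as stale
 * instead of removing them immediately. */
bool helper_keeps_routes(bool peer_gr_capable, enum close_kind how)
{
  return peer_gr_capable && how == CLOSE_TCP_ONLY;
}
```

Under this model, a clean shutdown maps to `CLOSE_NOTIFICATION` and the routes are removed, which matches the first log above, while kill -9 maps to `CLOSE_TCP_ONLY`.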
The second problem I ran into is when using BFD. If I kill -9 bird, BFD quickly detects the problem and shuts down the BGP session. This is not considered a graceful restart either.
We should have better handling of the C-bit in BFD (for example, we currently behave the same regardless of the neighbor's C-bit value). But there is still a fundamental limitation in having BFD in the control plane, or even in the same process.

There is one potential solution: for case (2), we could explicitly shut down BFD sessions when graceful restart is requested. As graceful restart is just an advisory mechanism, BGP should survive the shutdown of the BFD session, and then regular BGP graceful restart should work.

Case (3) is more problematic. RFC 5882 specifies that with the C-bit set to zero, the helper should avoid aborting graceful restart when the BFD session fails. But that will work only if the graceful restart is detected before the BFD session failure is detected. I guess that may work in some cases (bird is killed and the OS immediately closes the TCP socket for the BGP session, which is detected by the other side).
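The ordering dependency described for case (3) can be illustrated with a small state machine. This is only a sketch under stated assumptions: the states, events and the `helper_step` function are hypothetical names, not BIRD identifiers, and the C-bit rule is the simplified one discussed in this thread.

```c
#include <stdbool.h>

/* Hypothetical helper-side model; none of these names exist in BIRD. */
enum helper_state { H_ESTABLISHED, H_GR_HELPER, H_FLUSHED };
enum helper_event { EV_TCP_CLOSED, EV_BFD_DOWN };

/* cbit: C bit last received from the neighbor in BFD packets
 * (true means the neighbor's BFD is independent of its control plane). */
enum helper_state helper_step(enum helper_state s, enum helper_event ev, bool cbit)
{
  switch (s) {
  case H_ESTABLISHED:
    /* TCP closed without NOTIFICATION: act as helper, keep routes as stale. */
    if (ev == EV_TCP_CLOSED)
      return H_GR_HELPER;
    /* BFD failed first: the session is torn down, no graceful restart. */
    return H_FLUSHED;

  case H_GR_HELPER:
    /* With the C bit clear, a BFD failure does not abort the restart. */
    if (ev == EV_BFD_DOWN && cbit)
      return H_FLUSHED;
    return H_GR_HELPER;

  default:
    /* Once the routes are flushed, later events change nothing. */
    return H_FLUSHED;
  }
}
```

Feeding `EV_TCP_CLOSED` and then `EV_BFD_DOWN` with the C bit clear leaves the model in `H_GR_HELPER`, while the reverse order ends in `H_FLUSHED`, which is the race mentioned above.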
Unrelated to BGP restart but related to BFD: if one BGP peer has a temporary network issue, BFD quickly closes the session, and a startup delay is then imposed on the session. When the network outage is resolved and one peer tries to reconnect, the connection is rejected because of this startup delay:

2016-12-02 11:03:55 <TRACE> R1: State changed to start
2016-12-02 11:03:55 <TRACE> R1: Startup delayed by 60 seconds due to errors
2016-12-02 11:04:02 <TRACE> R1: Incoming connection from 192.0.2.1 (port 49205) rejected
2016-12-02 11:04:07 <TRACE> R1: Incoming connection from 192.0.2.1 (port 36449) rejected
The delay can be configured to a lower value, but is it the expected behavior?
Yes, it was designed so that any crash that tears down an established BGP session is limited by this delay, so you don't get session flapping with a full BGP feed every few seconds. Note that this should not happen when the neighbor did a graceful restart, as BGP stays in BSS_CONNECT in that case.

The current code is:
acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) && (p->start_state >= BSS_CONNECT) && (!p->incoming_conn.sk);
Could this be changed to?
acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) && (p->start_state >= BSS_DELAY) && (!p->incoming_conn.sk);
That would just eliminate the delay for incoming connections altogether.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
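To make the effect of the two accept conditions quoted in this message concrete, here is a simplified stand-alone model in C. It is a sketch, not the BIRD source: the `model_` and `M_` names are stand-ins for the identifiers in the thread, and the ordering of the start states (delay before connect) is an assumption implied by the `>=` comparisons rather than copied from BIRD headers.

```c
#include <stdbool.h>

/* Simplified stand-ins for the thread's identifiers. The ordering
 * M_BSS_DELAY < M_BSS_CONNECT is an assumption, not taken from BIRD. */
enum model_proto_state { M_PS_DOWN, M_PS_START, M_PS_UP, M_PS_STOP };
enum model_start_state { M_BSS_PREPARE, M_BSS_DELAY, M_BSS_CONNECT };

struct model_bgp {
  enum model_proto_state proto_state;
  enum model_start_state start_state;
  bool has_incoming_conn;  /* stands in for p->incoming_conn.sk being set */
};

/* Accept test, parameterized by the minimum start state required:
 * M_BSS_CONNECT models the current code, M_BSS_DELAY the proposed change. */
bool model_accept(const struct model_bgp *p, enum model_start_state min_state)
{
  return (p->proto_state == M_PS_START || p->proto_state == M_PS_UP) &&
         (p->start_state >= min_state) &&
         !p->has_incoming_conn;
}
```

For a session sitting in the startup delay, `model_accept(&p, M_BSS_CONNECT)` is false and `model_accept(&p, M_BSS_DELAY)` is true, so with the proposed change the delay would no longer apply to incoming connections.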
❦ 2 December 2016 17:11 +0100, Ondrej Zajicek <santiago@crfreenet.org>:
There are three different cases:
1) regular (administrative) shutdown/restart
2) planned graceful restart (e.g. software version update)
3) unplanned graceful restart (e.g. software crash and respawn)
Regular shutdown command does (1), so it is expected to see regular BGP session shutdown. Case (3) should work without much problems. But there is no explicit support for case (2), you have to use kill -9 as we are missing some command that explicitly activates graceful restart.
The second problem I ran into is when using BFD. If I kill -9 bird, BFD quickly detects the problem and shuts down the BGP session. This is not considered a graceful restart either.
We should have better handling of C-bit in BFD (for example, we have the same behavior regardless of neighbor's C-bit value). But still there is a fundamental limitation of having BFD in control plane or even in the same process.
There is one potential solution: for case (2), we could explicitly shut down BFD sessions when graceful restart is requested. As graceful restart is just an advisory mechanism, BGP should survive the shutdown of the BFD session, and then regular BGP graceful restart should work.
It seems easy enough to do. I may have a look at this, since it's my main interest. I don't expect things to crash and can live with not having a graceful restart in that case. Would it be better to trigger the graceful restart with a signal or through the socket?

--
The human race has one really effective weapon, and that is laughter.
            -- Mark Twain