On Fri, Dec 02, 2016 at 11:42:25AM +0100, Vincent Bernat wrote:
Hey!
I am trying to make BGP graceful restart work. First, I noticed that BGP graceful restart can only work if BIRD doesn't close the BGP session cleanly. Otherwise, an administrative shutdown is sent and the other end (also BIRD) flushes all routes and doesn't consider this a graceful restart.

2016-12-02 10:09:24 <RMT> R1: Received: Administrative shutdown
2016-12-02 10:09:24 <TRACE> R1: BGP session closed
2016-12-02 10:09:24 <TRACE> R1: State changed to stop
2016-12-02 10:09:24 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Is that an expected behavior?
Hi

There are three different cases:

1) regular (administrative) shutdown/restart
2) planned graceful restart (e.g. software version update)
3) unplanned graceful restart (e.g. software crash and respawn)

The regular shutdown command does (1), so a regular BGP session shutdown is expected there. Case (3) should work without much trouble. But there is no explicit support for case (2); you have to use kill -9, as we are missing a command that explicitly activates graceful restart.
The second problem I ran into is with BFD. If I kill -9 bird, BFD quickly detects the problem and shuts down the BGP session. This is not considered a graceful restart either.
We should have better handling of the C-bit in BFD (for example, we currently behave the same regardless of the neighbor's C-bit value). But there is still a fundamental limitation in having BFD in the control plane, or even in the same process.

There is one potential solution for case (2): we could explicitly shut down BFD sessions when graceful restart is requested. As graceful restart is just an advisory mechanism, BGP should survive the shutdown of the BFD session, and then regular BGP graceful restart should work.

Case (3) is more problematic. RFC 5882 specifies that with the C-bit set to zero, the helper should avoid aborting graceful restart when the BFD session fails. But that only works if the graceful restart is detected before the BFD session failure is. I guess that may work in some cases (bird is killed and the OS immediately closes the TCP socket for the BGP session, which is detected by the other side).
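The C-bit reasoning above can be sketched as a small decision function on the helper side. This is only an illustration of the ordering problem described for case (3), not BIRD code: the enum, the function name on_bfd_down(), and its parameters are all hypothetical, and it assumes the RFC 5882 reading that BFD counts as independent of the control plane only when the C-bit is set in both directions.

```c
#include <stdbool.h>

/* Hypothetical helper-side actions; these names are not BIRD identifiers. */
enum bfd_down_action {
  ACT_TEARDOWN,   /* drop the BGP session and flush the neighbor's routes */
  ACT_KEEP_GR     /* keep stale routes and let graceful restart proceed */
};

/*
 * Decide what a graceful restart helper does when BFD reports the session down.
 *
 * local_cbit, remote_cbit: C-bit in the BFD packets we send / receive.
 * gr_in_progress: the neighbor's restart was already detected by BGP
 *                 (for example because its TCP socket was closed).
 */
static enum bfd_down_action
on_bfd_down(bool local_cbit, bool remote_cbit, bool gr_in_progress)
{
  /* C-bit set on both sides: BFD runs independently of the control plane,
     so a BFD failure means forwarding itself is broken. */
  if (local_cbit && remote_cbit)
    return ACT_TEARDOWN;

  /* BFD shares fate with the control plane. If the restart is already
     known, the BFD failure is most likely a side effect of it. */
  if (gr_in_progress)
    return ACT_KEEP_GR;

  /* BFD failure arrived first: the helper cannot yet tell a restart
     from a real outage, so in this sketch it tears the session down. */
  return ACT_TEARDOWN;
}
```

With the C-bit clear, on_bfd_down(false, false, true) keeps the stale routes, while on_bfd_down(false, false, false) still tears the session down, which is the race between BFD detection and TCP close mentioned above.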
Unrelated to BGP restart but related to BFD: if one BGP peer has a temporary network issue, BFD quickly closes the session, and the session then goes through a startup delay. When the network outage is resolved and one peer tries to reconnect, the connection is rejected because of this startup delay:

2016-12-02 11:03:55 <TRACE> R1: State changed to start
2016-12-02 11:03:55 <TRACE> R1: Startup delayed by 60 seconds due to errors
2016-12-02 11:04:02 <TRACE> R1: Incoming connection from 192.0.2.1 (port 49205) rejected
2016-12-02 11:04:07 <TRACE> R1: Incoming connection from 192.0.2.1 (port 36449) rejected

The delay can be configured to a lower value, but is it the expected behavior?
Yes, it was designed so that any crash that tears down an established BGP session is rate-limited by this delay, so you don't get session flapping with a full BGP feed every few seconds. Note that this should not happen when the neighbor did a graceful restart, as BGP stays in BSS_CONNECT in that case.

The current code is:
acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) && (p->start_state >= BSS_CONNECT) && (!p->incoming_conn.sk);
Could this be changed to?
acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) && (p->start_state >= BSS_DELAY) && (!p->incoming_conn.sk);
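To make the difference between the two conditions concrete, here is a minimal standalone model of the acceptance test. It is a sketch, not the BIRD source: the numeric values of the PS_* and BSS_* constants and the existence of BSS_PREPARE are assumptions (only BSS_DELAY < BSS_CONNECT is implied by the comparison above), and accepts_incoming() with its min_state parameter is a hypothetical stand-in for the expression that computes acc.

```c
#include <stdbool.h>

/* Assumed values; only the ordering BSS_DELAY < BSS_CONNECT matters here. */
enum { PS_DOWN = 0, PS_START = 1, PS_UP = 2, PS_STOP = 3 };
enum { BSS_PREPARE = 0, BSS_DELAY = 1, BSS_CONNECT = 2 };

/*
 * Simplified stand-in for the "acc" expression.
 * min_state is the threshold compared against start_state:
 * BSS_CONNECT models the current code, BSS_DELAY the proposed change.
 * have_incoming_sk models p->incoming_conn.sk being non-NULL.
 */
static bool
accepts_incoming(int proto_state, int start_state, bool have_incoming_sk,
                 int min_state)
{
  return (proto_state == PS_START || proto_state == PS_UP) &&
         (start_state >= min_state) &&
         !have_incoming_sk;
}
```

In this model, a protocol sitting in BSS_DELAY rejects the incoming connection with the BSS_CONNECT threshold and accepts it with the BSS_DELAY threshold; an outgoing connection attempt is not modeled here.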
That would just eliminate the delay for incoming connections altogether.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."