BGP graceful restart and BFD

Fri Dec 2 11:42:25 CET 2016

Hey!

I am trying to make BGP graceful restart work. First, I noticed that BGP
graceful restart can only work if BIRD doesn't close the BGP session
cleanly. Otherwise, an administrative shutdown is sent and the other end
(also BIRD) flushes all routes and doesn't consider this a graceful
restart.

    2016-12-02 10:09:24 <RMT> R1: Received: Administrative shutdown
    2016-12-02 10:09:24 <TRACE> R1: BGP session closed
    2016-12-02 10:09:24 <TRACE> R1: State changed to stop
    2016-12-02 10:09:24 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Is that the expected behavior?

The second problem I ran into is with BFD. If I kill -9 bird, BFD
quickly detects the problem and shuts down the BGP session. This is
not considered a graceful restart either.

    2016-12-02 10:52:50 <TRACE> R1: Neighbor graceful restart detected
    2016-12-02 10:52:50 <TRACE> R1: State changed to start
    2016-12-02 10:52:50 <TRACE> R1: BGP session closed
    2016-12-02 10:52:50 <TRACE> R1: Connect delayed by 5 seconds
    2016-12-02 10:52:51 <TRACE> R1: BFD session down
    2016-12-02 10:52:51 <TRACE> R1: State changed to stop
    2016-12-02 10:52:51 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0

Therefore, BFD seems incompatible with graceful restart. The Juniper
implementation has some provisions to make BFD and BGP graceful restart
work together:

> So that BFD can maintain its BFD protocol sessions across a BGP
> graceful restart, BGP requests that BFD set the C bit to 1 in
> transmitted BFD packets. When the C bit is set to 1, BFD can
> maintain its session in the forwarding plane in spite of disruptions
> in the control plane. Setting the bit to 1 gives BGP neighbors
> acting as a graceful restart helper the most accurate information
> about whether the forwarding plane is up.
>
> When BGP is acting as a graceful restart helper and the BFD session
> to the BGP peer is lost, one of the following actions takes place:
>  - If the C bit received in the BFD packets was 1, BGP immediately
>    flushes all routes, determining that the forwarding plane on the
>    BGP peer has gone down.
>  - If the C bit received in the BFD packets was 0, BGP marks all
>    routes as stale but does not flush them because the forwarding
>    plane on the BGP peer might be working and only the control plane
>    has gone down.
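
To make the quoted helper behavior concrete, here is a minimal sketch in
C of the decision taken when the BFD session drops. It only restates the
Juniper text above; the enum and function names are hypothetical and are
neither BIRD nor Junos identifiers.

```c
#include <assert.h>

/* Hypothetical outcome names, for illustration only. */
enum gr_action { GR_FLUSH_ROUTES, GR_MARK_STALE };

/*
 * Decide what a graceful restart helper does when the BFD session to
 * the restarting peer is lost. peer_c_bit is the C (control plane
 * independent) bit last received in the peer's BFD packets.
 */
static enum gr_action
gr_helper_on_bfd_down(int peer_c_bit)
{
  /* The C bit is a single bit, so only 0 and 1 are valid. */
  assert(peer_c_bit == 0 || peer_c_bit == 1);

  /* C = 1: BFD survives control plane restarts, so losing it means
   * the forwarding plane is down and the routes must go. */
  if (peer_c_bit)
    return GR_FLUSH_ROUTES;

  /* C = 0: only the control plane may be down; keep the routes as
   * stale until the restart timer decides. */
  return GR_MARK_STALE;
}
```

With something like this, the "BFD session down" event in the log above
would lead to stale routes instead of an immediate removal when the peer
sends C = 0.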

Unrelated to BGP graceful restart but related to BFD: if one BGP peer
has a temporary network issue, BFD quickly closes the session, which
then has to wait for a startup delay. When the network outage is
resolved and the peer tries to reconnect, the connection is rejected
because of this startup delay:

    2016-12-02 11:03:55 <TRACE> R1: State changed to start
    2016-12-02 11:03:55 <TRACE> R1: Startup delayed by 60 seconds due to errors
    2016-12-02 11:04:02 <TRACE> R1: Incoming connection from 192.0.2.1 (port 49205) rejected
    2016-12-02 11:04:07 <TRACE> R1: Incoming connection from 192.0.2.1 (port 36449) rejected

The delay can be configured to a lower value, but is this the expected
behavior? The current code is:

    acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
      (p->start_state >= BSS_CONNECT) && (!p->incoming_conn.sk);

Could this be changed to the following?

    acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
      (p->start_state >= BSS_DELAY) && (!p->incoming_conn.sk);
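
To show the effect of the two conditions side by side, here is a small
standalone sketch. The enums and the struct are simplified stand-ins,
not BIRD's real definitions; I am only assuming the ordering
BSS_PREPARE < BSS_DELAY < BSS_CONNECT, and has_incoming_conn replaces
the p->incoming_conn.sk check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; only the relative order of values matters. */
enum proto_state { PS_DOWN, PS_START, PS_UP, PS_STOP };
enum start_state { BSS_PREPARE, BSS_DELAY, BSS_CONNECT };

struct peer {
  enum proto_state proto_state;
  enum start_state start_state;
  bool has_incoming_conn;  /* stands in for p->incoming_conn.sk != NULL */
};

/*
 * Same shape as the acc expression above. min_state is BSS_CONNECT for
 * the current behavior and BSS_DELAY for the proposed one.
 */
static bool
accept_incoming(const struct peer *p, enum start_state min_state)
{
  assert(p != NULL);
  return (p->proto_state == PS_START || p->proto_state == PS_UP) &&
         (p->start_state >= min_state) && !p->has_incoming_conn;
}
```

For a peer in PS_START with start_state BSS_DELAY, passing BSS_CONNECT
rejects the incoming connection (what the log shows), while passing
BSS_DELAY accepts it. A peer still in BSS_PREPARE is rejected either
way.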

I have put a more detailed summary of my investigations here:
 https://github.com/vincentbernat/network-lab/tree/caceb38e8543ec22a7693611bbd84cdf36e92e12/lab-bgp-graceful-restart
-- 
Use uniform input formats.
            - The Elements of Programming Style (Kernighan & Plauger)