BIRD continues exporting routes but reports no exports

Hugo Slabbert hugo.slabbert at menlosecurity.com
Thu Mar 2 23:07:07 CET 2023


Hi folks,

We're seeing an issue where BIRD is reporting one thing for BGP route
exports (via `show route export <protocol>`) but actually doing something
different.

We see this when we change some of the conditions around export. Our export
filters include a function that evaluates as either true or false, for a
global toggle of whether we should proceed on with any exports to relevant
peers. Roughly it looks like this:

```
function do_export ()
bool DRAIN_NODE;
bool HEALTHY;
{
    DRAIN_NODE = false;
    HEALTHY = false;
    include "includes/birdwatch/*";
    return HEALTHY && !DRAIN_NODE;
}
```

That function is then included as the first term in relevant export
filters, e.g.:

```
filter PUBLIC_EXPORT_v4 {
    if !do_export() then reject;
# other export terms here
}
```

We have external programs that monitor for whether the host is indicated as
drained, or if the process that's handling traffic for the box is healthy.
If those conditions change, they add or remove config snippets in the
`includes/birdwatch/` directory to change the values of the DRAIN_NODE or
HEALTHY bools, and we then call `birdc configure` to reload the config and
pick up the change to either stop exports or proceed with them. The
common trick of adding/removing /32s from loopbacks or such for DIRECT
export doesn't fit our use case, so this method gives external programs
control over global export state for those filters.
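
For illustration, the toggle described above could be scripted roughly as
follows. This is only a sketch, not the actual tooling: the
`/etc/bird/includes/birdwatch` path, the `drain.conf` file name, and the
`set_drain` helper are all assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical location of the include directory; adjust to the local layout.
INCLUDE_DIR="${INCLUDE_DIR:-/etc/bird/includes/birdwatch}"

# set_drain on|off
# Writes or removes a snippet that overrides DRAIN_NODE inside do_export(),
# then asks BIRD to reload its configuration once.
set_drain() {
    case "$1" in
        on)
            # The snippet is pulled into the function body by the include glob.
            printf 'DRAIN_NODE = true;\n' > "$INCLUDE_DIR/drain.conf"
            ;;
        off)
            # Without the snippet, DRAIN_NODE keeps its default of false.
            rm -f "$INCLUDE_DIR/drain.conf"
            ;;
        *)
            echo "usage: set_drain on|off" >&2
            return 2
            ;;
    esac
    birdc configure
}
```

Calling `set_drain on` would correspond to the "toggle DRAIN_NODE, then
reload" sequence; a HEALTHY toggle would look the same with a different
snippet.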

This has worked cleanly for us so far. But, recently we've hit an issue.
Assuming we start in a "non-drained" state where DRAIN_NODE is false and
HEALTHY is true, so we're exporting routes as expected, we then:

* toggle DRAIN_NODE to be true
* execute `birdc configure` to reload the configuration

When we look at exported routes, e.g. `show route export <bgp peer
protocol>`, BIRD shows no routes exported, as expected. However, those BGP
peers definitely do still see routes advertised from BIRD. We are also
using https://github.com/czerwonk/bird_exporter, and the
`bird_protocol_prefix_export_count` metrics for exported routes for the
drained host remain at their previous levels. In other words:

* birdc says nothing is being exported
* but peers still are seeing prefixes from BIRD
* and bird_exporter still shows prefixes being exported from BIRD

Effectively, the `birdc show route export` output is lying.

We've tried adding a sleep between when the include snippet that changes
the DRAIN_NODE value is written and when we hit `birdc configure`, but
that doesn't appear to make any difference. If we execute `birdc configure`
*twice*, though, everything's fine: the actual exports are stopped. That
holds without any sleep or break between the two runs as well; it is
literally just `birdc configure` back to back in the script that manages
this.
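
As a sketch of that workaround (the `reload_twice` name is hypothetical,
and this papers over the problem rather than explaining it):

```shell
#!/bin/sh
# Reload the BIRD configuration twice, back to back, with no pause.
# Per the behaviour described above, the first run updates what
# `show route export` reports and the second run is what changes the
# routes actually sent to peers.
# No output parsing or status checking is attempted here.
reload_twice() {
    birdc configure
    birdc configure
}
```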

We do not see any indication of issues in the `birdc configure` runs or in
BIRD's logs.

The reverse also occurs: if we start with DRAIN_NODE set to true and are
exporting nothing, then unset DRAIN_NODE and hit `birdc configure`,
`show route export` shows routes being exported, but neighbours don't
actually hear any routes.

We have not seen this occur in our smaller dev environment, but it has
shown up in our more scaled environment. The relevant hosts have a couple
of upstream v4 and v6 peers feeding them defaults, with:

Import:
* 1x IPv4 default route per BGP upstream peer, 2x total
* 1x IPv6 default route per BGP upstream peer, 2x total
Export:
* 1-5 IPv4 /24s per BGP upstream peer
* 0-1 IPv6 /48s per BGP upstream peer

They have a larger number of downstream BGP peers, currently around 138 but
that number can fluctuate. Each of those peers has an IPv4 and an IPv6
channel, running over an IPv6 transport (`extended next hop on` for the
ipv4 channel).

At the moment those have:

Import:
* 2x IPv6 routes per peer (one /48, one /128)
* 63x IPv4 routes per peer (all /24s)
Export:
* 21x IPv6 routes per peer (all /128s)
* 1-5x IPv4 routes per peer (all /24s)

So the boxes end up with ~290 BGP "protocols" across v4 and v6 channels
combined, ~9,300 routes imported (v4 and v6 combined, multiplied by
peers), and ~3,100 routes exported (v4 and v6 combined, multiplied by
peers).

Any thoughts on what could be causing BIRD to fail to apply the policy
properly on the first `birdc configure` but succeed on the second? Or
why the route export output is incorrect after the first `birdc configure`
while the metrics exporter is correct? Is `show route export` showing what
BIRD *thinks* should be exported based on current filters, or what it's
actively sending to those peers? Or any thoughts on how we can further
debug or trace this to figure out what's confusing BIRD?
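
To make the mismatch easier to capture, the two views could be printed
side by side per protocol, along these lines. This is a rough sketch: the
helper names are hypothetical, and the `grep` pattern assumes that `show
protocols all` prints per-channel `Channel` and `Routes:` lines, which may
differ between BIRD versions.

```shell
#!/bin/sh
# What the current export filter would accept for protocol $1.
filter_view() {
    birdc show route export "$1" count
}

# Per-channel route counters that BIRD itself keeps for protocol $1.
# Assumes "Channel ..." and "Routes: ..." lines in the output.
channel_counters() {
    birdc show protocols all "$1" | grep -E 'Channel|Routes:'
}

# Print both views for every protocol name given as an argument.
compare_export_views() {
    for proto in "$@"; do
        printf '== %s ==\n' "$proto"
        filter_view "$proto"
        channel_counters "$proto"
    done
}
```

Running `compare_export_views <bgp peer protocol>` right after the first
`birdc configure`, and again after the second, would show whether the two
sets of numbers disagree.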