BIRD continues exporting routes but reports no exports

Hugo Slabbert hugo.slabbert at menlosecurity.com
Fri Mar 3 16:15:15 CET 2023


(copying back to the list as this went direct inadvertently)

Gotcha, thanks.

Having export tables available is a great idea to help debug this, thanks.
We'll enable that on some dev nodes to see if we can validate things going
wonky and to give an additional view. The protocol reload is also a helpful
idea to probably use in conjunction with that, to validate how things look
before/after.
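
For anyone following along, here is a minimal sketch of what that might look
like, assuming a BIRD 2.x version that supports the `export table` option on
BGP channels. The protocol name, AS numbers, and neighbor address are
placeholders rather than our real config; PUBLIC_EXPORT_v4 is the filter
from the original message below.

```
# Hypothetical downstream peer; names, ASNs and addresses are placeholders.
protocol bgp downstream_peer1 {
    local as 64512;
    neighbor 2001:db8::1 as 64513;

    ipv4 {
        extended next hop on;            # IPv4 over the IPv6 transport
        import all;                      # placeholder import policy
        export filter PUBLIC_EXPORT_v4;  # filter that calls do_export()
        export table on;                 # keep a copy of routes actually sent
    };
}
```

With that in place, `birdc show route export table downstream_peer1.ipv4`
should list what was actually sent to the neighbor, which can be compared
against `birdc show route export downstream_peer1`. Running
`birdc reload out downstream_peer1` after `birdc configure` would be the
reload workaround suggested below.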

On Thu, Mar 2, 2023, 16:26 Alexander Zubkov <green at qrator.net> wrote:

> > Is `show route export` showing what BIRD thinks should be exported based
> > on current filters, or what it's actively sending to those peers?
>
> It should be the first variant. So the problem is not that the config is read
> incorrectly, but that BIRD is not reloading protocols with changed filters,
> as if a "soft" configure had been called. You could probably also work around
> it by calling reload on your protocols.
>
> You can also enable export tables on your bgp session channels to examine
> what was actually sent to the neighbor.
>
> On Thu, Mar 2, 2023, 23:13 Hugo Slabbert via Bird-users <
> bird-users at network.cz> wrote:
>
>> Hi folks,
>>
>> We're seeing an issue where BIRD is reporting one thing for BGP route
>> exports (via `show route export <protocol>`) but actually doing
>> something different.
>>
>> We see this when we change some of the conditions around export. Our
>> export filters include a function that evaluates as either true or false,
>> for a global toggle of whether we should proceed on with any exports to
>> relevant peers. Roughly it looks like this:
>>
>> ```
>> function do_export ()
>> bool DRAIN_NODE;
>> bool HEALTHY;
>> {
>>     DRAIN_NODE = false;
>>     HEALTHY = false;
>>     include "includes/birdwatch/*";
>>     return HEALTHY && !DRAIN_NODE;
>> }
>> ```
>>
>> That function is then included as the first term in relevant export
>> filters, e.g.:
>>
>> ```
>> filter PUBLIC_EXPORT_v4 {
>>     if !do_export() then reject;
>> # other export terms here
>> }
>> ```
>>
>> We have external programs that monitor for whether the host is indicated
>> as drained, or if the process that's handling traffic for the box is
>> healthy. If those conditions change, they add or remove config snippets at
>> the `includes/birdwatch/` directory to change the values of the
>> DRAIN_NODE or HEALTHY bools, and we then call `birdc configure` to
>> reload the config and pick up the change to either stop exports or proceed
>> on with them. The common trick of adding/removing /32s from loopbacks
>> or such for DIRECT export doesn't fit our use case, so this method provides
>> external programs control over global export state for those filters.
>>
>> This has worked cleanly for us so far. But, recently we've hit an issue.
>> Assuming we start in a "non-drained" state where DRAIN_NODE is false and
>> HEALTHY is true, so we're exporting routes as expected, we then:
>>
>> * toggle DRAIN_NODE to be true
>> * execute `birdc configure` to reload the configuration
>>
>> When we look at exported routes, e.g. `show route export <bgp peer
>> protocol>`, BIRD shows no routes exported, as expected. However, those
>> BGP peers definitely do still see routes advertised from BIRD. We are also
>> using https://github.com/czerwonk/bird_exporter, and the
>> `bird_protocol_prefix_export_count` metrics for exported routes for the
>> drained host remain at their previous levels. In other words:
>>
>> * birdc says nothing is being exported
>> * but peers still are seeing prefixes from BIRD
>> * and bird_exporter still shows prefixes being exported from BIRD
>>
>> Effectively, the `birdc show route export` output is lying.
>>
>> We've tried adding a sleep between when the include snippet that changes
>> the DRAIN_NODE value is written and when we hit `birdc configure`, but
>> that doesn't appear to make any difference. If we execute `birdc
>> configure` *twice*, though, everything's fine: The actual exports are
>> stopped. That's true without any sleep or break between running configure
>> as well; literally just `birdc configure` back to back in the script
>> that manages this.
>>
>> We do not see any indication of issues in the `birdc configure` runs or
>> in BIRD's logs.
>>
>> The reverse also does occur, where if we start with the DRAIN_NODE set to
>> true and are exporting nothing, then unset DRAIN_NODE and hit `birdc
>> configure`: `show route export` shows things being exported, but
>> neighbours actually don't hear any routes.
>>
>> We have not seen this occur in our smaller dev environment, but it has
>> shown up in our more scaled environment. The relevant hosts have a couple
>> of upstream v4 and v6 peers feeding them defaults, with:
>>
>> Import:
>> * 1x IPv4 default route per BGP upstream peer, 2x total
>> * 1x IPv6 default route per BGP upstream peer, 2x total
>> Export:
>> * 1-5 IPv4 /24s per BGP upstream peer
>> * 0-1 IPv6 /48s per BGP upstream peer
>>
>> They have a larger number of downstream BGP peers, currently around 138
>> but that number can fluctuate. Each of those peers has an IPv4 and IPv6
>> channel, running over an IPv6 transport (`extended next hop on` for the
>> ipv4 channel).
>>
>> At the moment those have:
>>
>> Import:
>> * 2x IPv6 routes per peer (one /48, one /128)
>> * 63x IPv4 routes per peer (all /24s)
>> Export:
>> * 21x IPv6 routes per peer (all /128s)
>> * 1-5x IPv4 routes per peer (all /24s)
>>
>> So this does end up with the boxes having ~290 BGP "protocols", between v4
>> and v6 channels combined, ~9,300 routes imported (between v4 and v6,
>> multiplied by peers), and ~3,100 routes exported (between v4 and v6,
>> multiplied by peers).
>>
>> Any thoughts on what could be causing BIRD to fail to apply the policy
>> properly on the first `birdc configure` but succeeding on the second? Or
>> why the route export output is incorrect after the first `birdc
>> configure` while the metrics exporter is correct? Is `show route export`
>> showing what BIRD *thinks* should be exported based on current filters,
>> or what it's actively sending to those peers? Or any thoughts on how we can
>> further debug or trace this to figure out what's confusing BIRD?
>>
>