BIRD continues exporting routes but reports no exports
Hi folks,

We're seeing an issue where BIRD reports one thing for BGP route exports (via `show route export <protocol>`) but actually does something different. We see this when we change some of the conditions around export.

Our export filters include a function that evaluates to either true or false, acting as a global toggle for whether we should proceed with any exports to relevant peers. Roughly it looks like this:

```
function do_export ()
bool DRAIN_NODE;
bool HEALTHY;
{
    DRAIN_NODE = false;
    HEALTHY = false;
    include "includes/birdwatch/*";
    return HEALTHY && !DRAIN_NODE;
}
```

That function is then included as the first term in relevant export filters, e.g.:

```
filter PUBLIC_EXPORT_v4 {
    if !do_export() then reject;
    # other export terms here
}
```

We have external programs that monitor whether the host is marked as drained, and whether the process that's handling traffic for the box is healthy. If those conditions change, they add or remove config snippets in the `includes/birdwatch/` directory to change the values of the DRAIN_NODE or HEALTHY bools, and we then call `birdc configure` to reload the config and pick up the change to either stop exports or proceed with them.

The common trick of adding/removing /32s from loopbacks or such for DIRECT export doesn't fit our use case, so this method gives external programs control over global export state for those filters. This has worked cleanly for us so far.

But recently we've hit an issue. Assuming we start in a "non-drained" state where DRAIN_NODE is false and HEALTHY is true, so we're exporting routes as expected, we then:

* toggle DRAIN_NODE to be true
* execute `birdc configure` to reload the configuration

When we look at exported routes, e.g. `show route export <bgp peer protocol>`, BIRD shows no routes exported, as expected. However, those BGP peers definitely do still see routes advertised from BIRD.

We are also using https://github.com/czerwonk/bird_exporter, and the `bird_protocol_prefix_export_count` metrics for exported routes for the drained host remain at their previous levels.

In other words:

* birdc says nothing is being exported
* but peers are still seeing prefixes from BIRD
* and bird_exporter still shows prefixes being exported from BIRD

Effectively, the `birdc show route export` output is lying.

We've tried adding a sleep between when the include snippet that changes the DRAIN_NODE value is written and when we hit `birdc configure`, but that doesn't appear to make any difference. If we execute `birdc configure` *twice*, though, everything's fine: the actual exports are stopped. That's true without any sleep or break between the two configure runs as well; literally just `birdc configure` back to back in the script that manages this.

We do not see any indication of issues in the `birdc configure` runs or in BIRD's logs.

The reverse also occurs: if we start with DRAIN_NODE set to true and are exporting nothing, then unset DRAIN_NODE and hit `birdc configure`, `show route export` shows things being exported, but neighbours actually don't hear any routes.

We have not seen this occur in our smaller dev environment, but it has shown up in our more scaled environment.

The relevant hosts have a couple of upstream v4 and v6 peers feeding them defaults, with:

Import:
* 1x IPv4 default route per BGP upstream peer, 2x total
* 1x IPv6 default route per BGP upstream peer, 2x total

Export:
* 1-5 IPv4 /24s per BGP upstream peer
* 0-1 IPv6 /48s per BGP upstream peer

They have a larger number of downstream BGP peers, currently around 138, but that number can fluctuate. Each of those peers has an IPv4 and an IPv6 channel, running over an IPv6 transport (`extended next hop on` for the ipv4 channel). At the moment those have:

Import:
* 2x IPv6 routes per peer (one /48, one /128)
* 63x IPv4 routes per peer (all /24s)

Export:
* 21x IPv6 routes per peer (all /128s)
* 1-5x IPv4 routes per peer (all /24s)

So the boxes end up with ~290 BGP "protocols" (v4 and v6 channels combined), ~9,300 routes imported (v4 and v6, multiplied by peers), and ~3,100 routes exported (v4 and v6, multiplied by peers).

Any thoughts on what could be causing BIRD to fail to apply the policy properly on the first `birdc configure` but succeed on the second? Or why the route export output is incorrect after the first `birdc configure` while the metrics exporter is correct? Is `show route export` showing what BIRD *thinks* should be exported based on current filters, or what it's actively sending to those peers?

Or any thoughts on how we can further debug or trace this to figure out what's confusing BIRD?
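A minimal sketch of the toggle step described above, for anyone trying to reproduce it. This is an illustration under assumptions, not the poster's actual tooling: the snippet file name `drain.conf`, its one-line contents, and the helper name `set_drain` are all hypothetical. The file is staged next to (not inside) the included directory and moved in with a rename, so the `includes/birdwatch/*` glob never matches a half-written file. The command runner is injectable so the logic can be exercised without a running BIRD.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def set_drain(include_dir, drained, run=subprocess.run):
    """Write or remove a drain snippet, then ask BIRD to reload its config.

    Hypothetical helper: the file name and snippet text are assumptions.
    """
    include_dir = Path(include_dir)
    snippet = include_dir / "drain.conf"  # assumed file name
    if drained:
        # Stage the file in the parent directory so the include glob
        # never picks up a partially written or temporary file.
        fd, tmp = tempfile.mkstemp(dir=include_dir.parent, prefix=".drain-")
        with os.fdopen(fd, "w") as fh:
            fh.write("DRAIN_NODE = true;\n")  # assumed snippet contents
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; BIRD must be able to read it
        os.replace(tmp, snippet)  # atomic rename on the same filesystem
    else:
        snippet.unlink(missing_ok=True)
    # `birdc configure` reloads the configuration, as described in the thread.
    return run(["birdc", "configure"], check=True)
```

Since adding a sleep before `birdc configure` made no difference, a partial-write race is not being claimed as the cause here; the sketch only pins down the sequence of steps being discussed.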
Hello!
> We've tried adding a sleep between when the include snippet that changes the DRAIN_NODE value is written and when we hit `birdc configure`, but that doesn't appear to make any difference. If we execute `birdc configure` *twice*, though, everything's fine: The actual exports are stopped. That's true without any sleep or break between running configure as well; literally just `birdc configure` back to back in the script that manages this.
>
> We do not see any indication of issues in the `birdc configure` runs or in BIRD's logs.

You are not disclosing the version of BIRD you are using. I vaguely remember that we fixed this kind of bug several years ago where changes in functions sometimes got ignored.

Thus if you are not using a recent BIRD version, you are probably hitting that old bug.

Maria
Ah, right, apologies.

bird 2.0.7-4.1 on Debian 11.6, kernel 5.10.136-1

Looks like 2.0.7 was released Oct 16 2019 (https://bird.network.cz/?download), so a fair chance we might be hitting this? It looks like 2.0.10 is available from the bullseye backports, with the most recent being 2.0.12 in bookworm or sid. I'll look at pulling one of those in to validate.

> ...where changes in functions sometimes got ignored.

This might be reaching, but would that explain the difference between what's shown in route export status output versus what's actually being exported?

On Thu, Mar 2, 2023 at 2:39 PM Maria Matejka via Bird-users <bird-users@network.cz> wrote:
> [...]
Was this perhaps 3f477ccb <https://gitlab.nic.cz/labs/bird/-/commit/3f477ccb03ed99cf6754baaca179fcf791bcda55>?

> Filters: Function body comparison result now used.
>
> Function bodies were compared in post-parse time, yet the result was not used and the functions were incorrectly considered the same as before.
>
> Now the result is used to reload affected protocols.

On Thu, Mar 2, 2023 at 2:51 PM Hugo Slabbert <hugo.slabbert@menlosecurity.com> wrote:
> [...]
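The commit message quoted above describes a two-step idea: compare function bodies after parsing, then use that result to decide which protocols to reload. As a toy model only (this is not BIRD's code, and every name and data structure here is made up for illustration), the dependency chain from a changed function to the protocols that need reloading could be sketched like this:

```python
def protocols_to_reload(old_funcs, new_funcs, filter_calls, protocol_filters):
    """Toy model: which protocols need their exports re-evaluated?

    old_funcs / new_funcs: function name -> body text
    filter_calls:          filter name -> set of function names it calls
    protocol_filters:      protocol name -> export filter name
    """
    # Step 1: compare function bodies between the old and new config.
    changed = {
        name
        for name in old_funcs.keys() | new_funcs.keys()
        if old_funcs.get(name) != new_funcs.get(name)
    }
    # Step 2: actually use the comparison result. Skipping this step is the
    # failure mode the commit message describes.
    dirty_filters = {f for f, calls in filter_calls.items() if calls & changed}
    return sorted(p for p, f in protocol_filters.items() if f in dirty_filters)
```

In this model, flipping DRAIN_NODE inside `do_export` changes the function body, which marks every filter calling it as dirty, and every protocol using such a filter gets reloaded. If step 2 were skipped, the new filter would be active for queries while already-exported routes were never re-evaluated, which would resemble the symptom in this thread; whether that is what actually happens in 2.0.7 is not established here.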
A slight update on this:

3f477ccb <https://gitlab.nic.cz/labs/bird/-/commit/3f477ccb03ed99cf6754baaca179fcf791bcda55> does appear to be in 2.0.7, which we're running, so that particular bug may not be what we're hitting.

Upgrading to 2.0.10 initially looked successful. But I then checked a box that was still running 2.0.7 and where we could repro the problem. I simply restarted bird there, and then could no longer repro it.

So, just restarting bird at 2.0.7 was sufficient to clear the problem, at least temporarily, and the bump to 2.0.10 then wasn't a clear test, given that it's obviously also a fresh instance of bird running.

We'll try to validate whether the problem eventually returns on the 2.0.7 box(es) after a restart, and whether it does *not* return on the 2.0.10 instance, but we don't have a clear timeline at the moment if it's something that only pops up after bird has been running "for a while".

On Thu, Mar 2, 2023 at 2:54 PM Hugo Slabbert <hugo.slabbert@menlosecurity.com> wrote:
> [...]
Right, so,

I've gone ahead and enabled export tables on the channels for the relevant peers, per Alexander's suggestion for possibly getting additional visibility. I don't seem to spot any different views for route status, though. I don't see any particular docs on how to *view* export tables; does enabling export tables make a different view available to look at the export table contents specifically? Or does it just shift the behaviour so `show route export <protocol>` displays the export table contents rather than a point-in-time evaluation of export filters for the specified neighbor?

Snippet showing export tables config enabled for the peers:

```
template bgp GATEWAY_v6 {
    hold time 6;
    startup hold time 20;
    connect delay time 3;
    connect retry time 6;
    error wait time 3, 12;
    med metric;
    allow local as 1;

    local fdff::4:2 as MENLO_ASN;

    ipv6 {
        export table on;
        import filter GATEWAY_IMPORT_v6;
        export filter GATEWAY_EXPORT_v6;
    };
    ipv4 {
        export table on;
        extended next hop on;
        add paths rx;
        import filter GATEWAY_IMPORT_v4;
        export filter GATEWAY_EXPORT_v4;
    };
}

# ...

protocol bgp gw_085ea85_euwest2 from GATEWAY_v6 {
    neighbor fdff::8005:f4d1 as 65000;
}
```

I don't see any different behaviour on the affected hosts, though. E.g. a host that just had `configure` called once after setting the draining flag is showing these symptoms, showing nothing for `show route export <protocol>`:

```
bird> show route export gw_085ea85_euwest2
bird>
```

...but still showing exports under the protocol details:

```
bird> show protocols all gw_085ea85_euwest2
Name               Proto  Table  State  Since                Info
gw_085ea85_euwest2 BGP    ---    up     2023-03-03 16:33:43  Established
  BGP state:          Established
  # ...
  Channel ipv6
    State:          UP
    Table:          master6
    Preference:     100
    Input filter:   GATEWAY_IMPORT_v6
    Output filter:  GATEWAY_EXPORT_v6
    Routes:         2 imported, 2 exported, 1 preferred
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:              3          0          1          0          2
      Import withdraws:            0          0        ---          0          0
      Export updates:            109          5         96        ---          8
      Export withdraws:            2        ---        ---        ---          2
    BGP Next hop:   fdff::4:2
  Channel ipv4
    State:          UP
    Table:          master4
    Preference:     100
    Input filter:   GATEWAY_IMPORT_v4
    Output filter:  GATEWAY_EXPORT_v4
    Routes:         12 imported, 1 exported, 0 preferred
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:             12          0          0          0         12
      Import withdraws:            0          0        ---          0          0
      Export updates:             39          4         31        ---          4
      Export withdraws:            0        ---        ---        ---          1
    BGP Next hop:   fdff::4:2
```

Note this is still on 2.0.7. We've bumped some hosts to 2.0.10, but as indicated in the previous message, just a simple restart stops the issue from occurring. We've enabled the export table config on both a 2.0.7 and a 2.0.10 host, so we can spot whether this reoccurs on the 2.0.10 host as well after a period. An example host on 2.0.7 showing this behaviour has been up for ~2 weeks. The box upgraded to 2.0.10 has had BIRD running for just ~16 hours at this point and is not yet showing any issues.

On Thu, Mar 2, 2023 at 4:07 PM Hugo Slabbert <hugo.slabbert@menlosecurity.com> wrote:
> [...]
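The mismatch reported in the message above (an empty `show route export <protocol>` next to non-zero `exported` counters in `show protocols all`) can be checked mechanically. A rough sketch, assuming the `Channel ...` / `Routes: N imported, M exported` layout shown in this thread; the function names are hypothetical, and fetching the text from `birdc` is left out:

```python
import re

ROUTES_RE = re.compile(r"Routes:\s+(\d+) imported, (\d+) exported")
CHANNEL_RE = re.compile(r"Channel (\S+)")


def exported_counts(show_protocols_all_output):
    """Return {channel: exported route count} from `show protocols all <proto>` text."""
    counts = {}
    channel = None
    for line in show_protocols_all_output.splitlines():
        stripped = line.strip()
        m = CHANNEL_RE.fullmatch(stripped)
        if m:
            channel = m.group(1)  # remember which channel the next lines belong to
            continue
        m = ROUTES_RE.match(stripped)
        if m and channel is not None:
            counts[channel] = int(m.group(2))
    return counts


def looks_stale(filter_view_output, counts):
    """True if `show route export <proto>` is empty but the counters are non-zero."""
    return not filter_view_output.strip() and any(counts.values())
```

Run against the output in this thread, `exported_counts` would report 2 exported routes on the ipv6 channel and 1 on ipv4, while the filter view is empty, so `looks_stale` would flag the host. This only covers the drain direction; the reverse case (filter view non-empty, nothing advertised) would need the opposite comparison.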
Hi,

It is documented in recent versions and on the BIRD site too. Pay attention to this:

[(import|export) table p.c]

On Fri, Mar 3, 2023, 18:32 Hugo Slabbert via Bird-users <bird-users@network.cz> wrote:
> [...]
Ah; thanks. Okay, I was misreading that as just referring to regular table filtering, not in conjunction with import/export. I had looked at `show symbols table` and not seen any indication of it, but missed that these are present in the `show route export table <p.c>` format regardless.

Thanks. That confirms that we do in fact see a difference there between the export table and the ad hoc route export view when this occurs, after a single call to `birdc configure` (scrubbed slightly here):

```
bird> show route export gw_085ea85_euwest2
bird>
bird> show route export table gw_085ea85_euwest2.ipv4
Table export:
57.140.1.0/24        unicast [<name of static source> 16:41:41.058] * (100)
        via <next hop> on bond0 onlink
```

We'll keep an eye here and validate if we do see this returning on 2.0.10 as well, or if 2.0.10 remains clear.

On Fri, Mar 3, 2023 at 10:18 AM Alexander Zubkov <green@qrator.net> wrote:
> [...]
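The comparison in the message above, `show route export <protocol>` (filters evaluated now) against `show route export table <p.c>` (what was actually handed to the peer), can be reduced to a set difference on prefixes. A small sketch, assuming route lines start with the prefix in column 0 as in the output shown in this thread; the function names are hypothetical:

```python
import ipaddress


def prefixes(show_route_output):
    """Collect the prefixes that start route lines in `show route ...` output."""
    found = set()
    for line in show_route_output.splitlines():
        tokens = line.split()
        if not tokens or line[0].isspace():
            continue  # blank line or indented continuation ("via ...") line
        try:
            found.add(ipaddress.ip_network(tokens[0], strict=False))
        except ValueError:
            pass  # header such as "Table ...:" or a prompt line
    return found


def still_advertised(filter_view, export_table):
    """Prefixes in the export table that the current filters no longer accept."""
    return prefixes(export_table) - prefixes(filter_view)
```

A non-empty result right after `birdc configure` would match the symptom described here, with 57.140.1.0/24 still sitting in the export table while the filter view is empty. The only workaround reported in the thread so far is running `birdc configure` a second time.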
By the way - compiling the most recent version of BIRD is very easy. So even though there's no 2.0.12 package for bullseye, I recommend just compiling it with the same options as the Debian bird package and running that.

On Fri, Mar 3, 2023, 2:07 PM Hugo Slabbert via Bird-users <bird-users@network.cz> wrote:
> [...]
Ah; thanks. Okay, I was misreading that as just referring to regular table filtering, not in conjunction with import/export. I had looked at `show symbols table` and not seen any indication of it, but missed that these are present in the `show route export table <p.c>` format regardless.
Thanks. That confirms that we do in fact see a difference there between the export table and the ad hoc route export view when this occurs, after a single call to `birdc configure` (scrubbed slightly here):
```
bird> show route export gw_085ea85_euwest2
bird>
bird> show route export table gw_085ea85_euwest2.ipv4
Table export:
57.140.1.0/24        unicast [<name of static source> 16:41:41.058] * (100)
        via <next hop> on bond0 onlink
```
We'll keep an eye here and validate if we do see this returning on 2.0.10 as well, or if 2.0.10 remains clear.
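To make the comparison above repeatable, here is a minimal sketch that diffs the two views. It assumes the output has the same shape as the sample in this thread (one route per line, starting with the prefix); the helper names `prefixes` and `export_mismatch` are hypothetical and not part of BIRD or birdc. Feed it the text captured from `show route export <protocol>` and from `show route export table <p.c>`.

```python
import re

# A route line starts with an IPv4 or IPv6 prefix, e.g. "57.140.1.0/24  unicast [...]".
# Prompt lines ("bird> ...") and indented "via ..." lines do not match.
PREFIX_RE = re.compile(r"^([0-9a-fA-F:.]+/\d{1,3})\s")


def prefixes(birdc_output: str) -> set[str]:
    """Collect the prefixes listed in `show route ...` output."""
    found = set()
    for line in birdc_output.splitlines():
        match = PREFIX_RE.match(line)
        if match:
            found.add(match.group(1))
    return found


def export_mismatch(filter_view: str, table_view: str) -> dict[str, set[str]]:
    """Compare the filter-evaluated view with the export table contents.

    filter_view: output of `show route export <protocol>`
    table_view:  output of `show route export table <protocol>.<channel>`
    """
    evaluated = prefixes(filter_view)
    stored = prefixes(table_view)
    return {
        "only_in_export_table": stored - evaluated,
        "only_in_filter_view": evaluated - stored,
    }
```

A non-empty `only_in_export_table` set corresponds to the drained case described here: the filters say nothing should be exported, but the export table still holds routes.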
On Fri, Mar 3, 2023 at 10:18 AM Alexander Zubkov <green@qrator.net> wrote:
Hi,
It is documented in recent versions and on the bird's site too. Pay attention to this:
[(import|export) table p.c]
On Fri, Mar 3, 2023, 18:32 Hugo Slabbert via Bird-users < bird-users@network.cz> wrote:
Right, so,
I've gone ahead and enabled export tables on the channels for the relevant peers, per Alexander's suggestion for possibly getting additional visibility. I don't seem to spot any different views for route status, though. I don't see any particular docs on how to *view* export tables; does enabling export tables make a different view available to look at the export table contents specifically? Or does it just shift the behaviour so `show route export <protocol>` displays the export table contents rather than a point-in-time evaluation of export filters for the specified neighbor?
Snippet showing export tables config enabled for the peers:
```
template bgp GATEWAY_v6 {
  hold time 6;
  startup hold time 20;
  connect delay time 3;
  connect retry time 6;
  error wait time 3, 12;
  med metric;
  allow local as 1;

  local fdff::4:2 as MENLO_ASN;

  ipv6 {
    export table on;
    import filter GATEWAY_IMPORT_v6;
    export filter GATEWAY_EXPORT_v6;
  };
  ipv4 {
    export table on;
    extended next hop on;
    add paths rx;
    import filter GATEWAY_IMPORT_v4;
    export filter GATEWAY_EXPORT_v4;
  };
}
# ...
protocol bgp gw_085ea85_euwest2 from GATEWAY_v6 {
  neighbor fdff::8005:f4d1 as 65000;
}
```
I don't see any different behaviour on the affected hosts, though. E.g. a host that just had `configure` called once after setting the draining flag is showing these symptoms, showing nothing for `show route export <protocol>`:
```
bird> show route export gw_085ea85_euwest2
bird>
```
...but still showing exports under the protocol details:
```
bird> show protocols all gw_085ea85_euwest2
Name       Proto      Table      State  Since         Info
gw_085ea85_euwest2 BGP        ---        up     2023-03-03 16:33:43  Established
  BGP state:          Established
  # ...
  Channel ipv6
    State:          UP
    Table:          master6
    Preference:     100
    Input filter:   GATEWAY_IMPORT_v6
    Output filter:  GATEWAY_EXPORT_v6
    Routes:         2 imported, 2 exported, 1 preferred
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:              3          0          1          0          2
      Import withdraws:            0          0        ---          0          0
      Export updates:            109          5         96        ---          8
      Export withdraws:            2        ---        ---        ---          2
    BGP Next hop:   fdff::4:2
  Channel ipv4
    State:          UP
    Table:          master4
    Preference:     100
    Input filter:   GATEWAY_IMPORT_v4
    Output filter:  GATEWAY_EXPORT_v4
    Routes:         12 imported, 1 exported, 0 preferred
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:             12          0          0          0         12
      Import withdraws:            0          0        ---          0          0
      Export updates:             39          4         31        ---          4
      Export withdraws:            0        ---        ---        ---          1
    BGP Next hop:   fdff::4:2
```
Note this is still on 2.0.7. We've bumped some hosts to 2.0.10, but as indicated in the previous message, just a simple restart clears this issue from occurring. We've enabled the export table config on both a 2.0.7 and a 2.0.10 host, to be able to possibly spot if this reoccurs on the 2.0.10 host as well after a period. An example host on 2.0.7 showing this behaviour has been up for ~2 weeks. The box upgraded to 2.0.10 has had BIRD running for just ~16 hours at this point and is not yet showing any issues.
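Since the per-channel `Routes:` line under `show protocols all` is what still reports exports in this state, a small parser can pull those counts out for monitoring. This is only a sketch: it assumes the output layout shown in the sample above, and `exported_per_channel` is a hypothetical helper name, not a BIRD or bird_exporter API.

```python
import re

CHANNEL_RE = re.compile(r"^\s*Channel\s+(\S+)")
EXPORTED_RE = re.compile(r"(\d+) exported")


def exported_per_channel(protocol_output: str) -> dict[str, int]:
    """Map each channel name to the exported-route count on its `Routes:` line."""
    counts = {}
    channel = None
    for line in protocol_output.splitlines():
        channel_match = CHANNEL_RE.match(line)
        if channel_match:
            # Remember which channel the following indented lines belong to.
            channel = channel_match.group(1)
            continue
        if channel is not None and line.strip().startswith("Routes:"):
            exported = EXPORTED_RE.search(line)
            if exported:
                counts[channel] = int(exported.group(1))
    return counts
```

On a drained host, a non-zero count here combined with an empty `show route export <protocol>` is the mismatch this thread is about.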
On Thu, Mar 2, 2023 at 4:07 PM Hugo Slabbert < hugo.slabbert@menlosecurity.com> wrote:
A slight update on this:
3f477ccb <https://gitlab.nic.cz/labs/bird/-/commit/3f477ccb03ed99cf6754baaca179fcf791bcda55> does appear to be in 2.0.7, which we're running, so if that's the fix in question, a missing fix may not be what's behind our problem.
This looked to be successful initially when upgrading to 2.0.10. But, I then checked a box that was still running 2.0.7 and where we could repro it. I simply restarted bird there, and then could no longer repro it.
So, just restarting bird at 2.0.7 was sufficient to clear the problem, at least temporarily, and the bump to 2.0.10 then wasn't a clear test, given that's obviously a fresh instance of bird running.
We'll try to validate if the problem eventually returns on the 2.0.7 box(es) after a restart, and if it does *not* return on the 2.0.10 instance, but we don't have a clear timeline at the moment on this if it's something that pops up "in a while" of bird running.
On Thu, Mar 2, 2023 at 2:54 PM Hugo Slabbert < hugo.slabbert@menlosecurity.com> wrote:
Was this perhaps 3f477ccb <https://gitlab.nic.cz/labs/bird/-/commit/3f477ccb03ed99cf6754baaca179fcf791bcda55> ?
Filters: Function body comparison result now used.
Function bodies were compared in post-parse time, yet the result was not used and the functions were incorrectly considered the same as before.
Now the result is used to reload affected protocols.
On Thu, Mar 2, 2023 at 2:51 PM Hugo Slabbert < hugo.slabbert@menlosecurity.com> wrote:
ah, right, apologies.
bird 2.0.7-4.1 on Debian 11.6, kernel 5.10.136-1
Looks like 2.0.7 was released Oct 16 2019 ( https://bird.network.cz/?download), so a fair chance we might be hitting this? It looks like something from 2.0.10 is available from the bullseye backports, with the most recent being 2.0.12 in bookworm or sid. I'll look at pulling one of those in to validate.
> ...where changes in functions sometimes got ignored.
This might be reaching, but would that explain the difference between what's shown in route export status output versus what's actually being exported?
On Thu, Mar 2, 2023 at 2:39 PM Maria Matejka via Bird-users < bird-users@network.cz> wrote:
> Hello!
>
> > We've tried adding a sleep between when the include snippet that changes
> > the DRAIN_NODE value is written and when we hit `birdc configure`, but
> > that doesn't appear to make any difference. If we execute `birdc
> > configure` *twice*, though, everything's fine: The actual exports are
> > stopped. That's true without any sleep or break between running
> > configure as well; literally just `birdc configure` back to back in the
> > script that manages this.
> >
> > We do not see any indication of issues in the `birdc configure` runs or
> > in BIRD's logs.
>
> You are not disclosing the version of BIRD you are using. I vaguely
> remember that we fixed this kind of bug several years ago where changes
> in functions sometimes got ignored.
>
> Thus if you are not using a recent BIRD version, you are probably
> hitting that old bug.
>
> Maria
*nods*
We generally run distro packages unless we have specific requirements for newer features or fixes etc.
Current experiments show that the test routers on 2.0.7 started to reproduce this issue after about 3 days since BIRD's process start, whereas the test boxes we have on 2.0.10 have not yet reproduced the issue after about 3 days and 5 days, respectively.
It's always tough to prove a negative, but we'll keep it running through the weekend to validate and look at bumping to 2.0.10 across the fleet if we're still clear on the 2.0.10 boxes at that point.
On Wed, Mar 8, 2023 at 2:17 PM Ross Tajvar <ross@tajvar.io> wrote:
By the way - compiling the most recent version of bird is very easy. So even though there's not a package for 2.0.12 for bullseye, I recommend just compiling with the same options as the bird package and running that.
Closing the loop on this: A bit over a week in, any of the boxes on 2.0.7 consistently reproduce the issue (need `configure` issued twice to actually pick up the export policy changes) whereas the test boxes on 2.0.10 are still consistently behaving properly (single `configure` properly updates exports).
On Wed, Mar 8, 2023 at 2:20 PM Hugo Slabbert < hugo.slabbert@menlosecurity.com> wrote:
*nods*
We generally run distro packages unless we have specific requirements for newer features or fixes etc.
Current experiments show that the test routers on 2.0.7 started to reproduce this issue after about 3 days since BIRD's process start, whereas the test boxes we have on 2.0.10 have not yet reproduced the issue after about 3 days and 5 days, respectively.
It's always tough to prove a negative, but we'll keep it running through the weekend to validate and look at bumping to 2.0.10 across the fleet if we're still clear on the 2.0.10 boxes at that point.
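For hosts that cannot be upgraded right away, the double `configure` workaround described in this thread could be wrapped as below. This is a sketch under stated assumptions: the 2.0.10 threshold is only the version that behaved correctly in these tests (2.0.8 and 2.0.9 were not tried here), the version string is supplied by the caller, and `parse_version`, `run_birdc` and `reload_config` are hypothetical helper names. Versions are compared as integer tuples because a plain string comparison would sort "2.0.10" before "2.0.7".

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.0.7-4.1' (Debian package) or '2.0.10' into (2, 0, 7) / (2, 0, 10)."""
    upstream = text.split("-", 1)[0]  # drop the Debian revision, if any
    return tuple(int(part) for part in upstream.split("."))


def run_birdc(args: Sequence[str]) -> None:
    """Invoke birdc and raise if it exits non-zero."""
    subprocess.run(["birdc", *args], check=True)


def reload_config(version: str, run: Runner = run_birdc) -> int:
    """Run `birdc configure`, twice on versions older than 2.0.10.

    Returns the number of `configure` calls that were issued.
    """
    passes = 2 if parse_version(version) < (2, 0, 10) else 1
    for _ in range(passes):
        run(["configure"])
    return passes
```

Passing a custom `run` callable keeps the function testable without a running BIRD instance; in production the default shells out to `birdc`.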
participants (4)
- Alexander Zubkov
- Hugo Slabbert
- Maria Matejka
- Ross Tajvar