Hello,

I am currently trying bird out in a route collector scenario. We have around 50 devices, all sending around 1.2M routes.

I initially started with bird 2.0.12, but the CPU is stuck at 100%, and the debug logs have a lot of "channel reload burst split (max_feed=-1)" messages.

So I wanted to try bird 3.0, but I am getting the following logs (using the -d flag), and the router crashes just after starting:

bird: Started
bird: Trying to lock in a bad order
Aborted

Any idea what could be the issue there?

Here is my config:

timeformat base iso long;
timeformat log iso long;
timeformat protocol iso long;
timeformat route iso long;

router id X.X.X.X;
hostname "route-collector";

attribute int roa_status1;
attribute int roa_status2;

roa4 table roa4_1;
roa4 table roa4_2;
roa6 table roa6_1;
roa6 table roa6_2;

ipv4 table pb4;
ipv6 table pb6;

filter flag_rpki {
    if bgp_path.len = 0 || bgp_path.last = 16276 then accept;

    if roa_check(roa4_1, net, bgp_path.last) = ROA_INVALID then roa_status1=1;
    if roa_check(roa4_1, net, bgp_path.last) = ROA_UNKNOWN then roa_status1=2;
    if roa_check(roa4_1, net, bgp_path.last) = ROA_VALID then roa_status1=3;

    if roa_check(roa4_2, net, bgp_path.last) = ROA_INVALID then roa_status2=1;
    if roa_check(roa4_2, net, bgp_path.last) = ROA_UNKNOWN then roa_status2=2;
    if roa_check(roa4_2, net, bgp_path.last) = ROA_VALID then roa_status2=3;

    accept;
}

protocol bgp PB {
    local X.X.X.X as 16276;
    neighbor range 0.0.0.0/0 as 16276;
    dynamic name "pb";
    dynamic name digits 2;
    ipv4 {
        export filter { reject; };
        table pb4;
        import filter flag_rpki;
        add paths rx;
        import table yes;
        next hop keep on;
        rpki reload on;
    };
    ipv6 {
        export filter { reject; };
        table pb6;
        import filter flag_rpki;
        add paths rx;
        import table yes;
        next hop keep on;
        rpki reload on;
    };
    strict bind on;
}

protocol rpki stack1 {
    roa4 { table roa4_1; };
    roa6 { table roa6_1; };
    remote X.X.X.Z port 323;
    transport tcp;
    refresh 300;
    retry 300;
    expire 600;
}

protocol rpki stack2 {
    roa4 { table roa4_2; };
    roa6 { table roa6_2; };
    remote X.X.X.Y port 323;
    transport tcp;
    refresh 300;
    retry 300;
    expire 600;
}

Best regards,
François
Hello, Francois!

Regarding 2.0.12, you just need to wait: loading 1.2M routes from each of 50 peers, with 6 roa_check calls per route, is simply slow on one CPU. You may ignore the debug log; there is typically nothing useful in it. Also, thank you for noting that the "reload burst split" message is still there; I forgot to convert it to an L_TRACE log.

You should also probably use roa_check in a switch-case syntax, effectively reducing the number of calls from 6 to 2. It's the most resource-demanding thing inside your filter.

You may also want to postpone BGP startup by several seconds to let RPKI feed its tables completely, avoiding an otherwise necessary filter recalculation. (The exact delay must be determined locally.)

The "bad lock order" bug is known in 3.0-alpha0. It has no simple solution, and we ended up rewriting lots of other things. There is no newer version to test for now.

Thanks to our Support Subscribers, we can afford testing hardware good enough to test these scenarios systematically. Thus 3.0-alpha1 will be able to handle 60M routes fast and safely, and we're going to release it soon.

Please consider subscribing to make it possible for us to test BIRD for your scenarios as well; for more information, contact us at bird-support@network.cz.

Have a nice day!
Maria
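A minimal sketch of the switch-case suggestion above, assuming BIRD 2.x filter syntax and that `case` accepts the enum values returned by roa_check (neither is spelled out in the thread). It reuses the filter name, ROA tables, and attributes from the original config, so each ROA table is consulted once per route instead of three times:

```
filter flag_rpki {
	if bgp_path.len = 0 || bgp_path.last = 16276 then accept;

	# One lookup per ROA table; the case branches map the result to the flag.
	case roa_check(roa4_1, net, bgp_path.last) {
		ROA_INVALID: roa_status1 = 1;
		ROA_UNKNOWN: roa_status1 = 2;
		ROA_VALID:   roa_status1 = 3;
	}

	case roa_check(roa4_2, net, bgp_path.last) {
		ROA_INVALID: roa_status2 = 1;
		ROA_UNKNOWN: roa_status2 = 2;
		ROA_VALID:   roa_status2 = 3;
	}

	accept;
}
```

The resulting attribute values are the same as in the original filter; only the number of roa_check calls per route changes, from 6 to 2.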
I am curious to see the benefits of bird 3.0 compared to 2.x. I don't see anything posted on https://bird.network.cz/. Is bird 3.0 in beta phase? This is my first time using bird so I am contemplating if I should upgrade to bird 3.0 or stick with bird 2.x.
Hello Jesse,

For me the main advantage of Bird 3.0 is that it will support multicore for BGP sessions. Currently, both Bird 1.6 and 2.0 process BGP messages on a single core, which can be a performance bottleneck. For this reason, the last time I got new routers I picked ones with very high single-core speed rather than a lot of cores. But going multicore with Bird would be great!

Bird 3.0 is currently in alpha state: 3.0alpha0 is the current release, and they are working on a second alpha release, called 3.0alpha1. So it is all still in early development. Running it in production is thus not recommended, but testing with it will be appreciated by the Bird project.

Kind regards,
Cybertinus
I recommend reading these posts:

https://en.blog.nic.cz/2021/03/23/bird-journey-to-threads-chapter-1-the-rout...
https://en.blog.nic.cz/2021/06/14/bird-journey-to-threads-chapter-2-asynchro...
https://en.blog.nic.cz/2022/02/09/bird-journey-to-threads-chapter-3-parallel...
https://en.blog.nic.cz/2022/10/17/bird-journey-to-threads-chapter-4-memory-a...

P.S.: @Maria Matejka <maria.matejka@nic.cz> It would be nice if we could have an update on how things are going. 😁

--
Douglas Fernando Fischer
Control and Automation Engineer
Hello!

On 4/5/23 13:14, Douglas Fischer wrote:
P.S.: @Maria Matejka <maria.matejka@nic.cz> It would be nice if we could have an update on how things are going. 😁

Oh yes. I'm now trying to find out why BIRD, in my test of graceful restart + BFD, sometimes fails to restart BFD sessions in a strange way. Everything else seems to work quite nicely. Some performance tests and performance-related fixups are planned, and then 3.0-alpha1 should be out.

Thank you for your patience.
M.
Thanks for all the info.

I was able to confirm that the RPKI refresh was causing my CPU issue. When I remove it, the CPU does settle after a few minutes (around 15-20 mins).

Looking forward to testing more of bird3!

Best regards,
François
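One possible way to postpone BGP startup, as Maria suggested earlier in the thread, is sketched below. This is an assumption and not something confirmed in the thread: it uses the generic `disabled` protocol option to keep the BGP protocol down when BIRD starts, so that it can be enabled by hand once both RPKI sessions have loaded their tables. Only the first line is new; the rest stands in for the original config.

```
protocol bgp PB {
	disabled yes;	# assumption: start disabled, enable manually later
	local X.X.X.X as 16276;
	neighbor range 0.0.0.0/0 as 16276;
	# remaining options and both channels as in the original config
}
```

Under this sketch, once `birdc show protocols` reports stack1 and stack2 as Established, `birdc enable PB` would let the collector start accepting sessions. The right moment still has to be determined locally.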
participants (5)
- Cybertinus
- Douglas Fischer
- Francois Espinet
- Maria Matejka
- support@orcacomputers.com