Q: rpki feed failure
Hi friends,
a probably simple question that I haven't found any documentation about, so...
We're using Cloudflare's RPKI cache from our BIRD instances, but since yesterday I'm getting an error info in `show protocols`:
rpki1 RPKI --- up 15:33:17.594 Cache-Error-No-Data-Available
What does the error message mean? Does the remote side not have data available, or is this about some local issue?
I'd like to clear this up before I ask Cloudflare...
Thanks, Elmar.
Hello my friend,
all the peers lose their connections at once, and this happens at different times. The server carries 1500,000 networks and 3000000 routes. Where should I look, and what should I adjust? I ask for help. The server itself does not go down, and the service does not go down either.
Feb 27 05:02:45 bgp-01 bird: bgpMSKIX: Error: Hold timer expired
Feb 27 05:03:29 bgp-01 bird: bgpRFET2: Error: Hold timer expired
Feb 27 05:03:36 bgp-01 bird: bgpGLFW: Error: Hold timer expired
Feb 27 05:03:40 bgp-01 bird: Kernel dropped some netlink messages, will resync on next scan.
Feb 27 05:03:45 bgp-01 bird: bgpKOMP: Error: Hold timer expired
Feb 27 05:03:58 bgp-01 bird: I/O loop cycle took 5094 ms for 6 events
Feb 27 05:04:29 bgp-01 bird: bgpZap: Error: Hold timer expired
Feb 27 05:04:44 bgp-01 bird: bgpSAL: Received: Hold timer expired
Feb 27 05:04:56 bgp-01 bird: bgpRKN2: Received: Hold timer expired
Feb 27 05:04:56 bgp-01 bird: bgpFIT: Error: Hold timer expired
Feb 27 05:05:02 bgp-01 bird: bgpGLIX1: Error: Hold timer expired
Feb 27 05:05:05 bgp-01 bird: Kernel dropped some netlink messages, will resync on next scan.
Feb 27 05:05:16 bgp-01 bird: bgpRFET1: Received: Hold timer expired
Feb 27 05:05:19 bgp-01 bird: bgpRKN1: Received: Hold timer expired
Hello!
You probably hit the architectural limits of single thread routing. Try BIRD 3 (alpha).
Also with this load, it's very much recommended to have a BIRD Support package to have us booked for resolving high load problems. It's a time-consuming and very fiddly job for experienced developers.
Please see https://bird.nic.cz/en/commercial-services/ for more details, and/or contact me off-list for a customized quote.
Happy routing!
Maria
On 29 February 2024 10:08:56 CET, "mx.avanttel.ru via Bird-users" <bird-users@network.cz> wrote:
Hello my friend,
all the peers lose their connections at once, and this happens at different times. The server carries 1500,000 networks and 3000000 routes. Where should I look, and what should I adjust? I ask for help. The server itself does not go down, and the service does not go down either.
Feb 27 05:02:45 bgp-01 bird: bgpMSKIX: Error: Hold timer expired
Feb 27 05:03:29 bgp-01 bird: bgpRFET2: Error: Hold timer expired
Feb 27 05:03:36 bgp-01 bird: bgpGLFW: Error: Hold timer expired
Feb 27 05:03:40 bgp-01 bird: Kernel dropped some netlink messages, will resync on next scan.
Feb 27 05:03:45 bgp-01 bird: bgpKOMP: Error: Hold timer expired
Feb 27 05:03:58 bgp-01 bird: I/O loop cycle took 5094 ms for 6 events
Feb 27 05:04:29 bgp-01 bird: bgpZap: Error: Hold timer expired
Feb 27 05:04:44 bgp-01 bird: bgpSAL: Received: Hold timer expired
Feb 27 05:04:56 bgp-01 bird: bgpRKN2: Received: Hold timer expired
Feb 27 05:04:56 bgp-01 bird: bgpFIT: Error: Hold timer expired
Feb 27 05:05:02 bgp-01 bird: bgpGLIX1: Error: Hold timer expired
Feb 27 05:05:05 bgp-01 bird: Kernel dropped some netlink messages, will resync on next scan.
Feb 27 05:05:16 bgp-01 bird: bgpRFET1: Received: Hold timer expired
Feb 27 05:05:19 bgp-01 bird: bgpRKN1: Received: Hold timer expired
-- Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
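As a first step toward "where to look", the log excerpt above can be summarized mechanically. The following is a minimal Python sketch, not a BIRD tool: the function name `summarize` and the regular expressions are illustrative assumptions built only from the line formats visible in the excerpt. It counts hold-timer expiries by direction (`Error:` is taken here to mean the local side noticed the expiry, `Received:` that the peer reported it, which is an assumption about BIRD's log wording) and collects the durations of slow I/O loop cycles.

```python
import re
from collections import Counter

# A few lines copied from the log excerpt in this thread.
LOG = """\
Feb 27 05:02:45 bgp-01 bird: bgpMSKIX: Error: Hold timer expired
Feb 27 05:03:40 bgp-01 bird: Kernel dropped some netlink messages, will resync on next scan.
Feb 27 05:03:58 bgp-01 bird: I/O loop cycle took 5094 ms for 6 events
Feb 27 05:04:44 bgp-01 bird: bgpSAL: Received: Hold timer expired
"""

# Patterns derived from the excerpt only; other BIRD versions may log differently.
HOLD_RE = re.compile(r"bird: (\S+): (Error|Received): Hold timer expired")
LOOP_RE = re.compile(r"bird: I/O loop cycle took (\d+) ms for (\d+) events")


def summarize(log_text):
    """Return (expiry counts keyed by 'Error'/'Received', slow cycle durations in ms)."""
    expiries = Counter()
    slow_cycles_ms = []
    for line in log_text.splitlines():
        hold = HOLD_RE.search(line)
        if hold:
            expiries[hold.group(2)] += 1
            continue
        loop = LOOP_RE.search(line)
        if loop:
            slow_cycles_ms.append(int(loop.group(1)))
    return expiries, slow_cycles_ms


expiries, slow_cycles_ms = summarize(LOG)
print(dict(expiries), slow_cycles_ms)
```

Multi-second loop cycles appearing next to clusters of expiries would be consistent with Maria's diagnosis of a saturated single thread, though the sketch itself proves nothing about the cause.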
Hi Elmar
Is using a public RPKI Cache for production networks really a good idea?
It's very easy to run your own with e.g. Routinator 3000 https://nlnetlabs.nl/projects/routing/routinator/
Regards
Matthias
On 29/02/2024 09:48, Elmar K. Bins via Bird-users wrote:
Hi friends,
a probably simple question that I haven't found any documentation about, so...
We're using Cloudflare's RPKI cache from our BIRD instances, but since yesterday I'm getting an error info in `show protocols`:
rpki1 RPKI --- up 15:33:17.594 Cache-Error-No-Data-Available
What does the error message mean? Does the remote side not have data available, or is this about some local issue?
I'd like to clear this up before I ask Cloudflare...
Thanks, Elmar.
Re Matthias,
bird-users@network.cz (Matthias Cramer via Bird-users) wrote:
Is using a public RPKI Cache for production networks really a good idea? It's very easy to run your own with e.g. Routinator 3000 https://nlnetlabs.nl/projects/routing/routinator/
Perhaps it's time for that, but... that would run centrally, and the nodes are literally all over the planet. So connectivity is still an issue.
Thanks, Elmar.
On 29 Feb 2024, at 09:48, Elmar K. Bins via Bird-users <bird-users@network.cz> wrote:
Hi friends,
a probably simple question that I haven't found any documentation about, so...
We're using Cloudflare's RPKI cache from our BIRD instances, but since yesterday I'm getting an error info in `show protocols`:
rpki1 RPKI --- up 15:33:17.594 Cache-Error-No-Data-Available
What does the error message mean? Does the remote side not have data available, or is this about some local issue?
Checking the code: https://gitlab.nic.cz/labs/bird/-/blob/master/proto/rpki/rpki.c#L279
Seems like it.
I'd like to clear this up before I ask Cloudflare...
Instead, I'd heavily suggest running 2 stayrtr instances yourself. What if you lose connectivity to the outside world?
For AS57777 I generate IRR data rpki-client and rsync rpki-client.json every 10 minutes to my 2 stayrtr instances. If the rpki-client host goes down, then rpki-client.json is not updated, as there is no new one, but stayrtr stays up and running. If I upgrade stayrtr, there is always the second one up and running. Scale to more nodes for more resiliency :).
On top of that I run another set of stayrtr instances for IRR data; I still need to write down how that setup now finally works.
Greets, Jeroen
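The "keep serving the last good file" behaviour Jeroen describes can be sketched in a few lines of Python. This is not his actual tooling: the function name `publish_if_valid`, the `.incoming` file name, and the assumption that the JSON carries its entries in a top-level `"roas"` list are all illustrative. The idea is that a freshly transferred file only replaces the one being served if it parses and is non-empty, and that the swap is a single atomic rename.

```python
import json
import os
import tempfile


def publish_if_valid(candidate_path, live_path, min_roas=1):
    """Replace live_path with candidate_path only if the candidate parses as
    JSON and has at least min_roas entries under "roas" (assumed layout).
    Returns True if the live file was replaced, False if it was left alone."""
    try:
        with open(candidate_path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return False  # missing or truncated transfer: keep the old file
    roas = data.get("roas") if isinstance(data, dict) else None
    if not isinstance(roas, list) or len(roas) < min_roas:
        return False
    os.replace(candidate_path, live_path)  # atomic when both are on one filesystem
    return True


# Demo in a scratch directory: a partial transfer must not replace the live file.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "rpki-client.json")
    incoming = os.path.join(tmp, "rpki-client.json.incoming")  # hypothetical name
    with open(live, "w", encoding="utf-8") as fh:
        json.dump({"roas": [{"asn": 64496, "prefix": "192.0.2.0/24", "maxLength": 24}]}, fh)
    with open(incoming, "w", encoding="utf-8") as fh:
        fh.write('{"roas": [')  # simulated interrupted upload
    print(publish_if_valid(incoming, live))
```

With a check like this on each RTR host, an outage of the generating host simply means no new candidate arrives and the existing file keeps being served, which matches the behaviour described above.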
Re folks,
rpki1 RPKI --- up 15:33:17.594 Cache-Error-No-Data-Available
Cloudflare fixed their part, so the immediate problem has gone away.
In the meantime I've evaluated several implementations and settled on FORT (courtesy of our Mexican friends), which appears to work quite nicely (and is trivial to set up).
Luckily, I don't have to cope with a complicated setup like Jeroen's ;-)
May4 etc...
Elmar.
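For background on the original question: in the RPKI-to-Router protocol (RFC 8210), a cache can send an Error Report PDU, and error code 2 is "No Data Available", which the RFC describes as a cache that believes itself to be working but has no useful data to serve at the moment. That BIRD's `Cache-Error-No-Data-Available` status reflects exactly this code is an assumption based on the name and on Jeroen's reading of the source. The Python sketch below decodes only the fixed 8-byte header of such a PDU; the function name `decode_error_header` is hypothetical and the sample bytes are built by hand, not captured from Cloudflare.

```python
import struct

# Error codes as listed in RFC 8210, section 12.
RTR_ERROR_CODES = {
    0: "Corrupt Data",
    1: "Internal Error",
    2: "No Data Available",
    3: "Invalid Request",
    4: "Unsupported Protocol Version",
    5: "Unsupported PDU Type",
    6: "Withdrawal of Unknown Record",
    7: "Duplicate Announcement Received",
    8: "Unexpected Protocol Version",
}
ERROR_REPORT_PDU = 10  # PDU type of an Error Report


def decode_error_header(pdu):
    """Decode version, error code, error name, and total length from the
    8-byte header of an RTR Error Report PDU."""
    if len(pdu) < 8:
        raise ValueError("need at least 8 bytes")
    version, pdu_type, code, length = struct.unpack("!BBHI", pdu[:8])
    if pdu_type != ERROR_REPORT_PDU:
        raise ValueError(f"not an Error Report PDU (type {pdu_type})")
    return version, code, RTR_ERROR_CODES.get(code, "unknown"), length


# Hand-built sample: version 1, code 2, no encapsulated PDU, no error text.
sample = struct.pack("!BBHIII", 1, ERROR_REPORT_PDU, 2, 16, 0, 0)
print(decode_error_header(sample))
```

Read this way, the status points at the remote cache rather than at the local BIRD instance, which fits the outcome reported above.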
participants (5)
- Elmar K. Bins
- Jeroen Massar
- Maria Matejka
- Matthias Cramer
- mx.avanttel.ru