Way to store ROA info so we can accept but view?
Hey all,

We're using RPKI in testing at the day job, and for a given route, it seems the best we can do is apply a community to it so we can see that it's invalid.

A howto I've found says that this is a bad idea and can cause problems. (https://bgpfilterguide.nlnog.net/guides/reject_invalids/)

When you look at routes with something like "show route all", there's no field for the RPKI status or the ASes for which ROAs are allowed.

So, the questions here are:

1) My understanding of the way RPKI-RTR works is that it's basically handed a tuple of prefix and AS, and RTR says "valid", "invalid", or "unknown". It feels like to check for AS 0 ROAs, we'd basically have to do two lookups for each route that's otherwise invalid, which feels inefficient. Is there a better way?

2) Can the output of "show route" be extended to include user-defined fields, or are we locked into what's there?

3) If not, we're limited to adding communities or MEDs or local prefs or something like that, which is a hack, but at least gives us some info we can view. Is that a dangerous trade-off?

-Dan

--
--------Dan Mahoney--------
Techie, Sysadmin, WebGeek
Gushi on efnet/undernet IRC
FB: fb.com/DanielMahoneyIV
LI: linkedin.com/in/gushi
Site: http://www.gushi.org
---------------------------
Hello!

> So, the questions here is:
> 1) My understanding of the way RPKI-RTR works is that it's basically handed a tuple of prefix and AS, and RTR says "valid", "invalid", or "unknown". It feels like to check for AS 0 ROAs, we'd basically have to do two lookups for each route that's otherwise invalid, which feels inefficient. Is there a better way?

Probably not. That said, there are plans to look into ROA check efficiency; ASPA / AS-Cones checks and more are also coming, so we have to make it fast.
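To make the lookup in question 1 concrete, here is a minimal Python sketch of origin validation in the style of RFC 6811 that also reports, in the same pass, whether a route is invalid purely because of AS 0 ROAs. It is not how BIRD implements its ROA check: the `Roa` type, the `validate` function and the linear scan over a list are assumptions made for illustration (a real implementation would use a prefix trie).

```python
import ipaddress
from typing import List, NamedTuple, Tuple


class Roa(NamedTuple):
    """Hypothetical ROA record: prefix, maximum length, authorised origin AS."""
    prefix: str
    max_len: int
    asn: int  # 0 means an AS 0 ROA: nobody may originate this prefix


def validate(route_prefix: str, origin_asn: int, roas: List[Roa]) -> Tuple[str, bool]:
    """Return (state, only_as0) for one route in a single pass over the ROAs.

    state is "valid", "invalid" or "unknown" (RFC 6811 calls the last NotFound).
    only_as0 is True when the route is invalid and every covering ROA is AS 0.
    """
    net = ipaddress.ip_network(route_prefix)
    covered = False        # at least one ROA covers the route prefix
    matched = False        # a covering ROA authorises this origin and length
    non_as0_cover = False  # a covering ROA names a real (non-zero) AS
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if roa_net.version != net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn != 0:
            non_as0_cover = True
            if roa.asn == origin_asn and net.prefixlen <= roa.max_len:
                matched = True
    if not covered:
        return "unknown", False
    if matched:
        return "valid", False
    return "invalid", not non_as0_cover
```

The point of the sketch is only that the AS 0 information is available during the same walk over covering ROAs; whether a given router exposes it that way is a separate matter.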
> 2) Can the output of "show route" be extended to include user defined fields, or are we locked into what's there?
> 3) If not, we're limited to adding communities or MEDs or local prefs or something like that, which is a hack, but at least gives us some info we can view. Is that a dangerous trade off?

You can declare and use your own route attributes exactly for this: https://bird.network.cz/?get_doc&v=20&f=bird-3.html#opt-attribute

Maria
On May 28, 2022, at 10:34 AM, Maria Matejka <maria.matejka@nic.cz> wrote:
> You can declare and use your own route attributes exactly for this: https://bird.network.cz/?get_doc&v=20&f=bird-3.html#opt-attribute

This is awesome. Given the "attribute type name" syntax, I assume the type can be any of the ones here?

https://bird.network.cz/?get_doc&v=20&f=bird-5.html#ss5.2

-Dan
Hello!

On 5/28/22 8:46 PM, Dan Mahoney wrote:
> On May 28, 2022, at 10:34 AM, Maria Matejka <maria.matejka@nic.cz> wrote:
>> You can declare and use your own route attributes exactly for this: https://bird.network.cz/?get_doc&v=20&f=bird-3.html#opt-attribute
>
> This is awesome. Given the:
>
> attribute type name
>
> I assume the type can be any of the ones here?

Not completely, as of version 2. Supported are: int, ip, quad, bgppath, clist, eclist, lclist. Yet I suppose you are creative enough to encode virtually anything into these. In future releases of version 3, there will be more supported types.

Maria
That made me curious...

"Note: REALLY DONT store the validation state inside a bgp_community or bgp_large_community or bgp_ext_community variables. It can cause CPU & memory overload resulting in convergence performance issues."

Why would that (CPU & memory overload) happen? Why is that different from a lookup against a prefix list?

--
Douglas Fernando Fischer
Control and Automation Engineer
On May 30, 2022, at 8:04 AM, Douglas Fischer <fischerdouglas@gmail.com> wrote:
> That made me curious...
>
> "Note: REALLY DONT store the validation state inside a bgp_community or bgp_large_community or bgp_ext_community variables. It can cause CPU & memory overload resulting in convergence performance issues."
>
> Why would that (CPU & memory overload) happen? Why is that different from a lookup against a prefix list?

Prefix lists are on-device only. As are the attributes I was asking about. Communities...aren't.

Unless you're insanely careful to strip them, these are passed along to peers and cause reconvergence and recalculation issues down the chain.

"It is considered harmful to manipulate BGP Path Attributes (for example LOCAL_PREF or COMMUNITY) based on the RPKI Origin Validation state. Making BGP Path Attributes dependent on RPKI Validation states introduces needless brittleness in the global routing system as explained here. Additionally, the use of RFC 8097 is STRONGLY ABSOLUTELY NOT RECOMMENDED. RFC 8097 has caused issues for multi-vendor network operators."

(Since this is a plain text mail, I'll expand that link)

https://mailarchive.ietf.org/arch/msg/sidrops/dwQi9lgYKRVctdlMAHhtgYkzhSM/

-Dan
Hmm... I think that I'm "insanely careful"! Hahaha.

With every consulting customer that accepts it (not all of them accept that level of insanity), I try to use the concept of internal communities and external communities:
- Internal communities are set on eBGP-in, propagated only through my iBGP, and stripped on eBGP-out.

I think I understood the part about it being harmful to the rest of the world. But my worry is more about the impact of that (marking the RPKI state of routes using communities) on my own instance of BIRD.

From the point of view of day-to-day operations, keeping the rejected routes and tagging them with communities, to be presented and interpreted on a Looking Glass, is very instructive for transit customers who ask: "Why are my routes being rejected?"

Actually, I try to do that (tag with internal communities), good or bad, for every check that I do. For example:
- Prefix bogons
- ASN bogons
- Tier 1 free
- RPKI

--
Douglas Fernando Fischer
Control and Automation Engineer
Hi Douglas,

On Mon, May 30, 2022 at 09:38:44AM -0300, Douglas Fischer wrote:
> On the point of view of day-to-day operations, keep the rejected routes and tag then with communities to be presented and interpreted on a Looking Glass is very pedagogic to Transit Customers that asks: "Why my routes are being rejected?"
>
> Actually, I try to do that(tag with internal communities) with good or bad for every check that I do... Ex.: - Prefix Bogons - ASN Bogons - Tier 1 Free - RPKI

Rejecting a route *and* tagging it with a community is not what causes problems: because you are *rejecting* the route (for example because bogon, or rpki-invalid), there is no routing churn problem further downstream.

The problem Dan Mahoney writes about is when you attach a BGP community to "valid" or "not-found" routes: if your validator/RTR server ever has some kind of issue (for example when it crashes), all "valid" routes would flip to "not-found" state, causing BGP churn for 37%+ of routes in a full table view. Of course, after the crashed validator restarts (comes back online), those hundreds of thousands of routes *again* require new BGP UPDATE messages to remove the "not-found" and attach the "valid" community.

In short:

* Reject RPKI-invalid routes (optionally using the BIRD trick to attach a community to a rejected route)
* Do NOT attach communities to routes that are "valid" or "not-found" merely because they are valid/not-found.

Does the above description make sense?

Kind regards,

Job
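The churn argument can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the community names `rpki-valid` and `rpki-not-found`, the four-prefix table, and the idea that a crashed validator simply turns every state into "unknown". It counts how many announcements toward a downstream peer change when the validator goes away, under a policy that tags every accepted route versus one that tags nothing.

```python
from typing import Callable, Dict, Optional

ABSENT = "absent"  # sentinel: prefix is not announced downstream at all


def tag_all_states(state: str) -> Optional[str]:
    """Policy A: every accepted route carries a community naming its state."""
    return {"valid": "rpki-valid", "unknown": "rpki-not-found"}.get(state)


def tag_nothing(state: str) -> Optional[str]:
    """Policy B: accepted routes carry no RPKI-derived community."""
    return None


def exported(states: Dict[str, str],
             tagger: Callable[[str], Optional[str]]) -> Dict[str, Optional[str]]:
    """Routes announced downstream: invalids are rejected, the rest are tagged."""
    return {p: tagger(s) for p, s in states.items() if s != "invalid"}


def updates_needed(before: Dict[str, Optional[str]],
                   after: Dict[str, Optional[str]]) -> int:
    """Count prefixes whose announcement changed (new, withdrawn or re-tagged)."""
    return sum(1 for p in before.keys() | after.keys()
               if before.get(p, ABSENT) != after.get(p, ABSENT))


healthy = {
    "192.0.2.0/24": "valid",
    "198.51.100.0/24": "valid",
    "203.0.113.0/24": "unknown",
    "198.51.100.128/25": "invalid",
}
# Validator crash: no ROAs are loaded, so nothing is covered any more.
crashed = {prefix: "unknown" for prefix in healthy}

churn_a = updates_needed(exported(healthy, tag_all_states), exported(crashed, tag_all_states))
churn_b = updates_needed(exported(healthy, tag_nothing), exported(crashed, tag_nothing))
```

In this toy table, policy A has to re-announce both formerly valid routes in addition to the formerly rejected one, while policy B only announces the formerly rejected one. Scaled to a full table, the re-tagged share is the part Job describes as avoidable churn.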
On Mon, May 30, 2022 at 02:52:21PM +0200, Job Snijders wrote:
> The problem Dan Mahoney writes about is when you attach a BGP community to "valid" or "not-found" routes: if your validator/RTR server ever has some kind of issue (for example when it crashes), all "valid" routes would flip to "not-found" state, causing BGP churn for 37%+ of routes in a full table view. Of course, after the crashed validator restarts (comes back online), those hundreds of thousands of routes *again* require new BGP UPDATE messages to remove the "not-found" and attach the "valid" community.
>
> In short:
>
> * Reject RPKI-invalid routes (optionally using the BIRD trick to attach a community to a rejected route)
> * Do NOT attach communities to routes that are "valid" or "not-found" merely because they are valid/not-found.
>
> Does the above description make sense?

Hi

I think the important point here is that if your RPKI infrastructure is OK, you cannot have two routes for one prefix where one is 'valid' and the other is 'not-found' (because the prefix is either covered, leading to 'valid' or 'invalid', or not covered, leading to 'not-found'), so for routing purposes the distinction between 'valid' and 'not-found' is irrelevant.

If your RPKI infrastructure has some consistency issues (say, one RTR server used by half the border routers has crashed while the other half are still doing OK, or perhaps something less dramatic, like some border routers having received BGP routes from peers but not yet loaded RPKI records from the cache), then there is a point in marking 'valid' routes distinctly from 'not-found' routes: if one border router receives an invalid route but, due to RPKI issues, marks it as 'not-found', while some other border router receives a valid route and marks it as 'valid' (it does not matter whether by communities or directly by local_pref), then internal routers would prefer the valid route, whereas with no marking they could switch to the invalid one.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Hi!

On Mon, 30 May 2022 at 15:35, Ondrej Zajicek <santiago@crfreenet.org> wrote:
> If one border router receives invalid route, but due to RPKI issues mark it as 'not-found', while some other border router receives a valid route and mark it as 'valid' (does not matter whether by communities or directly by local_pref), then internal routers would prefer the valid route, while if there is no marking they can switch to the invalid.

You are of course right that the presence of a marker can be used to facilitate route decision making; this generally holds true. However, I maintain that the most robust design pattern is to fix “the RPKI issue”, rather than uppref valid routes in the other half of the network. (Or accept the fact that one’s routers might select RPKI-invalid routes as best path because a router isn’t aware that the path is RPKI-invalid.)

Modifying the local_pref of routes based on the RPKI validation state introduces issues such as increased potential for BGP churn, and also makes deploying RPKI in medium/large sized networks significantly harder.

Whether it is a temporary “RPKI issue” (crashing validator) causing an inconsistent state in the overall network, or a slow-moving multi-month RPKI-ROV deployment where POPs are converted one by one, to me it seems there isn’t really a material difference between the two conditions.

Modifying BGP path attributes based on the validation state doesn’t seem the best current practise. It’s too bad there is a ton of documentation out there (outside the BIRD project) that says things like “instead of rejecting, you can downpref invalid routes!” (which, from a security point of view, is futile because of longest-prefix matching).

Kind regards,

Job
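Why longest-prefix matching (LPM) makes downpreffing futile can be shown with a small Python sketch. The `Route` type, the `forwarding_choice` function and the example routes are assumptions made for illustration, not any router's actual code. local_pref only ranks routes for the same prefix, so a downpreffed more-specific route never competes with the legitimate covering route and still wins at forwarding time.

```python
import ipaddress
from typing import Dict, List, NamedTuple


class Route(NamedTuple):
    prefix: str
    local_pref: int
    origin: str  # free-text label, used only to read the result


def forwarding_choice(dest: str, routes: List[Route]) -> Route:
    """Pick the route a packet to dest would follow.

    Step 1: best-path selection runs per prefix, so local_pref only
            compares routes that announce exactly the same prefix.
    Step 2: forwarding uses the longest matching prefix among the winners.
    Raises ValueError if no route matches dest.
    """
    best_per_prefix: Dict[str, Route] = {}
    for route in routes:
        current = best_per_prefix.get(route.prefix)
        if current is None or route.local_pref > current.local_pref:
            best_per_prefix[route.prefix] = route
    addr = ipaddress.ip_address(dest)
    matching = [r for r in best_per_prefix.values()
                if addr in ipaddress.ip_network(r.prefix)]
    return max(matching, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)


routes = [
    Route("192.0.2.0/24", 100, "legitimate"),
    Route("192.0.2.0/25", 10, "invalid, downpreffed"),
]
chosen_low_half = forwarding_choice("192.0.2.1", routes)
chosen_high_half = forwarding_choice("192.0.2.200", routes)
```

Traffic to the lower half of the /24 follows the downpreffed /25 despite its low local_pref; only rejecting the invalid route keeps it out of the forwarding table.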
(Sorry for the top post, my iPad seems to hate inline replies)

From my own point of view, we’re currently accepting all routes, even invalid ones.

We’re using a BGP community so that, when we sync things back to our central collector (which is just for research, like a looking glass), we can send a report that says “at this site we got NN routes, YY invalid”.

The community is not used in any way to make any decisions (on-the-fly decisions, I mean), nor is it passed on to any neighbors that route anything (only the collector).

But my question about the user-defined attribute was that I’d like to be able to do more drill-down on the node itself. I’m seeing evidence where some of our peers claim to be rejecting RPKI invalids, but seem to be passing them on to us.

-Dan
Hi Dan,

On Mon, 30 May 2022 at 17:00, Dan Mahoney <danm@prime.gushi.org> wrote:
> For my own point of view, we’re currently accepting all routes, even invalid.
>
> We’re using a BGP community so that when we sync things back to our central collector (which is just for research, like a looking glass) so we can send a report that says “at this site we got NN routes, YY invalid”.
>
> The community is not used in any way to make any decisions (on the fly decisions, I mean), nor is it passed on to any neighbors that route anything (only the collector).

That’s a decent approach; setting it up like you describe reduces the “BGP churn blast radius” to merely your collector instance.

> But my question about the user-defined attribute was that I’d like to be able to do more drill-down on the node itself. I’m seeing evidence where some of our peers claim to be rejecting RPKI invalid, but seem to be passing them on to us.

Something to consider: in any sufficiently large network, the likelihood of it propagating a (low) number of RPKI-invalid routes is high. More details about how that could happen are here: https://mailman.nanog.org/pipermail/nanog/2021-April/213346.html

Kind regards,

Job
Hello!

On 5/30/22 2:04 PM, Douglas Fischer wrote:
> That made me curious...
>
> "Note: REALLY DONT store the validation state inside a bgp_community or bgp_large_community or bgp_ext_community variables. It can cause CPU & memory overload resulting in convergence performance issues."
>
> Why that ( CPU & memory overload ) would happen? Why is that different from a lookup against a Prefix List?

Well, every operation on bgp_*community is implemented as allocating new memory for the new value and copying it. Therefore its complexity is linear in the size of the community list, which is something you want to avoid unless you really want to export this community to your peers. We may fix this in the future, yet for now it is not a priority.

For everything you're storing locally, custom attributes should be used: they are faster, and we may also add some more fancy features to them without tampering with BGP itself.

Maria
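The cost difference Maria describes can be modelled in a few lines of Python. This is a toy model under stated assumptions, not BIRD's code (BIRD is written in C): the community list is represented as an immutable tuple, so adding one value builds a new tuple and copies every existing element, while a local scalar attribute is a single assignment regardless of how many communities the route already carries. The function names are hypothetical.

```python
from typing import Dict, Tuple


def add_community(clist: Tuple[int, ...], value: int) -> Tuple[Tuple[int, ...], int]:
    """Append value to an immutable community list.

    Returns the new list and the number of existing elements that had to
    be copied into it, which grows linearly with the list length.
    """
    new_list = clist + (value,)  # allocates a new tuple and copies clist
    return new_list, len(clist)


def set_local_attribute(attrs: Dict[str, int], name: str, value: int) -> int:
    """Store a scalar in a per-route attribute map.

    Returns the number of existing elements copied, which is always zero.
    """
    attrs[name] = value
    return 0


# A route that already carries 50 communities from upstream.
existing = tuple(range(50))
tagged, copied_for_community = add_community(existing, 65000)

local_attrs: Dict[str, int] = {}
copied_for_attribute = set_local_attribute(local_attrs, "rpki_state", 2)
```

Per route the difference is small, but the model shows why tagging every route in a full table through a community list does more work than writing a local attribute.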
participants (6): Dan Mahoney, Dan Mahoney (Gushi), Douglas Fischer, Job Snijders, Maria Matejka, Ondrej Zajicek