[RFC,PATCH 1/2] BGP: Implement iebgp_peer mode.
Implement an 'internal eBGP' peer mode, where the remote peer uses a
different AS number than we do, as if it were an eBGP peer, but we treat
the peer as if it were an AS-internal peer. This enables implementing a
network setup according to the model documented in RFC7938.

This makes two changes to the BGP route propagation logic:

* When we are propagating a route to or from an internal eBGP peer,
  we will avoid resetting the MED attribute.

* When comparing BGP-learned routes, we will consider routes learned
  from an 'internal eBGP' peer as iBGP routes as far as the RFC 4271
  9.1.2.2. d) check (Prefer external peers) is concerned.
---
 proto/bgp/attrs.c  | 22 ++++++++++++++++------
 proto/bgp/bgp.h    |  1 +
 proto/bgp/config.Y |  4 +++-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/proto/bgp/attrs.c b/proto/bgp/attrs.c
index b9e2490d..478c805c 100644
--- a/proto/bgp/attrs.c
+++ b/proto/bgp/attrs.c
@@ -1100,15 +1100,21 @@ bgp_update_attrs(struct bgp_proto *p, rte *e, ea_list **attrs, struct linpool *p
 
   if (!p->is_internal && !p->rs_client)
     {
+      struct bgp_proto *new_bgp = (e->attrs->src->proto->proto == &proto_bgp) ?
+        (struct bgp_proto *) e->attrs->src->proto : NULL;
+
       bgp_path_prepend(e, attrs, pool, p->local_as);
 
       /* The MULTI_EXIT_DISC attribute received from a neighboring AS MUST NOT be
        * propagated to other neighboring ASes.
        * Perhaps it would be better to undefine it.
        */
-      a = ea_find(e->attrs->eattrs, EA_CODE(EAP_BGP, BA_MULTI_EXIT_DISC));
-      if (a)
-        bgp_attach_attr(attrs, pool, BA_MULTI_EXIT_DISC, 0);
+      if (!p->cf->iebgp_peer && (new_bgp == NULL || !new_bgp->cf->iebgp_peer))
+        {
+          a = ea_find(e->attrs->eattrs, EA_CODE(EAP_BGP, BA_MULTI_EXIT_DISC));
+          if (a)
+            bgp_attach_attr(attrs, pool, BA_MULTI_EXIT_DISC, 0);
+        }
     }
 
   /* iBGP -> keep next_hop, eBGP multi-hop -> use source_addr,
@@ -1317,9 +1323,11 @@ bgp_rte_better(rte *new, rte *old)
     }
 
   /* RFC 4271 9.1.2.2. d) Prefer external peers */
-  if (new_bgp->is_internal > old_bgp->is_internal)
+  n = !!(new_bgp->is_internal || new_bgp->cf->iebgp_peer);
+  o = !!(old_bgp->is_internal || old_bgp->cf->iebgp_peer);
+  if (n > o)
     return 0;
-  if (new_bgp->is_internal < old_bgp->is_internal)
+  if (n < o)
     return 1;
 
   /* RFC 4271 9.1.2.2. e) Compare IGP metrics */
@@ -1424,7 +1432,9 @@ bgp_rte_mergable(rte *pri, rte *sec)
     }
 
   /* RFC 4271 9.1.2.2. d) Prefer external peers */
-  if (pri_bgp->is_internal != sec_bgp->is_internal)
+  p = !!(pri_bgp->is_internal || pri_bgp->cf->iebgp_peer);
+  s = !!(sec_bgp->is_internal || sec_bgp->cf->iebgp_peer);
+  if (p != s)
     return 0;
 
   /* RFC 4271 9.1.2.2. e) Compare IGP metrics */
diff --git a/proto/bgp/bgp.h b/proto/bgp/bgp.h
index e47a0eb1..8652fae8 100644
--- a/proto/bgp/bgp.h
+++ b/proto/bgp/bgp.h
@@ -51,6 +51,7 @@ struct bgp_config {
   int add_path;                 /* Use ADD-PATH extension [RFC7911] */
   int allow_local_as;           /* Allow that number of local ASNs in incoming AS_PATHs */
   int allow_local_pref;         /* Allow LOCAL_PREF in EBGP sessions */
+  int iebgp_peer;               /* Peer has different AS but should be treated as internal peer */
   int gr_mode;                  /* Graceful restart mode (BGP_GR_*) */
   int setkey;                   /* Set MD5 password to system SA/SP database */
   unsigned gr_time;             /* Graceful restart timeout */
diff --git a/proto/bgp/config.Y b/proto/bgp/config.Y
index 55c602f1..f01a6fb6 100644
--- a/proto/bgp/config.Y
+++ b/proto/bgp/config.Y
@@ -27,7 +27,8 @@ CF_KEYWORDS(BGP, LOCAL, NEIGHBOR, AS, HOLD, TIME, CONNECT, RETRY,
         INTERPRET, COMMUNITIES, BGP_ORIGINATOR_ID, BGP_CLUSTER_LIST, IGP,
         TABLE, GATEWAY, DIRECT, RECURSIVE, MED, TTL, SECURITY, DETERMINISTIC,
         SECONDARY, ALLOW, BFD, ADD, PATHS, RX, TX, GRACEFUL, RESTART, AWARE,
-        CHECK, LINK, PORT, EXTENDED, MESSAGES, SETKEY, BGP_LARGE_COMMUNITY)
+        CHECK, LINK, PORT, EXTENDED, MESSAGES, SETKEY, BGP_LARGE_COMMUNITY,
+        IEBGP_PEER)
 
 CF_GRAMMAR
 
@@ -129,6 +130,7 @@ bgp_proto:
  | bgp_proto ALLOW BGP_LOCAL_PREF bool ';' { BGP_CFG->allow_local_pref = $4; }
  | bgp_proto ALLOW LOCAL AS ';' { BGP_CFG->allow_local_as = -1; }
  | bgp_proto ALLOW LOCAL AS expr ';' { BGP_CFG->allow_local_as = $5; }
+ | bgp_proto IEBGP_PEER bool ';' { BGP_CFG->iebgp_peer = $3; }
  | bgp_proto GRACEFUL RESTART bool ';' { BGP_CFG->gr_mode = $4; }
  | bgp_proto GRACEFUL RESTART AWARE ';' { BGP_CFG->gr_mode = BGP_GR_AWARE; }
  | bgp_proto GRACEFUL RESTART TIME expr ';' { BGP_CFG->gr_time = $5; }
-- 
2.13.4
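
The change to the RFC 4271 9.1.2.2 d) step in the patch above can be read in isolation as the following sketch. It is not BIRD code: `struct peer`, `counts_as_internal()` and `prefer_external_step()` are hypothetical names made up for illustration, and the two fields only mirror the `is_internal` and `iebgp_peer` flags the patch reads. The return convention (1 = new route wins, 0 = old route wins, -1 = this step does not decide) is likewise an assumption of the sketch.

```c
/* Hypothetical, simplified stand-in for the per-peer state used by the patch. */
struct peer {
  int is_internal;  /* peer is in our own AS (plain iBGP) */
  int iebgp_peer;   /* peer has a different AS but is configured as 'internal eBGP' */
};

/* A peer counts as internal for step d) if it is iBGP or flagged iebgp_peer. */
int counts_as_internal(const struct peer *p)
{
  return !!(p->is_internal || p->iebgp_peer);
}

/*
 * "Prefer external peers" step only.
 * Returns 1 if the new route wins, 0 if the old route wins,
 * and -1 if both routes are in the same class and later steps must decide.
 */
int prefer_external_step(const struct peer *new_peer, const struct peer *old_peer)
{
  int n = counts_as_internal(new_peer);
  int o = counts_as_internal(old_peer);

  if (n > o)
    return 0;   /* new is internal, old is external: keep old */
  if (n < o)
    return 1;   /* new is external, old is internal: take new */
  return -1;    /* same class: no decision at this step */
}
```

With this reading, a route from an iBGP peer and a route from an 'internal eBGP' peer fall through to the later tie-breakers, while a route from an ordinary eBGP peer still beats both.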
When skip_private_as_path_prefix is enabled for a BGP peer, leading
private AS numbers will be skipped when computing the length of AS
paths learned from this peer for the purpose of comparing path lengths
of different BGP-learned routes.
---
 nest/a-path.c      | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 nest/attrs.h       |  1 +
 proto/bgp/attrs.c  |  8 +++---
 proto/bgp/bgp.h    |  1 +
 proto/bgp/config.Y |  3 ++-
 5 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/nest/a-path.c b/nest/a-path.c
index b453f702..c2d8bdf8 100644
--- a/nest/a-path.c
+++ b/nest/a-path.c
@@ -193,6 +193,80 @@ as_path_getlen_int(struct adata *path, int bs)
 }
 
 int
+as_is_private(u32 as)
+{
+  if (as >= 64512 && as <= 65534)
+    return 1;
+  if (as >= 4200000000 && as <= 4294967294)
+    return 1;
+  return 0;
+}
+
+int
+as_set_is_private(u32 *as, int len)
+{
+  int i;
+
+  for (i = 0; i < len; i++)
+    if (!as_is_private(get_u32(as + i)))
+      return 0;
+
+  return 1;
+}
+
+int
+as_sequence_private_prefix_length(u32 *as, int len)
+{
+  int i;
+
+  for (i = 0; i < len; i++)
+    if (!as_is_private(get_u32(as + i)))
+      break;
+
+  return i;
+}
+
+int
+as_path_getlen_ext(struct adata *path, int skip_private)
+{
+  int res = 0;
+  u8 *p = path->data;
+  u8 *q = p+path->length;
+  int len;
+  int i;
+
+  while (p<q)
+    {
+      switch (*p++)
+        {
+        case AS_PATH_SET:
+          len = *p++;
+          if (skip_private && !as_set_is_private((u32 *)p, len))
+            skip_private = 0;
+          if (!skip_private)
+            res++;
+          p += BS * len;
+          break;
+        case AS_PATH_SEQUENCE:
+          len = *p++;
+          i = 0;
+          if (skip_private)
+            {
+              i = as_sequence_private_prefix_length((u32 *)p, len);
+              if (i < len)
+                skip_private = 0;
+            }
+          res += len - i;
+          p += BS * len;
+          break;
+        default:
+          bug("as_path_getlen_ext: Invalid path segment");
+        }
+    }
+  return res;
+}
+
+int
 as_path_get_last(struct adata *path, u32 *orig_as)
 {
   int found = 0;
diff --git a/nest/attrs.h b/nest/attrs.h
index a34e64d3..d0808cd6 100644
--- a/nest/attrs.h
+++ b/nest/attrs.h
@@ -33,6 +33,7 @@ int as_path_convert_to_new(struct adata *path, byte *dst, int req_as);
 void as_path_format(struct adata *path, byte *buf, uint size);
 int as_path_getlen(struct adata *path);
 int as_path_getlen_int(struct adata *path, int bs);
+int as_path_getlen_ext(struct adata *path, int skip_private);
 int as_path_get_first(struct adata *path, u32 *orig_as);
 int as_path_get_last(struct adata *path, u32 *last_as);
 u32 as_path_get_last_nonaggregated(struct adata *path);
diff --git a/proto/bgp/attrs.c b/proto/bgp/attrs.c
index 478c805c..ec6db1e5 100644
--- a/proto/bgp/attrs.c
+++ b/proto/bgp/attrs.c
@@ -1280,8 +1280,8 @@ bgp_rte_better(rte *new, rte *old)
     {
       x = ea_find(new->attrs->eattrs, EA_CODE(EAP_BGP, BA_AS_PATH));
       y = ea_find(old->attrs->eattrs, EA_CODE(EAP_BGP, BA_AS_PATH));
-      n = x ? as_path_getlen(x->u.ptr) : AS_PATH_MAXLEN;
-      o = y ? as_path_getlen(y->u.ptr) : AS_PATH_MAXLEN;
+      n = x ? as_path_getlen_ext(x->u.ptr, new_bgp->cf->skip_private_as_path_prefix) : AS_PATH_MAXLEN;
+      o = y ? as_path_getlen_ext(y->u.ptr, old_bgp->cf->skip_private_as_path_prefix) : AS_PATH_MAXLEN;
       if (n < o)
         return 1;
       if (n > o)
@@ -1401,8 +1401,8 @@ bgp_rte_mergable(rte *pri, rte *sec)
     {
       x = ea_find(pri->attrs->eattrs, EA_CODE(EAP_BGP, BA_AS_PATH));
       y = ea_find(sec->attrs->eattrs, EA_CODE(EAP_BGP, BA_AS_PATH));
-      p = x ? as_path_getlen(x->u.ptr) : AS_PATH_MAXLEN;
-      s = y ? as_path_getlen(y->u.ptr) : AS_PATH_MAXLEN;
+      p = x ? as_path_getlen_ext(x->u.ptr, pri_bgp->cf->skip_private_as_path_prefix) : AS_PATH_MAXLEN;
+      s = y ? as_path_getlen_ext(y->u.ptr, sec_bgp->cf->skip_private_as_path_prefix) : AS_PATH_MAXLEN;
       if (p != s)
         return 0;
 
diff --git a/proto/bgp/bgp.h b/proto/bgp/bgp.h
index 8652fae8..10fdeeda 100644
--- a/proto/bgp/bgp.h
+++ b/proto/bgp/bgp.h
@@ -52,6 +52,7 @@ struct bgp_config {
   int allow_local_as;           /* Allow that number of local ASNs in incoming AS_PATHs */
   int allow_local_pref;         /* Allow LOCAL_PREF in EBGP sessions */
   int iebgp_peer;               /* Peer has different AS but should be treated as internal peer */
+  int skip_private_as_path_prefix; /* Skip leading private ASes when comparing AS path lengths */
   int gr_mode;                  /* Graceful restart mode (BGP_GR_*) */
   int setkey;                   /* Set MD5 password to system SA/SP database */
   unsigned gr_time;             /* Graceful restart timeout */
diff --git a/proto/bgp/config.Y b/proto/bgp/config.Y
index f01a6fb6..153747c6 100644
--- a/proto/bgp/config.Y
+++ b/proto/bgp/config.Y
@@ -28,7 +28,7 @@ CF_KEYWORDS(BGP, LOCAL, NEIGHBOR, AS, HOLD, TIME, CONNECT, RETRY,
         TABLE, GATEWAY, DIRECT, RECURSIVE, MED, TTL, SECURITY, DETERMINISTIC,
         SECONDARY, ALLOW, BFD, ADD, PATHS, RX, TX, GRACEFUL, RESTART, AWARE,
         CHECK, LINK, PORT, EXTENDED, MESSAGES, SETKEY, BGP_LARGE_COMMUNITY,
-        IEBGP_PEER)
+        IEBGP_PEER, SKIP_PRIVATE_AS_PATH_PREFIX)
 
 CF_GRAMMAR
 
@@ -135,6 +135,7 @@ bgp_proto:
  | bgp_proto GRACEFUL RESTART AWARE ';' { BGP_CFG->gr_mode = BGP_GR_AWARE; }
  | bgp_proto GRACEFUL RESTART TIME expr ';' { BGP_CFG->gr_time = $5; }
  | bgp_proto IGP TABLE rtable ';' { BGP_CFG->igp_table = $4; }
+ | bgp_proto SKIP_PRIVATE_AS_PATH_PREFIX bool ';' { BGP_CFG->skip_private_as_path_prefix = $3; }
  | bgp_proto TTL SECURITY bool ';' { BGP_CFG->ttl_security = $4; }
  | bgp_proto CHECK LINK bool ';' { BGP_CFG->check_link = $4; }
  | bgp_proto BFD bool ';' { BGP_CFG->bfd = $3; cf_check_bfd($3); }
-- 
2.13.4
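
To make the effect of the length computation above concrete, here is a reduced sketch that works on a single AS_SEQUENCE held as a plain array of host-order 32-bit AS numbers. It deliberately ignores AS_SET segments, the wire encoding and BIRD's `struct adata`, so it is an illustration of the idea and not a replacement for `as_path_getlen_ext()`. The helper name `seq_len_skip_private()` is hypothetical; the private ranges are the ones tested by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Private-use AS ranges, same bounds as in the patch. */
int as_is_private(uint32_t as)
{
  return (as >= 64512u && as <= 65534u) ||
         (as >= 4200000000u && as <= 4294967294u);
}

/*
 * Length of one AS_SEQUENCE for path-length comparison.
 * If skip_private is non-zero, leading private AS numbers are not counted;
 * private AS numbers after the first non-private one still count.
 */
size_t seq_len_skip_private(const uint32_t *seq, size_t len, int skip_private)
{
  size_t i = 0;

  if (skip_private)
    while (i < len && as_is_private(seq[i]))
      i++;

  return len - i;
}
```

For example, taking 64496 and 64497 as stand-ins for public AS numbers (they come from the documentation range, which lies outside the private ranges above), the sequence `65000 64496 64497` has length 3 normally and length 2 with skipping enabled, which equals the length of `64496 64497` learned without the extra private hop.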
On Sat, Aug 12, 2017 at 10:22:06PM +0300, Lennert Buytenhek wrote:
> Implement an 'internal eBGP' peer mode, where the remote peer uses a
> different AS number than we do, as if it were an eBGP peer, but we treat
> the peer as if it were an AS-internal peer. This enables implementing a
> network setup according to the model documented in RFC7938.
>
> This makes two changes to the BGP route propagation logic:
>
> * When we are propagating a route to or from an internal eBGP peer,
>   we will avoid resetting the MED attribute.
>
> * When comparing BGP-learned routes, we will consider routes learned
>   from an 'internal eBGP' peer as iBGP routes as far as the RFC 4271
>   9.1.2.2. d) check (Prefer external peers) is concerned.

Hi

I am inclined to integrate it (perhaps with some tweaks), but I wonder
whether it has any advantages over the standardized solutions, namely BGP
confederations and BGP route reflectors. It seems to me like a third way
to do the same thing. The analogy with BGP confederations is obvious. BGP
route reflectors are usually used in a different way, but could be
configured to work analogously (every router as an RR, every IBGP link as
a mutual RR client). So why another approach?

Also, it seems to me that it handles BGP_MED, but does not change the
behavior for BGP_NEXT_HOP or BGP_LOCAL_PREF. Why?

As BGP in the 1.6.x branch and the 2.0 branch has diverged significantly,
I am inclined to add the feature just to the 2.0 branch, to avoid double
work and to reuse the BGP confederation code, which does essentially the
same thing.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

Hi,

From an architectural point of view, this patch is one of several that
will help us migrate from an iBGP-only datacenter to an eBGP-only
datacenter. The idea is to have a feature that is as simple as possible,
with almost no side effects, that can be turned on during the migration
and off again afterwards. A confederation would have a similar effect, but
it would have other side effects and would also require changes in other
places, leading to a maintenance window that we are trying to avoid with
this set of patches.

I am going to go into more detail in a talk at NANOG71 in a couple of
weeks, but let me try to summarize it here. The patches in question are:

http://bird.network.cz/pipermail/bird-users/2017-March/011084.html
http://bird.network.cz/pipermail/bird-users/2017-February/010925.html
http://bird.network.cz/pipermail/bird-users/2017-August/011468.html
http://bird.network.cz/pipermail/bird-users/2017-August/011469.html

By combining the patches above we are able to migrate a full POP from iBGP
with route reflectors to an eBGP-only setup without intervention and
without having to drain POPs:

1. The patch `Secondary remote AS support for protocol BGP` lets us
   configure the devices ahead of time to accept OPEN messages from
   different ASes, so we don't have to bother synchronizing changes.

2. The `Allow exchanging LOCAL_PREF with eBGP peers` patch allows us to
   keep operating as we do now, regardless of whether the POP is formed by
   eBGP or iBGP speakers.

3. For the last two patches, let's look at this scenario:

                     /---- spine00 (AS65000)
   leaf00 (AS12345)
                     \---- spine01 (AS12345)

   Prefix A coming from the spines:

   from spine00:
     protocol: eBGP
     AS PATH: 65000 $AS_PATH
   from spine01:
     protocol: iBGP
     AS PATH: $AS_PATH

   a. The patch "Implement iebgp_peer mode" helps us with the eBGP vs iBGP
      comparison. It doesn't try to do much else.

   b. Now that eBGP vs iBGP is no longer a concern for this set of peers,
      we need to fix the AS_PATH comparison, or the path from spine01 will
      be shorter and thus win the election. This is where the patch
      "Implement skip_private_as_path_prefix" comes in. Instead of
      manipulating the iBGP speaker to prepend some AS, we would rather
      skip the leading private ASes in paths from the eBGP speaker. The
      reason is that we have many spines, so you would have to synchronize
      the changes to avoid sending all your traffic via only one of them
      (or alternatively schedule a maintenance, which we are trying to
      avoid).

So now, with all the features in place, we can:

1. Ahead of time, enable all the features: local-pref over eBGP, current
   and future AS configured, skipping of leading private ASes, and
   comparing eBGP as iBGP on the corresponding sessions.

2. During the maintenance, change "local as" to the new private AS, switch
   by switch, causing only a short BGP session flap with no other
   implications.

And this is how we are migrating from iBGP to eBGP without human
intervention and without having to drain POPs. Once the migration is
complete we can remove the "compare_as_ibgp" flag.

In summary, I am not sure how useful some of these features would be in a
permanent design. They might allow people to build `confederation-like`
setups that are a bit simpler than using confederations, but our goal is
to have a path to migrate POPs autonomously without having to put them in
maintenance mode. I understand these features are quite "temporary" and
you might be tempted to dismiss them due to their nature, but I think they
are quite useful in production scenarios as they can simplify operations
dramatically.

Regards.
David Barroso

On Tue, 19 Sep 2017 at 15:08 Ondrej Zajicek <santiago@crfreenet.org> wrote:
> On Sat, Aug 12, 2017 at 10:22:06PM +0300, Lennert Buytenhek wrote:
>> Implement an 'internal eBGP' peer mode, where the remote peer uses a
>> different AS number than we do, as if it were an eBGP peer, but we treat
>> the peer as if it were an AS-internal peer. This enables implementing a
>> network setup according to the model documented in RFC7938.
>>
>> This makes two changes to the BGP route propagation logic:
>>
>> * When we are propagating a route to or from an internal eBGP peer,
>>   we will avoid resetting the MED attribute.
>>
>> * When comparing BGP-learned routes, we will consider routes learned
>>   from an 'internal eBGP' peer as iBGP routes as far as the RFC 4271
>>   9.1.2.2. d) check (Prefer external peers) is concerned.
>
> Hi
>
> I am inclined to integrate it (perhaps with some tweaks), but I wonder
> whether it has any advantages over the standardized solutions, namely BGP
> confederations and BGP route reflectors. It seems to me like a third way
> to do the same thing. The analogy with BGP confederations is obvious. BGP
> route reflectors are usually used in a different way, but could be
> configured to work analogously (every router as an RR, every IBGP link as
> a mutual RR client). So why another approach?
>
> Also, it seems to me that it handles BGP_MED, but does not change the
> behavior for BGP_NEXT_HOP or BGP_LOCAL_PREF. Why?
>
> As BGP in the 1.6.x branch and the 2.0 branch has diverged significantly,
> I am inclined to add the feature just to the 2.0 branch, to avoid double
> work and to reuse the BGP confederation code, which does essentially the
> same thing.
>
> --
> Elen sila lumenn' omentielvo
>
> Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
> "To err is human -- to blame it on a computer is even more so."

-- David Barroso linkedin <https://www.linkedin.com/in/dbarrosop> | twitter <https://twitter.com/dbarrosop> | github <https://github.com/dbarrosop>