[PATCH] bgp deterministic med
Hello list!

This patch implements RFC 4271 MED comparison.

This patch may introduce significant performance penalty (unfortunately I haven't got any real numbers at the moment).
However, this comparison is absolutely necessary at least for RR setups.

- rte_better cost has changed from O(1) to O(n) (should not affect performance much)
- rte_recalculate can execute 2-3 times slower in worst case (this can consume more resources when preferred BGP neighbor goes down)

Deterministic med can be made as optional global BGP config switch (which will be done in next patch version). Keeping this switch off with patched bird should not introduce much performance penalty.

Please consider this patch as beta. Comments/reviews are welcome.
On Mon, Nov 21, 2011 at 06:12:19PM +0400, Alexander V. Chernikov wrote:
Hello list!
This patch implements RFC 4271 MED comparison.
This patch may introduce significant performance penalty (unfortunately I haven't got any real numbers at the moment).
However, this comparison is absolutely necessary at least for RR setups.
- rte_better cost has changed from O(1) to O(n) (should not affect performance much)
- rte_recalculate can execute 2-3 times slower in worst case (this can consume more resources when preferred BGP neighbor goes down)
This is an issue we (developers) discussed privately about half a year ago, and we came to the conclusion that strict RFC 4271 behavior (deterministic MED comparison) is not really meaningful, is hard to implement in the current BIRD design and is probably not worth the effort. As the deterministic MED behavior is not even the default on Ciscos (as far as I have read), there are probably not many problems with compatibility.

Perhaps the better way to solve problems with nondeterministic selection is to enable 'med metric' (i.e. always compare MED) and tweak MEDs on AS boundaries to have consistent values (or just reset all MEDs to 0).

But as you already wrote the patch, I could look at it.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 23.11.2011 20:07, Ondrej Zajicek wrote:
This is an issue we (developers) discussed privately about half a year ago, and we came to the conclusion that strict RFC 4271 behavior (deterministic MED comparison) is not really meaningful, is hard to implement in the current BIRD design and is probably not worth the effort. As the deterministic MED behavior is not even the default on Ciscos (as far as I have read), there are probably not many problems with compatibility.

Yes, it is not. Small networks do not require this behavior.
Perhaps the better way to solve problems with nondeterministic selection is to enable 'med metric' (i.e. always compare MED) and tweak MEDs on AS boundaries to have consistent values (or just reset all MEDs to 0).

Sometimes 'non-deterministic' is not enough.
Imagine the following:

AS1 is the customer of AS2 and AS3.
AS4 peers with both AS2 and AS3.

               -------
               | AS1 |
               -------
              /       \
       --------       --------
       | AS2  |       | AS3  |
       --------       --------
         |  |           |  |
  ---------------------------------
 /                                 \
 |              AS 4               |
 \---------------------------------/

AS2 has 2 links with AS4 (MED 100 and 200) in one city/country.
AS3 has 2 links with AS4 (MED 300 and 400) in the other city.

All these 4 links come via iBGP to one of the route reflectors in the AS3 city.

In the normal situation (deterministic med on) the best paths from AS2 and AS3 to AS1 will be compared by IGP metric, which is reasonable (link capacity/delay can be different, so the best _local_ exit is selected).

In case of always-compare-med, traffic flow becomes suboptimal, and fixing this (given the large number of peers) will significantly increase the number of such hacks in route-maps on border routers.

I do not insist on making RFC 4271 behavior the default one, but the ability to turn it on is (IMHO) mandatory.
It is supported for example by Quagga and Juniper. The latter one uses deterministic MED by default.
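The difference between the two modes in this scenario can be sketched in a few lines of Python. This is only an illustration, not BIRD code: the `Route` tuple and both function names are hypothetical, and the IGP costs are invented (the message gives only the MEDs), chosen so that the AS3 links are the local ones. Local pref and AS path length are assumed equal.

```python
from collections import namedtuple

# Hypothetical route record: neighbor AS, MED, and IGP cost to the exit point.
Route = namedtuple("Route", "via_as med igp")

# IGP costs are assumptions for illustration; only the MEDs come from the example.
routes = [
    Route(2, 100, 50), Route(2, 200, 50),   # AS2 links, remote city
    Route(3, 300, 5),  Route(3, 400, 5),    # AS3 links, local city
]

def always_compare_med(routes):
    """MED is compared across all routes; IGP cost only breaks MED ties."""
    return min(routes, key=lambda r: (r.med, r.igp))

def per_neighbor_med(routes):
    """MED is compared only within a neighbor AS; group winners compete on IGP cost."""
    best = {}
    for r in routes:
        if r.via_as not in best or r.med < best[r.via_as].med:
            best[r.via_as] = r
    return min(best.values(), key=lambda r: r.igp)

print(always_compare_med(routes))   # the AS2 link with MED 100 (remote exit)
print(per_neighbor_med(routes))     # the AS3 link with MED 300 (local exit)
```

Under these assumed costs, always-compare-med sends traffic to the remote AS2 exit purely because its MED is numerically lowest, while the per-neighbor comparison keeps the local exit.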
But as you already wrote the patch, I could look at it.
It will be great. This version is rather hackish.

BTW, do you have some kind of IM or IRC? Or do you prefer ML discussions?

--
WBR, Alexander
On Wed, Nov 23, 2011 at 09:13:17PM +0400, Alexander V. Chernikov wrote:
In case of always-compare-med, traffic flow becomes suboptimal, and fixing this (given the large number of peers) will significantly increase the number of such hacks in route-maps on border routers.
Yes, always-compare-med without normalization (or just zeroing) of MEDs on border routers does not make sense.
I do not insist on making RFC 4271 behavior the default one, but the ability to turn it on is (IMHO) mandatory.
It is supported for example by Quagga and Juniper. The latter one uses deterministic MED by default.
But as you already wrote the patch, I could look at it.
It will be great. This version is rather hackish.
The patch IMHO does not work. Suppose this example:

Route A - neighbor ASN 1, MED 100, IGP 30
Route B - neighbor ASN 2, MED irrelevant, IGP 20
Route C - neighbor ASN 1, MED 200, IGP 10

(local pref and path lengths are the same)

When just B and C are in the table, C is chosen (by IGP cost). When A appears later, B should be chosen according to RFC 4271 (C is removed from consideration by its less preferred MED, ties are broken by IGP cost), but your patch chooses A (because rte_better says A is better than C, so it is handled as the first case in rte_recalculate()).

BTW, this behavior (that the new route is not selected, but changes the ordering between two already present routes) is what IMHO makes the RFC 4271 behavior crazy.
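The A/B/C example can be replayed with a short Python sketch. It is a simplified model of a "compare the new route only against the current best" update, not the patch's actual code; `Route` and `pairwise_better` are hypothetical names, and B's MED is a placeholder because it is never compared.

```python
from collections import namedtuple

# Hypothetical route record with only the attributes used in the example.
Route = namedtuple("Route", "name neighbor_as med igp")

def pairwise_better(new, old):
    """Pairwise rule: MED counts only between routes from the same neighbor AS,
    otherwise (or on equal MED) the lower IGP cost wins."""
    if new.neighbor_as == old.neighbor_as and new.med != old.med:
        return new.med < old.med
    return new.igp < old.igp

A = Route("A", 1, 100, 30)
B = Route("B", 2, 0, 20)    # MED value is a placeholder; it is never compared
C = Route("C", 1, 200, 10)

# The pairwise relation is cyclic: A beats C, C beats B, and B beats A.
print(pairwise_better(A, C), pairwise_better(C, B), pairwise_better(B, A))

# Simplified update: B and C are present (C is best), then A arrives
# and is compared only against the current best.
best = C if pairwise_better(C, B) else B
if pairwise_better(A, best):
    best = A
print(best.name)   # A, although the RFC 4271 procedure would select B
```

Because the pairwise relation is not transitive, no update scheme that relies on a single comparison against the current best can give an order-independent result here.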
BTW, do you have some kind of IM or IRC? Or do you prefer ML discussions?
I prefer ML discussions (and don't use IM).
On Wed, Nov 23, 2011 at 09:13:17PM +0400, Alexander V. Chernikov wrote:
But as you already wrote the patch, I could look at it.
It will be great. This version is rather hackish.
I found how this feature could be implemented in a much simpler way: we just add to a route a 'suppress' flag (which causes the route to be skipped in the best route selection) and an rte_suppress protocol callback, called before the best route selection, that allows the protocol to suppress routes from other instances of the protocol.

Because the deterministic-med best route selection can be considered a two-level linear ordering (one within the group of routes with the same neighbor ASN, and one with a slightly different ordering on the best routes from the groups), we could move the lower level to the suppress protocol callback (where all routes besides the best in each group would be marked as suppressed), while the standard rte_recalculate would process the unmarked best routes from the groups.

This approach makes almost no changes to rte_recalculate() or bgp_rte_better(), and since only one or two best-in-group routes can change with a route update, we virtually cache the other best-in-group routes, so the best path selection would probably be faster than with the rte_best approach. It will also work consistently without a global deterministic-med option (which is problematic in the BIRD approach).

I will implement it soon.
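A rough Python sketch of the two-level idea follows. It is not BIRD code and does not reflect any actual BIRD API: `suppress_hook` is a hypothetical stand-in for the proposed rte_suppress callback, the `suppressed` field stands for the proposed flag, and the upper-level ordering is reduced to IGP cost (local pref and path length assumed equal).

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    neighbor_as: int
    med: int
    igp: int
    suppressed: bool = False   # hypothetical flag, set by the protocol hook

def suppress_hook(routes):
    """Lower level: within each neighbor-AS group, keep the best route
    (lowest MED, then lowest IGP cost) and mark the others as suppressed."""
    best = {}
    for r in routes:
        cur = best.get(r.neighbor_as)
        if cur is None or (r.med, r.igp) < (cur.med, cur.igp):
            best[r.neighbor_as] = r
    for r in routes:
        r.suppressed = best[r.neighbor_as] is not r

def select_best(routes):
    """Upper level: generic selection that skips suppressed routes."""
    return min((r for r in routes if not r.suppressed), key=lambda r: r.igp)

table = [Route("A", 1, 100, 30), Route("B", 2, 0, 20), Route("C", 1, 200, 10)]
suppress_hook(table)
print([r.name for r in table if r.suppressed])   # ['C']
print(select_best(table).name)                   # B
```

With the earlier A/B/C routes, C is suppressed inside its group and the generic selection then picks B, independent of arrival order. A real implementation would presumably update only the affected group on each route change rather than rescanning the whole table as this sketch does.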
Ondrej Zajicek wrote:
I found how this feature could be implemented in a much simpler way: we just add to a route a 'suppress' flag (which causes the route to be skipped in the best route selection) and an rte_suppress protocol callback, called before the best route selection, that allows the protocol to suppress routes from other instances of the protocol.
Because the deterministic-med best route selection can be considered a two-level linear ordering (one within the group of routes with the same neighbor ASN, and one with a slightly different ordering on the best routes from the groups), we could move the lower level to the suppress protocol callback (where all routes besides the best in each group would be marked as suppressed), while the standard rte_recalculate would process the unmarked best routes from the groups.

Quagga uses the same approach, pre-marking the best routes in a group. I didn't implement it in the first version because I was afraid of partial route leaking via the pipe protocol without appropriate flag filtering.

This approach makes almost no changes to rte_recalculate() or bgp_rte_better(), and since only one or two best-in-group routes can change with a route update, we virtually cache the other best-in-group routes, so the best path selection would probably be faster than with the rte_best approach. It will also work consistently without a global deterministic-med option (which is problematic in the BIRD approach). I will implement it soon.
Thanks!