Hi, I've revised the old patch and now propose this new version (attached). First of all, I fixed a bug in add_nexthops() which could cause it to forget the existing nexthops if they appeared later in sorting order than then newly added nexthops. Secondly, if the resulting list of nexthops contains more than one nexthop, it now handles the router ID and tag. The only place router ID is used is apparently in NSSA handling which prefers larger router IDs. So to ensure that a multipath route doesn't get confused with another route with a lower router ID, we keep the highest router ID when adding multipaths. As for tags, it's difficult to tell what should be done, but in this case I've decided to clear the tag if the two tag differs, ensuring that we don't lie about the routes (except for cases where admins uses tag=0 to indicate something weird). ri_better(), ri_better_ext(), ri_better_asbr(), and my old ri_equal_cost() have been replaced with ri_compare(), ri_compare_ext(), and ri_compare_asbr() which returns <0, 1, or >0 depending on relative priority. The comparison code is structured syntactically different, but the end result is less code (and on x86 and amd64, less machine code). I just had to handle unsigned 32 bit values differently (rid and areaid). Paths are _ONLY_ considered equal if the area ids are also equal. In other cases, the highest area id is used. ri_compare_asbr() already needed that priority, and for the other scenarios, the RFC does not mandate priority. According to the IETF OSPF mailing list, different implementations prioritize area ids differently. I decided to do potential multipath in /all/ ri_install functions based on the RFC 2328 which states that all routes can be multipath. Besides, multiple threads in the IETF OSPF mailinglist seems to confirm that any route can be multipath as long as prefix, path type, cost, and area is the same. RFC 3101 (NSSA) doesn't mention multipath at all, which initially put me in a dilemma, since it uses router id to break ties. So my solution was to only do that comparison if both routes have the ORTA_NSSA option. I also took the liberty to add const to arguments where it made sense. Best regards Peter Christensen On 02/10/2014 01:00 PM, Ondrej Zajicek wrote:
On Thu, Feb 06, 2014 at 09:47:03PM +0100, Peter Christensen wrote:
Anyway, I seemed to have managed to make multipath work as expected - at least in our setup. (Patch attached) Well, what is expected is the question. BIRD currently do multipath on idea that multiple paths through OSPF network topology to one destination in one area are merged, but two same routes originated by two different routers are considered different destinations (which makes perfect sense for propagated default gateways or anycast destinations). The way I interpret the OSPFv2 spec, a destination is simply an IP address prefix. There may be several routes to a particular destination through a lot of routers, but if multiple routes to that destination exist whcih seems identical in quality (cost etc.), those routes are eligible for multipath - even though those destinations are default gateways or anycast destinations (anycast destination are after all indistinguishable from ordinary destinations). So at least what I expect is that /any/ seemingly equal route to a given network should be merged into a multipath route if ecmp is enabled. I checked the OSPFv2 spec and you are right, your interpretation seems to be more in line with the meaning of the OSPFv2 spec.
You patch merges such routes from different routers, but still keeps routes from different area. Few months ago, Volodymyr Samodid commented that ECMP in OSPF should merge paths from multiple areas. Really? From RFC 2328 (OSPF Version 2) section 16.8 (page 178): ... Arent't they saying that each route in the multipath entry must share the same associated area? Sorry, i misremembered that comment, it was related to merging of external paths, not merging paths from different areas.
So it seems that this should be at least configurable (like 'ecmp merge internal <bool>', 'ecmp merge external <bool>', 'ecmp merge areas <bool>'). The question is how much detailed such configuration shouldbe. For example, it may be useful to merge external routes with the same route tag, but not merge external routes with different ones. And what about merging internal and external routes together, is this useful?
Any thoughts on this issue? At least from the RFC 2328 point of view, it apparently doesn't make sense to merge the routes across different types of routes. But I guess that boils down to the fact that they usually have different costs. You are right, merging routes from different areas or of different types does not really make sense.
Essentially, I've hooked my multipath code into ri_install_ext() and ri_install_net(), where I add the equal routes if the routes share the same type, metrics and OSPF area. I realize that my add_nexthops() is /very/ similar to merge_nexthops() in functionality, but it seemed that the top_hash_entry() could be null, so I wrote a new method which did not rely on that - at the cost of more calls to copy_nexthop(), I guess.
Any thoughts? The implementation looks clear and simple, i will look at it thoroughly in a few days. On the first look i see that the patch forgot to zero orta->rid and perhaps orta->tag if merged routes have it different.
Yeah, I guess clearing rid makes sense since the route is really from different routers. As for the tag, I'm not sure what the expected behavior is, since it is out of the scope of the OSPFv2 spec. Maybe that is cause enough to make it tunable whether routes with different tags can be merged.
Another thing I've personally noticed, is that I should probably also check ORTA_NSSA, ORTA_PROP and ORTA_PREF when verifying equal cost routes in ri_install_ext. ri_better_ext is after all taking them into consideration. Note that the last comparison in ri_better_ext() is based on rid, which means that you would have to merge ext routes even in some cases when ri_better() returns true. Or perhaps change ri_better_ext() to ri_compare_ext() returning -1, 0, 1 to avoid complicated ri_equal_cost().
Also i am not sure if we would not have to merge next hops in ri_install_rt() for the purpose of summary-rt LSA processing, i would have to look deeper at this.