[PATCH] More multipath support for OSPF

Peter Christensen pch at ordbogen.com
Thu Feb 6 21:47:03 CET 2014


On 02/06/2014 04:14 PM, Ondrej Zajicek wrote:
> On Thu, Feb 06, 2014 at 02:17:12PM +0100, Peter Christensen wrote:
>> Hi,
> Hello
>
>> I noticed that the multipath support in OSPF seems to be fairly limited.
>> Essentially I was only able to make it do multipath if I had two
>> interfaces connecting to the same router.
>> At my company, we need true multipath between multiple routers using a
>> single interface.
>> (If I needed the other, I could use LACP)
> Not if such multipath spans multiple routers (e.g. a network consisting of
> several routers connected by ptp links in a circle).
True. I was just considering the simple case with two routers with two 
interfaces each connecting to a switch in between. Here, LACP would work 
just fine.
>
> Also note that even if you have just one interface, you still get ECMP
> if there are several paths (through different neighbor routers) to one
> router few hops away.
Apparently I didn't. I am essentially trying to make my routers balance 
traffic across multiple load-balancers running OSPF with BIRD. Their 
setup looks something like this (simplified):

protocol ospf {
     import none;
     export none;
     area 1 {
         interface "eth0";
         interface "lo";
     };
}

The loopback interface holds a number of anycast addresses, which 
appear as stubnets in OSPF. The routers see the stubnets from both 
load-balancers, but pick only one (seemingly random) load-balancer when 
inserting the route into the routing table.

If both the router and one of the load-balancers participated in the 
area on two interfaces, I did get a multipath route entry. I traced the 
code path and found that stubnets never reach the current multipath code.
>
>> I am aware of the implications of the default multipath implementation in
>> Linux, which operates on a per-packet basis, which is why we've patched
>> our kernels to do it per-flow instead.
> Really? AFAIK default Linux implementation is per-flow, not per-packet,
> unless this was changed recently.
The IPv4 multipath code in the kernel actually picks a pseudo-random 
route in a round-robin fashion. The route cache would, however, ensure 
that a flow stayed on a particular path for a while if the route was 
used continuously. In Linux 3.6 the route cache was removed from the 
kernel (apparently the route cache behaved badly under heavy load), 
effectively turning the multipath code from per-flow to per-packet. The 
IPv6 multipath code has always used a hash-based modulo-N algorithm, 
which ensures consistent flow-based multipath. So we basically added an 
option in the kernel allowing for hash-based modulo-N multipath in 
IPv4 (as an added bonus, the round-robin code required a spinlock, while 
the hash-based code is lock-free). Unfortunately our implementation 
disregards multipath weights, so I haven't bothered sending it to any 
kernel mailing list. Following the recommendation of RFC 2992 (Analysis 
of an Equal-Cost Multi-Path Algorithm), I'll probably change our hash 
modulo-N algorithm to a hash-threshold algorithm, which behaves better 
(fewer flows change path) when gateways are added to or removed from 
the multipath route.
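
As a rough illustration of the two selection schemes mentioned above, here is a minimal sketch. It is not the kernel patch discussed here; the function names are hypothetical, and the flow hash is assumed to be a 32-bit value computed elsewhere (e.g. from source and destination address).

```c
#include <assert.h>
#include <stdint.h>

/* Modulo-N: the next-hop index is simply hash mod N.
 * When N changes, most flows end up on a different next hop. */
static inline uint32_t select_modulo_n(uint32_t flow_hash, uint32_t n_nexthops)
{
    assert(n_nexthops > 0);
    return flow_hash % n_nexthops;
}

/* Hash-threshold (in the spirit of RFC 2992): split the 32-bit hash
 * space into N equally sized contiguous regions and pick the region
 * the hash falls into. Only region boundaries move when N changes. */
static inline uint32_t select_hash_threshold(uint32_t flow_hash, uint32_t n_nexthops)
{
    assert(n_nexthops > 0);
    uint64_t region = ((uint64_t)1 << 32) / n_nexthops;  /* region size */
    uint32_t idx = (uint32_t)(flow_hash / region);
    /* Integer division can leave a small remainder at the top of the
     * hash space; fold it into the last region. */
    return idx < n_nexthops ? idx : n_nexthops - 1;
}
```

Neither function honours multipath weights; supporting them would mean giving each next hop a region proportional to its weight.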
>
>> Anyway, I seemed to have managed to make multipath work as expected - at
>> least in our setup. (Patch attached)
> Well, what is expected is the question. BIRD currently does multipath
> on the idea that multiple paths through OSPF network topology to one
> destination in one area are merged, but two same routes originated by
> two different routers are considered different destinations (which makes
> perfect sense for propagated default gateways or anycast destinations).
The way I interpret the OSPFv2 spec, a destination is simply an IP 
address prefix. There may be several routes to a particular destination 
through many routers, but if multiple routes to that destination 
exist which seem identical in quality (cost etc.), those routes are 
eligible for multipath - even if those destinations are default 
gateways or anycast destinations (anycast destinations are, after all, 
indistinguishable from ordinary destinations). So what I expect, at 
least, is that /any/ seemingly equal route to a given network should be 
merged into a multipath route if ecmp is enabled.
RFC 4786 (Operation of Anycast Services) talks about using ECMP with 
anycast services, noting that per-packet load-balancing 
can be problematic with anycast, and that hash-based ECMP is preferred. 
In other words, combining hash-based multipath with anycast may often be 
preferable, and the OSPF algorithm ought to ensure that all active 
routes to the anycast destination are of equal best cost.
>
> Your patch merges such routes from different routers, but still keeps
> routes from different areas. A few months ago, Volodymyr Samodid
> commented that ECMP in OSPF should merge paths from multiple areas.
Really? From RFC 2328 (OSPF Version 2) section 16.8 (page 178):

"Each one of the multiple routes will be of the same type
(intra-area, inter-area, type 1 external or type 2 external),
cost, and will have the same associated area.  However, each
route may specify a separate next hop and Advertising router."

Aren't they saying that each route in the multipath entry must share the same associated area?


>
> So it seems that this should be at least configurable (like 'ecmp merge
> internal <bool>', 'ecmp merge external <bool>', 'ecmp merge areas <bool>').
> The question is how detailed such configuration should be. For example,
> it may be useful to merge external routes with the same route tag, but
> not merge external routes with different ones. And what about merging
> internal and external routes together, is this useful?
>
> Any thoughts on this issue?
At least from the RFC 2328 point of view, it apparently doesn't make 
sense to merge the routes across different types of routes. But I guess 
that boils down to the fact that they usually have different costs.
>
>
>> Essentially, I've hooked my multipath code into ri_install_ext() and
>> ri_install_net(), where I add the equal routes if the routes share the
>> same type, metrics and OSPF area.
>> I realize that my add_nexthops() is /very/ similar to merge_nexthops()
>> in functionality, but it seemed that the top_hash_entry() could be null,
>> so I wrote a new method which did not rely on that - at the cost of more
>> calls to copy_nexthop(), I guess.
>>
>> Any thoughts?
> The implementation looks clear and simple, I will look at it thoroughly
> in a few days. At first look I see that the patch forgot to zero
> orta->rid and perhaps orta->tag if the merged routes differ in them.
>
Yeah, I guess clearing rid makes sense since the route is really from 
different routers. As for the tag, I'm not sure what the expected 
behavior is, since it is out of the scope of the OSPFv2 spec. Maybe that 
is cause enough to make it tunable whether routes with different tags 
can be merged.
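
A minimal sketch of that idea, under stated assumptions: the struct below is a simplified stand-in and not BIRD's actual orta structure, and the `merge_different_tags` knob is hypothetical, standing in for whatever tunable might be added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified route metadata (not BIRD's real orta). */
struct route_meta {
    uint32_t rid;  /* advertising router ID, 0 = unspecified */
    uint32_t tag;  /* external route tag, 0 = unspecified */
};

/* Hypothetical policy check: routes with different tags are only
 * merged when the (assumed) tunable allows it. */
static bool tags_mergeable(const struct route_meta *a,
                           const struct route_meta *b,
                           bool merge_different_tags)
{
    assert(a && b);
    return merge_different_tags || a->tag == b->tag;
}

/* Fold 'added' into 'merged': a field survives only if both routes
 * agree on it, otherwise it is cleared. */
static void merge_route_meta(struct route_meta *merged,
                             const struct route_meta *added)
{
    assert(merged && added);
    if (merged->rid != added->rid)
        merged->rid = 0;
    if (merged->tag != added->tag)
        merged->tag = 0;
}
```

With this shape, a multipath entry built from two different routers ends up with rid 0, and keeps its tag only when both routes carried the same one.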

Another thing I've noticed myself is that I should probably also 
check ORTA_NSSA, ORTA_PROP and ORTA_PREF when checking for equal-cost 
routes in ri_install_ext(). ri_better_ext() does, after all, take them 
into consideration.
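
For illustration, an equal-cost check along these lines could look like the sketch below. The struct, the `EX_`-prefixed flag values and the function name are all assumptions made for the example; the real flag definitions and the orta structure live in BIRD's OSPF sources and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for ORTA_NSSA, ORTA_PROP and ORTA_PREF. */
#define EX_ORTA_NSSA 0x1u
#define EX_ORTA_PROP 0x2u
#define EX_ORTA_PREF 0x4u
#define EX_ORTA_CMP_MASK (EX_ORTA_NSSA | EX_ORTA_PROP | EX_ORTA_PREF)

/* Hypothetical, simplified route attributes. */
struct ex_orta {
    int      type;     /* intra-area, inter-area, ext1, ext2 */
    uint32_t options;  /* bit flags, including the ones above */
    uint32_t metric1;
    uint32_t metric2;
    uint32_t area_id;  /* associated OSPF area */
};

/* Two routes are multipath candidates only if they share type,
 * both metrics, the associated area, and the compared flags. */
static bool ecmp_candidate(const struct ex_orta *a, const struct ex_orta *b)
{
    assert(a && b);
    return a->type == b->type
        && a->metric1 == b->metric1
        && a->metric2 == b->metric2
        && a->area_id == b->area_id
        && (a->options & EX_ORTA_CMP_MASK) == (b->options & EX_ORTA_CMP_MASK);
}
```

Option bits outside the mask are deliberately ignored here; whether that is right depends on what else the route comparison in BIRD looks at.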



