ECMP on 2 x 1ge Linux hosts
I'm trying out OSPF ECMP with BIRD (with ECMP set in BIRD and enabled in the kernel). Hosts A and B are configured as below and connected via a switch; the dummy0 interface serves as the loopback address for the two physical NICs.

host A
  eth1   - /30 - OSPF
  eth0   - /30 - OSPF
  dummy0 - /32

host B
  eth1   - /30 - OSPF
  eth0   - /30 - OSPF
  dummy0 - /32
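The route cache output quoted later in this thread shows host A using 10.25.205.70 on eth0 (next hop 10.25.205.69) and 10.25.206.70 on eth1 (next hop 10.25.206.69), with host B's dummy0 at 10.25.14.20. A minimal Python sketch of that addressing plan, assuming the /30 prefixes are 10.25.205.68/30 and 10.25.206.68/30 (inferred from those addresses, not stated in the thread), checks that each /30 holds exactly the local address and its next hop, and that the /32 loopback is a single host address:

```python
import ipaddress

# Assumed addressing for host A: (prefix, local address, next hop).
# The /30 prefixes are inferred from the route cache output in the thread.
LINKS = {
    "eth0": ("10.25.205.68/30", "10.25.205.70", "10.25.205.69"),
    "eth1": ("10.25.206.68/30", "10.25.206.70", "10.25.206.69"),
}
SERVER_LOOPBACK = "10.25.14.20/32"  # host B's dummy0 address


def link_is_consistent(prefix: str, local: str, next_hop: str) -> bool:
    """True if local and next hop are exactly the two usable hosts of the /30."""
    net = ipaddress.ip_network(prefix)
    usable = set(net.hosts())  # a /30 has exactly two usable addresses
    return {ipaddress.ip_address(local), ipaddress.ip_address(next_hop)} == usable


if __name__ == "__main__":
    for ifname, (prefix, local, nh) in LINKS.items():
        status = "ok" if link_is_consistent(prefix, local, nh) else "mismatch"
        print(ifname, prefix, status)
    lo = ipaddress.ip_network(SERVER_LOOPBACK)
    print("loopback", lo, "addresses:", lo.num_addresses)
```

Host B has the same layout, so the same check would apply to its side with its own addresses, which the thread does not show.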
On Mon, May 06, 2013 at 06:50:10PM -0700, Bao Nguyen wrote:
From host A I run "iperf -c x.x.x.x -P 4" against host B's dummy0 address. iperf on host A seems to pick a single interface, eth0 or eth1 but not both, and sends the traffic of all 4 parallel streams out of it. How would one allow a single application (in this case iperf) with multiple threads/processes to send 2 Gbit/s worth of traffic? Ideally the traffic would go to a logical interface and be hashed automatically across eth0 and eth1.
AFAIK the Linux kernel implements ECMP in such a way that each route cache entry uses just one path (fixed in the route cache). I heard that there are patches for different behavior. I don't know how this behavior differs on newer kernels without the route cache. With the current behavior, an application could, I guess, use different addresses or ports for each thread so that the threads use multiple paths. Or use something completely different, like link-level interface bonding.
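To make the route cache point concrete, here is a toy Python model (a simplification, not kernel code) contrasting two path-selection strategies. `RouteCacheECMP` picks a path once per cache entry keyed only on source and destination address (the real IPv4 route cache, removed in Linux 3.6, also keyed on fields such as TOS and interfaces), using round-robin as a stand-in for the kernel's actual choice. `hash_ecmp` hashes the 5-tuple, which is roughly what later kernels can do when `net.ipv4.fib_multipath_hash_policy` is set to include layer-4 ports. `Flow`, `PATHS` and both function names are hypothetical:

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: int = 6  # 6 = TCP


PATHS = ["eth0", "eth1"]  # hypothetical pair of equal-cost next hops


class RouteCacheECMP:
    """Toy model: a path is chosen when a cache entry is created and then
    reused for every flow that hits the same (src, dst) entry."""

    def __init__(self, paths: List[str]) -> None:
        self.paths = paths
        self.cache: Dict[Tuple[str, str], str] = {}
        self._next = 0

    def lookup(self, flow: Flow) -> str:
        key = (flow.src, flow.dst)  # ports are not part of the key
        if key not in self.cache:
            # Round-robin on entry creation (simplification of the kernel logic).
            self.cache[key] = self.paths[self._next % len(self.paths)]
            self._next += 1
        return self.cache[key]


def hash_ecmp(flow: Flow, paths: List[str]) -> str:
    """Per-flow selection: hash the 5-tuple, so flows that differ only in
    source port may be spread across paths."""
    key = f"{flow.src}|{flow.dst}|{flow.sport}|{flow.dport}|{flow.proto}"
    # sha256 keeps the result stable across runs (built-in hash() of str is salted).
    digest = hashlib.sha256(key.encode()).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]


if __name__ == "__main__":
    # Four parallel streams like "iperf -P 4": same addresses, different source ports.
    flows = [Flow("10.25.205.70", "10.25.14.20", p, 5001) for p in range(52603, 52607)]
    cache = RouteCacheECMP(PATHS)
    print("route cache: ", [cache.lookup(f) for f in flows])
    print("5-tuple hash:", [hash_ecmp(f, PATHS) for f in flows])
```

In the route cache model all four streams land on the same interface; with the 5-tuple hash they may be split, although with only four flows nothing guarantees an even split.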
I've looked at setting krt_prefsrc so that the source address is the dummy0 address on each host. Would that be the answer?
Probably not (although it is probably useful anyway).

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
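Ondrej's suggestion of giving each thread a different source address could look roughly like the following Python sketch. `open_streams` is a hypothetical helper built only on the standard `socket.create_connection(..., source_address=...)` call; the addresses in the example comment are host A's eth0/eth1 addresses from the route cache output later in the thread. Whether two source addresses really end up on two different next hops still depends on the kernel's path selection, so treat this as something to experiment with rather than a guaranteed fix:

```python
import socket
from typing import List, Sequence, Tuple


def open_streams(dst: Tuple[str, int], sources: Sequence[str], count: int,
                 timeout: float = 5.0) -> List[socket.socket]:
    """Open `count` TCP connections to `dst`, cycling through `sources`
    as the local address of each connection (port 0 lets the kernel pick)."""
    if not sources:
        raise ValueError("need at least one source address")
    conns: List[socket.socket] = []
    try:
        for i in range(count):
            src = sources[i % len(sources)]
            conns.append(socket.create_connection(dst, timeout=timeout,
                                                  source_address=(src, 0)))
    except OSError:
        # Do not leak the connections that were already opened.
        for c in conns:
            c.close()
        raise
    return conns

# Hypothetical usage on host A:
# streams = open_streams(("10.25.14.20", 5001), ["10.25.205.70", "10.25.206.70"], 4)
# for s in streams:
#     print(s.getsockname())
```

Without writing code, two separate iperf clients, each bound to one of the addresses with iperf's `-B` (bind) option, would probably test the same idea.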
Ondrej,

Thanks for pointing me toward the route cache. I think you might be right about the cache: even though there are multiple cache entries for 10.25.14.20 (the server) in this case, there's still only one path selected for each entry, rather than multiple paths for each destination:

10.25.14.20 via 10.25.206.69 dev eth1 src 10.25.206.70
    cache used 3 age 38sec ipid 0xfe2e rtt 40ms rttvar 10ms ssthresh 4081 cwnd 7935
10.25.14.20 via 10.25.205.69 dev eth0 src 10.25.205.70
    cache used 13 age 4sec ipid 0xfe2e rtt 40ms rttvar 10ms ssthresh 4081 cwnd 7935

I think the problem in this case is more with the way the client selected the interface to use than with the route cache. For example, you can see below that the 10.25.205.70 address is selected for every stream, instead of the threads being divided between 10.25.205.70 and 10.25.206.70, even though this host has two physical connections. I'm wondering if there is a way to set up, say, the dummy0 interface so that traffic pointed toward it is hashed across the two physical links...

------------------------------------------------------------
Client connecting to 10.25.14.20, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 6] local 10.25.205.70 port 52606 connected with 10.25.14.20 port 5001
[ 4] local 10.25.205.70 port 52603 connected with 10.25.14.20 port 5001
[ 5] local 10.25.205.70 port 52605 connected with 10.25.14.20 port 5001
[ 3] local 10.25.205.70 port 52604 connected with 10.25.14.20 port 5001

These nodes are not set up for transit traffic, but I'm wondering how transit traffic would look and how the kernel would decide to balance it, or whether it would pick one path instead of two.

-bn
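A quick way to confirm which local address each parallel stream used is to parse iperf's client banner. A small Python sketch; the regular expression assumes the "[ N] local A port P connected with B port Q" line format shown in the output above, and `streams_by_local_address` is a hypothetical helper:

```python
import re
from collections import Counter

# Matches iperf client lines such as
# "[ 6] local 10.25.205.70 port 52606 connected with 10.25.14.20 port 5001"
CONNECTED = re.compile(
    r"\[\s*(\d+)\]\s+local\s+(\S+)\s+port\s+(\d+)\s+"
    r"connected with\s+(\S+)\s+port\s+(\d+)"
)


def streams_by_local_address(output: str) -> Counter:
    """Count iperf streams per local (source) address."""
    return Counter(m.group(2) for m in CONNECTED.finditer(output))


SAMPLE = """\
[ 6] local 10.25.205.70 port 52606 connected with 10.25.14.20 port 5001
[ 4] local 10.25.205.70 port 52603 connected with 10.25.14.20 port 5001
[ 5] local 10.25.205.70 port 52605 connected with 10.25.14.20 port 5001
[ 3] local 10.25.205.70 port 52604 connected with 10.25.14.20 port 5001
"""

if __name__ == "__main__":
    # All four streams share one source address in this sample.
    print(streams_by_local_address(SAMPLE))
```

This only shows the source address each stream used. If the route carries a preferred source (for example one set through krt_prefsrc), every stream would likely show that address regardless of the outgoing interface, so per-interface counters would then be a better indicator.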
participants (2): Bao Nguyen, Ondrej Zajicek