Exporting multipath link routes from Linux kernel
Hello all,
we do virtual hosting, and we provide routeable /32 addresses to the guests. Kernel routes on the KVM host are link routes that look like this:
1.0.0.113 dev pub020304050612 proto static
They are picked up by bird and exported to the core router. When we launch multiple guests with the same address on multiple hosts, this results in an ECMP route on the core router, providing load balancing.
This all works fine until we launch more than one guest with the same address on _one_ host. We create a kernel multipath route that looks like this:
1.0.0.115 proto static metric 10
	nexthop dev pub020304050612 weight 1
	nexthop dev pub020304050616 weight 1
Note that there is no "via" address in the hop configurations! This actually works, i.e. connections originating from the host are balanced between those guests. But bird refuses to pick up such a route because of the code here:
https://gitlab.labs.nic.cz/labs/bird/blob/master/sysdep/linux/netlink.c#L528
Questions:
1. What was the justification for disallowing gateway-less multipath routes? Would it make sense to allow them (in the mainstream code)?
2. Would it be sufficient to simply drop the check for the presence of the gateway address in the message, and return `first` even if the gateway address was not present?
Thank you,
Eugene
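For readers less familiar with rtnetlink: the kernel reports a route like 1.0.0.115 with an RTA_MULTIPATH attribute that holds one struct rtnexthop per hop, and for IPv4 a hop normally carries a nested RTA_GATEWAY attribute only when "via" was given. The following is a minimal, self-contained C sketch (not BIRD code) that builds such a payload in memory and counts the hops that have no gateway. The helper names, the interface indexes and the 192.0.2.1 address are hypothetical; it assumes Linux userspace headers and the iproute2 convention of storing `weight N` as rtnh_hops = N - 1.

```c
#include <assert.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical test helper: append one struct rtnexthop (plus an optional
 * IPv4 RTA_GATEWAY attribute) to buf at offset off; return the new offset. */
static int add_nexthop(unsigned char *buf, int cap, int off, int ifindex,
                       int weight, const unsigned char *gw4)
{
    int len = RTNH_LENGTH(gw4 ? RTA_SPACE(4) : 0);
    struct rtnexthop *nh = (struct rtnexthop *)(buf + off);

    assert(off + RTNH_ALIGN(len) <= cap);
    memset(nh, 0, RTNH_ALIGN(len));
    nh->rtnh_len = (unsigned short)len;
    nh->rtnh_hops = (unsigned char)(weight - 1); /* iproute2 stores weight - 1 */
    nh->rtnh_ifindex = ifindex;
    if (gw4) {
        /* "nexthop via X dev Y": gateway as a nested attribute */
        struct rtattr *rta = RTNH_DATA(nh);
        rta->rta_type = RTA_GATEWAY;
        rta->rta_len = RTA_LENGTH(4);
        memcpy(RTA_DATA(rta), gw4, 4);
    }
    return off + RTNH_ALIGN(len);
}

/* Walk an RTA_MULTIPATH payload and count the hops that carry no
 * RTA_GATEWAY, i.e. the "nexthop dev X" form without "via". */
static int count_gatewayless(unsigned char *payload, int len)
{
    struct rtnexthop *nh = (struct rtnexthop *)payload;
    int missing = 0;

    /* RTNH_OK() reads rtnh_len, so first check that a header fits at all. */
    while (len >= (int)sizeof(*nh) && RTNH_OK(nh, len)) {
        struct rtattr *a = RTNH_DATA(nh);
        int alen = nh->rtnh_len - RTNH_LENGTH(0);
        int has_gw = 0;

        for (; RTA_OK(a, alen); a = RTA_NEXT(a, alen))
            if (a->rta_type == RTA_GATEWAY)
                has_gw = 1;
        if (!has_gw)
            missing++;

        len -= RTNH_ALIGN(nh->rtnh_len);
        nh = RTNH_NEXT(nh);
    }
    return missing;
}
```

Two `nexthop dev X weight 1` hops give a count of 2; appending a third hop that has a gateway leaves the count at 2.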
On Mon, Jan 21, 2019 at 06:38:21PM +0100, Eugene Crosser wrote:
Hello all,
Hello
we do virtual hosting, and we provide routeable /32 addresses to the guests. Kernel routes on the KVM host are link routes that look like this:
1.0.0.113 dev pub020304050612 proto static
They are picked up by bird and exported to the core router. When we launch multiple guests with the same address on multiple hosts, this results in an ECMP route on the core router, providing load balancing.
This all works fine until we launch more than one guest with the same address on _one_ host. We create a kernel multipath route that looks like this:
1.0.0.115 proto static metric 10
	nexthop dev pub020304050612 weight 1
	nexthop dev pub020304050616 weight 1
Note that there is no "via" address in the hop configurations! This actually works, i.e. connections originating from the host are balanced between those guests. But bird refuses to pick up such a route because of the code here:
https://gitlab.labs.nic.cz/labs/bird/blob/master/sysdep/linux/netlink.c#L528
Questions:
1. What was the justification for disallowing gateway-less multipath routes? Would it make sense to allow them (in the mainstream code)?
The code differentiated between gateway and gateway-less routes based on rta->dest (RTD_ROUTER for gateway, RTD_DEVICE for gateway-less). We extended that with RTD_MULTIPATH, but there was no separate dest for each nexthop, so we restricted it to routes where all nexthops have gateways. Also, ECMP routes generated by protocols (e.g. OSPF) always have nexthops with gateways, so it was generally not a big limitation.
In BIRD 2.0, we unified this and replaced RTD_ROUTER / RTD_DEVICE / RTD_MULTIPATH with RTD_UNICAST, which can handle ECMP routes with mixed gateway and gateway-less nexthops.
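To make the distinction concrete, here is a deliberately simplified C model. The enum, struct and function names are hypothetical and are not BIRD's real definitions; the sketch only assumes what the explanation above states: in the 1.x-style scheme the destination kind belongs to the whole route, so a multipath route cannot mark individual hops as gateway-less and is refused, while in the 2.0-style scheme the gateway is optional per hop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1.x-style model: one destination kind for the whole route. */
enum v1_dest { V1_REJECTED = -1, V1_ROUTER, V1_DEVICE, V1_MULTIPATH };

/* Hypothetical per-hop record. */
struct hop {
    int has_gw;  /* 0 for "nexthop dev X" without "via" */
    int ifindex; /* outgoing interface index */
};

/* 1.x style: a single-hop route can be ROUTER or DEVICE, but MULTIPATH has
 * no per-hop dest, so every hop of a multipath route must have a gateway. */
static enum v1_dest classify_v1(const struct hop *hops, size_t n)
{
    assert(hops != NULL || n == 0);
    if (n == 0)
        return V1_REJECTED;
    if (n == 1)
        return hops[0].has_gw ? V1_ROUTER : V1_DEVICE;
    for (size_t i = 0; i < n; i++)
        if (!hops[i].has_gw)
            return V1_REJECTED; /* the case hit by the 1.0.0.115 route */
    return V1_MULTIPATH;
}

/* 2.0 style: a single unicast kind; the gateway is optional per hop, and in
 * this simplified model each hop only needs an interface. */
static int accept_v2(const struct hop *hops, size_t n)
{
    assert(hops != NULL || n == 0);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (hops[i].ifindex <= 0)
            return 0;
    return 1;
}
```

With two gateway-less hops, classify_v1() refuses the route while accept_v2() accepts it; with two gateway hops, both accept it.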
2. Would it be sufficient to simply drop the check for the presence of the gateway address in the message, and return `first` even if gateway address was not present?
Not sure what you mean by `first`. You cannot read the RTA_GATEWAY field if there is none, and you cannot call neigh_find2() for the 0.0.0.0 address. You could set rv->gw to IPA_NONE; that would perhaps work in most cases, but it is untested.
Or just switch to BIRD 2.0.
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 21/01/2019 21:20, Ondrej Zajicek wrote:
https://gitlab.labs.nic.cz/labs/bird/blob/master/sysdep/linux/netlink.c#L528
Questions:
1. What was the justification for disallowing gateway-less multipath routes? Would it make sense to allow them (in the mainstream code)?
The code differentiated between gateway and gateway-less routes based on rta->dest (RTD_ROUTER for gateway, RTD_DEVICE for gateway-less). We extended that with RTD_MULTIPATH, but there was no separate dest for each nexthop, so we restricted it to routes where all nexthops have gateways. Also, ECMP routes generated by protocols (e.g. OSPF) always have nexthops with gateways, so it was generally not a big limitation.
In BIRD 2.0, we unified this and replaced RTD_ROUTER / RTD_DEVICE / RTD_MULTIPATH with RTD_UNICAST, which can handle ECMP routes with mixed gateway and gateway-less nexthops.
2. Would it be sufficient to simply drop the check for the presence of the gateway address in the message, and return `first` even if gateway address was not present?
Not sure what you mean by `first`. You cannot read the RTA_GATEWAY field if
I was referring to the variable name in the code that I linked to. I was looking at the code in the master branch; I assumed that it is 2.x? The code _looks_ as if it will behave the same as in 1.6, but I did not try to run it.
there is none, and you cannot call neigh_find2() for the 0.0.0.0 address. You could set rv->gw to IPA_NONE; that would perhaps work in most cases, but it is untested.
Or just switch to BIRD 2.0
I will try that; I hope that I was wrong in my analysis.
Thank you,
Eugene
On Mon, Jan 21, 2019 at 10:01:54PM +0100, Eugene Crosser wrote:
On 21/01/2019 21:20, Ondrej Zajicek wrote:
https://gitlab.labs.nic.cz/labs/bird/blob/master/sysdep/linux/netlink.c#L528
Questions:
1. What was the justification for disallowing gateway-less multipath routes? Would it make sense to allow them (in the mainstream code)?
The code differentiated between gateway and gateway-less routes based on rta->dest (RTD_ROUTER for gateway, RTD_DEVICE for gateway-less). We extended that with RTD_MULTIPATH, but there was no separate dest for each nexthop, so we restricted it to routes where all nexthops have gateways. Also, ECMP routes generated by protocols (e.g. OSPF) always have nexthops with gateways, so it was generally not a big limitation.
In BIRD 2.0, we unified this and replaced RTD_ROUTER / RTD_DEVICE / RTD_MULTIPATH with RTD_UNICAST, which can handle ECMP routes with mixed gateway and gateway-less nexthops.
2. Would it be sufficient to simply drop the check for the presence of the gateway address in the message, and return `first` even if gateway address was not present?
Not sure what you mean by `first`. You cannot read the RTA_GATEWAY field if
I was referring to the variable name in the code that I linked to.
If you just returned 'first' without filling in rv->gw, you would end up with a nexthop with an undefined/random gateway address, as it was not properly set before. I suggested replacing 'return NULL;' in the else branch with 'rv->gw = IPA_NONE;'.
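A minimal sketch of the suggested change, written as standalone C rather than as a patch against netlink.c: struct nh_out and decode_nexthop() are hypothetical, and gw_valid = 0 stands in for rv->gw = IPA_NONE. It illustrates the two points made above: a hop without RTA_GATEWAY is accepted instead of rejected, and the output record is cleared first so the gateway is never left undefined. It assumes IPv4 and Linux userspace headers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical stand-in for BIRD's per-nexthop record. */
struct nh_out {
    struct in_addr gw; /* meaningful only when gw_valid != 0 */
    int gw_valid;      /* 0 plays the role of rv->gw = IPA_NONE */
    int ifindex;
    int weight;
};

/* Decode one IPv4 struct rtnexthop that the caller has already checked
 * with RTNH_OK(). Returns 0 on success, -1 on a malformed gateway. */
static int decode_nexthop(struct rtnexthop *nh, struct nh_out *out)
{
    struct rtattr *a = RTNH_DATA(nh);
    int alen;

    assert(nh->rtnh_len >= RTNH_LENGTH(0));
    alen = nh->rtnh_len - RTNH_LENGTH(0);

    memset(out, 0, sizeof(*out));    /* never leave gw holding garbage */
    out->ifindex = nh->rtnh_ifindex;
    out->weight = nh->rtnh_hops + 1; /* iproute2 stores weight - 1 */

    for (; RTA_OK(a, alen); a = RTA_NEXT(a, alen)) {
        if (a->rta_type != RTA_GATEWAY)
            continue;
        if (RTA_PAYLOAD(a) != (int)sizeof(out->gw))
            return -1;               /* not an IPv4 gateway */
        memcpy(&out->gw, RTA_DATA(a), sizeof(out->gw));
        out->gw_valid = 1;
    }
    /* No RTA_GATEWAY: keep gw_valid == 0 instead of rejecting the hop. */
    return 0;
}
```

Clearing the record up front is the design choice that matters here: whether or not the hop has a gateway, every field of the output has a defined value.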
I was looking at the code in the master branch; I assumed that it is 2.x?
Not yet, see: https://bird.network.cz/pipermail/bird-users/2019-January/013006.html
On 1/21/19 10:01 PM, Eugene Crosser wrote:
On 21/01/2019 21:20, Ondrej Zajicek wrote: [...]
Or just switch to BIRD 2.0
I will try that; I hope that I was wrong in my analysis.
I've built a .deb package off the tag v2.0.3, and it indeed successfully imports gateway-less multipath routes from the kernel and generally works as desired! Looks like the way to go for us.
Are you aware of any plans to provide an "official" Debian package for version 2?
Thanks for your help!
Eugene
On Tue, Jan 22, 2019 at 06:44:11PM +0100, Eugene Crosser wrote:
On 1/21/19 10:01 PM, Eugene Crosser wrote:
On 21/01/2019 21:20, Ondrej Zajicek wrote: [...]
Or just switch to BIRD 2.0
I will try that; I hope that I was wrong in my analysis.
I've built a .deb package off the tag v2.0.3, and it indeed successfully imports gateway-less multipath routes from the kernel and generally works as desired! Looks like the way to go for us.
Are you aware of any plans to provide an "official" Debian package for version 2?
Yes, we are working on it.
participants (2): Eugene Crosser, Ondrej Zajicek