Re: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute
Hi Kyle, On Mon, Aug 28, 2023 at 11:40:48AM -0400, Kyle Rose wrote:
On Sat, Aug 19, 2023 at 5:25 PM Daniel Gröber <dxld@darkboxed.org> wrote:
Having read Kyle's use-case, I'm thinking my original plan to extend the wg internal source-address filtering to use a rt lookup with our new attribute would not be maximally useful, so now my thinking is we should just have a boolean toggle to disable it explicitly per device.
If there is interest among the maintainers in eventually merging a change with a per-interface knob to turn off the source IP check, I will go through the trouble of putting together an initial pass at this. I don't want to spend the time if there is firm opposition to the idea.
I think just a patch to turn off the wg source IP check is not very useful at the moment. It would encourage bad source IP filtering practices when multiple peers are involved, as no mechanism for identifying the sending peer is currently available at the policy routing or netfilter level.
I think such a patch would have to get merged after some kind of mechanism to identify and filter based on the sending wg peer is available. So if you want to move this along I would suggest working on this first. Since I'm also interested in having this feature I'm happy to collaborate.
It's just hard to find the motivation for writing more wg patches when my pending ones have (mostly) been lying around for a year without a response, but if you're also keen on this feature I'm sure it's easier to stay motivated together :)
If my kernel patches also go ignored for too long I'll probably just resort to getting a forked DKMS wireguard module into Debian with this work. Perhaps that approach (or a package in a different distro) would work for your use-case too?
--Daniel
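For readers new to the thread, the check under discussion can be sketched roughly as follows. This is a minimal Python model, not WireGuard's code: it assumes the documented AllowedIPs behaviour (longest-prefix match on the destination picks the peer on send, and the same lookup on the source address must return the sending peer on receive). The peer names, prefixes and function names are hypothetical.

```python
import ipaddress

# Hypothetical peers and prefixes, for illustration only.
ALLOWED_IPS = {
    "peer-a": ["10.0.1.0/24"],
    "peer-b": ["10.0.2.0/24", "10.0.0.0/16"],
}

def _best_match(addr):
    """Return (peer, network) for the longest AllowedIPs prefix covering addr."""
    ip = ipaddress.ip_address(addr)
    best = None
    for peer, prefixes in ALLOWED_IPS.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if ip in net and (best is None or net.prefixlen > best[1].prefixlen):
                best = (peer, net)
    return best

def select_peer(dst):
    """Egress: pick the peer whose AllowedIPs best match the destination."""
    match = _best_match(dst)
    return match[0] if match else None

def accept_from(peer, src):
    """Ingress: accept only if the source address maps back to the sending peer."""
    match = _best_match(src)
    return match is not None and match[0] == peer
```

In this model a packet with source 10.0.1.5 is accepted from peer-a but dropped when it arrives from peer-b, even though peer-b covers 10.0.0.0/16. That coupling of egress selection and ingress filtering is what a routing daemon cannot see from the outside.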
Daniel, Kyle, I've read the whole discussion, and I'm still not clear what advantages the proposed route attribute has over having one interface per peer. Is it because interfaces are expensive in the Linux kernel? Or is there some other reason why it is better to run all WG tunnels over a single interface? -- Juliusz
Hi Juliusz, On Mon, Aug 28, 2023 at 07:40:51PM +0200, Juliusz Chroboczek wrote:
I've read the whole discussion, and I'm still not clear what advantages the proposed route attribute has over having one interface per peer. Is it because interfaces are expensive in the Linux kernel? Or is there some other reason why it is better to run all WG tunnels over a single interface?
Off the top of my head, UDP port exhaustion is a scalability concern here, just as an example, not that I'd actually ever need that many peers in my network :)
One wg-device per-peer means we need one UDP port per-peer, and since binding to a specific IP is currently also not supported by wg (I have a patch pending for this though) there's no good way to work around this.
Frankly, having tons of interfaces is just an operational PITA in all sorts of ways. Apart from the port exhaustion, having more than one wg device also means I have to _allocate_ a new port for each node in my management system somehow instead of just using a static port for the entire network. This gets dicey fast as I want to move in the direction of dynamic peering as in tinc.
Other than that, my `ip -br a` output is getting unmanageably long, and having more than one device means I have to keep ACLs in sync all over the system. This is a problem for daemons that don't support automatic reload (babeld for example :P). I also have to sync the set of interfaces to nftables, which is easy to get wrong as it's still manual in my setup.
All of that could be solved, but I would also like to get my wg+babel VPN setup deployed more widely at some point and all that friction isn't going to help with that, so I'd rather have this supported properly.
--Daniel
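The port bookkeeping described above can be illustrated with a small Python sketch. It only models the allocation problem and does not touch any real wg device; the base port, peer names and function names are hypothetical.

```python
# Hypothetical numbers: the base port and the peer names are made up.
BASE_PORT = 51820
MAX_PORT = 65535

def ports_per_peer_devices(peers, base=BASE_PORT):
    """One wg device per peer: every device needs its own listen port."""
    plan = {}
    for offset, peer in enumerate(sorted(peers)):
        port = base + offset
        if port > MAX_PORT:
            raise RuntimeError("ran out of UDP ports")
        plan[peer] = port
    return plan

def ports_single_device(peers, port=BASE_PORT):
    """One shared wg device: every peer is reached via the same listen port."""
    return {peer: port for peer in peers}
```

With this naive scheme, adding a peer named "aa" to the set {"a", "b", "c"} shifts the ports of "b" and "c", so the plan has to be persisted and distributed to every node. With a single device there is nothing to allocate.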
Hello! On 8/29/23 00:13, Daniel Gröber wrote:
On Mon, Aug 28, 2023 at 07:40:51PM +0200, Juliusz Chroboczek wrote:
I've read the whole discussion, and I'm still not clear what advantages the proposed route attribute has over having one interface per peer. Is it because interfaces are expensive in the Linux kernel? Or is there some other reason why it is better to run all WG tunnels over a single interface?
Off the top of my head, UDP port exhaustion is a scalability concern here,
For enterprise setups, this _can_ very easily become a scalability concern.
One wg-device per-peer means we need one UDP port per-peer, and since binding to a specific IP is currently also not supported by wg (I have a patch pending for this though) there's no good way to work around this.
There is a theoretical Frankenstein approach: run a virtual machine (maybe a netns is enough) for each of the public IP addresses and connect them by veth. You do not want to do this, but theoretically, it should work.
Frankly, having tons of interfaces is just an operational PITA in all sorts of ways. Apart from the port exhaustion, having more than one wg device also means I have to _allocate_ a new port for each node in my management system somehow instead of just using a static port for the entire network. This gets dicey fast as I want to move in the direction of dynamic peering as in tinc.
Even with my 6 machines running in weird locations, it's a mess.
All of that could be solved, but I would also like to get my wg+babel VPN setup deployed more widely at some point and all that friction isn't going to help with that so I'd rather have this supported properly.
All in all, I would also like to see this setup deployed worldwide. If we could somehow help on the BIRD side, please let us know. Thank you for bringing this up. -- Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
Hello all,
I recently heard about the lightweight tunnel infrastructure in the Linux kernel (ip route ... encap ...), and I think it might be helpful in the context of this thread. The Linux kernel already allows adding encapsulation parameters to a route entry in its table, so you do not need to create tunnel devices for that. WireGuard encapsulation and destination could be added there too.
But as I understand the technology, it works only in one direction (for outgoing packets) and decapsulation has to be handled separately; VXLAN and MPLS, for example, have their own tables for that.
Regards,
Alexander Zubkov
Qrator Labs
Hi Alexander, On Thu, Nov 09, 2023 at 12:57:26PM +0100, Alexander Zubkov wrote:
I recently heard about the lightweight tunnel infrastructure in the Linux kernel (ip route ... encap ...), and I think it might be helpful in the context of this thread.
I hadn't seen that yet, thanks for pointing it out.
The Linux kernel already allows adding encapsulation parameters to a route entry in its table, so you do not need to create tunnel devices for that. WireGuard encapsulation and destination could be added there too.
Right, I think ultimately it's going to come down to either technical constraints or in the absence of that, maintainer preference whether via-wgpeer or "encap wg" is the way. The idea is very similar anyway.
But as I understand the technology, it works only in one direction (for outgoing packets) and decapsulation has to be handled separately; VXLAN and MPLS, for example, have their own tables for that.
That would be a problem as I specifically want to tie the source address filtering to this too. I'll have a look at the internals (if and) when I get around to starting work on this. Thanks, --Daniel
On Sat, 18 Nov 2023, at 03:19, Daniel Gröber wrote:
Hi Alexander,
On Thu, Nov 09, 2023 at 12:57:26PM +0100, Alexander Zubkov wrote:
But as I understand the technology, it works only in one direction (for outgoing packets) and decapsulation has to be handled separately; VXLAN and MPLS, for example, have their own tables for that.
That would be a problem as I specifically want to tie the source address filtering to this too. I'll have a look at the internals (if and) when I get around to starting work on this.
Is tying source address filtering to the routing table the right thing to do here? It seems to me that it would cause issues similar to those we see more generally with Unicast Reverse Path Filtering
Is tying source address filtering to the routing table the right thing to do here? It seems to me that it would cause issues similar to those we see more generally with Unicast Reverse Path Filtering
Issues are caused by the kernel performing filtering that the routing protocol is not aware of: it causes the routing daemon's routing table to no longer match the effective forwarding table (the kernel's routing table). That's the reason why uRPF breaks most routing protocols, that's the reason why we have trouble making Wireguard work with Babel, and also the reason behind https://github.com/jech/babeld/issues/111. Contrariwise, we can teach Babel to explicitly take into account the kernel features that we're interested in using. Thus, Babel could be aware of the restrictions placed on a wireguard interface, and collaborate with Wireguard so that the routing table and the forwarding table remain congruent. I haven't looked at the issue in detail, but I believe that would be an interesting (short-term) research project, one that I would be glad to collaborate with (but not necessarily lead, at least not right now). For the specific case of source address filtering, Babel already has an (implemented) extension to deal with source addresses, and I encourage you to consider whether it can be used to deal with the issue at hand. Please see https://arxiv.org/pdf/1403.0445.pdf and RFC 9079. -- Juliusz
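As a rough illustration of the source-address extension mentioned above, the following Python sketch models a source-specific route lookup. It assumes a destination-first ordering (the most specific destination prefix wins, and the source prefix only breaks ties), which should be checked against RFC 9079 before relying on it. The route entries, next-hop names and the lookup function are hypothetical and are not babeld's implementation.

```python
import ipaddress

# Hypothetical source-specific routes: (destination prefix, source prefix, next hop).
ROUTES = [
    ("::/0", "::/0", "default-gw"),
    ("::/0", "2001:db8:1::/48", "isp-1"),
    ("2001:db8:2::/48", "::/0", "lan"),
]

def lookup(dst, src):
    """Return the next hop of the most specific route matching (dst, src).

    Assumption: routes are compared destination-first, so the destination
    prefix length is the primary key and the source prefix length the
    secondary one.
    """
    dst_ip = ipaddress.ip_address(dst)
    src_ip = ipaddress.ip_address(src)
    best = None
    for dst_prefix, src_prefix, next_hop in ROUTES:
        dst_net = ipaddress.ip_network(dst_prefix)
        src_net = ipaddress.ip_network(src_prefix)
        if dst_ip in dst_net and src_ip in src_net:
            key = (dst_net.prefixlen, src_net.prefixlen)
            if best is None or key > best[0]:
                best = (key, next_hop)
    return best[1] if best else None
```

Because the source prefix is part of the route itself, the routing daemon knows about the restriction, which is the congruence property argued for above.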
Hi Erin, Juliusz, On Sat, Nov 18, 2023 at 11:21:57AM +0100, Erin Shepherd wrote:
On Sat, 18 Nov 2023, at 03:19, Daniel Gröber wrote:
That would be a problem as I specifically want to tie the source address filtering to this too. I'll have a look at the internals (if and) when I get around to starting work on this.
Is tying source address filtering to the routing table the right thing to do here? It seems to me that it would cause issues similar to those we see more generally with Unicast Reverse Path Filtering
IMO not providing a way to do source address filtering at the routing level was the original sin :)
There is certainly the multihoming challenge to be overcome, as traditional BCP38 style filtering doesn't cut it in the general case. I have some ideas on how to deal with this.
I've done some experiments and found that in Linux multi-nexthop routes actually match reverse path lookups (using nftables "rt") for _any_ of the source interfaces involved. I think this can be used to build RFC 3704 style Feasible Path Reverse Path Forwarding when the routing daemon involved supports ECMP.
This experiment is what got me interested in having via-wgpeer in the routing table in the first place; once we have that, we can apply the above idea not just at the interface level but at the wg peer level. Neat.
Can you think of a use-case where fpRPF isn't enough?
It's also noteworthy that once we have this support for via-wgpeer it'd be possible to apply ip-rule policy to the filtering decision. Perhaps that gives some additional power for more fun use-cases :)
On Sat, Nov 18, 2023 at 01:22:03PM +0100, Juliusz Chroboczek wrote:
Issues are caused by the kernel performing filtering that the routing protocol is not aware of: it causes the routing daemon's routing table to no longer match the effective forwarding table (the kernel's routing table). That's the reason why uRPF breaks most routing protocols, that's the reason why we have trouble making Wireguard work with Babel, and also the reason behind https://github.com/jech/babeld/issues/111.
Right on the money as always. This idea has been on my mind too.
Contrariwise, we can teach Babel to explicitly take into account the kernel features that we're interested in using. Thus, Babel could be aware of the restrictions placed on a wireguard interface, and collaborate with Wireguard so that the routing table and the forwarding table remain congruent. I haven't looked at the issue in detail, but I believe that would be an interesting (short-term) research project, one that I would be glad to collaborate with (but not necessarily lead, at least not right now).
Sounds interesting, do you have a funding source in mind?
For the specific case of source address filtering, Babel already has an (implemented) extension to deal with source addresses, and I encourage you to consider whether it can be used to deal with the issue at hand. Please see https://arxiv.org/pdf/1403.0445.pdf and RFC 9079.
I don't think I mentioned this to you yet, but I have another one of my crazy ideas of doing something vaguely similar to BGP flowspec with babel. Restricted to IP source/destination address, so no L4 stuff. I just want to represent firewall policy using IPv6 subtrees and distribute it in realtime using babel :)
Unfortunately this is currently stalled due to an apparent nft rt match kernel bug preventing me from representing multiple possible outcomes, since I want to support dropping, accepting but also stateful firewalling of matching flows.
--Daniel
Hi Daniel, On Mon, Nov 20, 2023, 03:05 Daniel Gröber <dxld@darkboxed.org> wrote:
Can you think of a use-case where fpRPF isn't enough?
Yes. IMHO, the problem with RPF is that the routing table doesn't reflect the network topology, but only a subset of it. I mean, in topologies where multiple paths are possible, you can choose to use, or even learn, only a subset of those paths. That does not mean that no other legitimate paths exist which other actors may choose to reach you. In that sense, maybe yes: the original sin is that the routing table doesn't reflect the whole topology, only the paths we choose for egress. I'm not sure it is a sin, though, since otherwise the routing table would be overcomplicated.
If I understand correctly, such an fpRPF approach works only if you both learn all possible paths and use all of them in a multi-nexthop route. But in the Internet, for example, with its advanced BGP announcement policies, that is not true at all.
So from my point of view it is good to split the topology definition (ingress decapsulation) from the chosen paths (egress routing), because they are related but still different processes. That way the system can be more flexible, although we need to repeat common things and keep ingress and egress consistent/synced. At the same time, we can use a single protocol/configuration as the source of information to set up both processes in cases where it is OK to sacrifice flexibility for simplicity. Or, for example, the ingress part can be configured to use the routing table as its source of topology information. Actually, when we turn on RPF, we do something like that.
But my point is that RPF (with its variations too) has its bounds and cannot be a universal solution, there is no silver bullet here.
Hi Alexander, On Wed, Nov 22, 2023 at 12:17:49AM +0100, Alexander Zubkov wrote:
Can you think of a use-case where fpRPF isn't enough?
Yes. IMHO, the problem with RPF is that the routing table doesn't reflect the network topology, but only a subset of it.
Right, that is the fundamental problem, so my solution to that is: routing should "just represent the full network topology" :) As the routing protocol sees it anyway, since the whole point of RPF is to only allow paths that the routing system chooses. Do note that while I implement the topology information using ECMP routes, there's no reason you actually have to use ECMP. You could still have regular routes in your (main) routing table and use a separate table with ECMP routes for RPF, and this is very much something I want us to support.
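The two-table arrangement described here can be sketched in Python as follows. This is a simplified model, not the kernel's or nftables' behaviour: it assumes the reverse-path check is a longest-prefix lookup on the source address that passes if the ingress peer is among the matched route's next hops. The table contents, peer names and the function name are hypothetical.

```python
import ipaddress

# Hypothetical tables; peer names and prefixes are made up.
# The main table holds only the chosen path for each prefix.
MAIN_TABLE = {"10.2.0.0/16": ["peer-a"]}
# The separate RPF table lists every feasible next hop (ECMP-style routes).
RPF_TABLE = {"10.2.0.0/16": ["peer-a", "peer-b"], "10.2.9.0/24": ["peer-c"]}

def reverse_path_ok(table, src, ingress):
    """True if the longest-prefix route back to src lists ingress as a next hop."""
    ip = ipaddress.ip_address(src)
    matches = [ipaddress.ip_network(p) for p in table if ip in ipaddress.ip_network(p)]
    if not matches:
        return False
    best = max(matches, key=lambda net: net.prefixlen)
    return ingress in table[str(best)]
```

Checking against MAIN_TABLE behaves like strict RPF and rejects traffic from 10.2.1.1 arriving via peer-b, while checking against RPF_TABLE accepts it. Forwarding can keep using the single best route from the main table either way.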
I mean, in topologies where multiple paths are possible, you can choose to use, or even learn, only a subset of those paths.
If I understand correctly, you're talking about (local) routing daemon policy here. Yes, this is something you can do, and my current approach of (abusing) ECMP only works when your routing policy satisfies some symmetry criteria. However, as Juliusz pointed out, integrating this idea into the routing protocol proper could allow using arbitrary policy without ever breaking RPF, but figuring out the details is (exciting) future work.
In that sense, maybe yes: the original sin is that the routing table doesn't reflect the whole topology, only the paths we choose for egress. I'm not sure it is a sin, though, since otherwise the routing table would be overcomplicated.
Right, routing table (modification) performance and clutter is certainly a reason to forgo this approach, but I find that for the kind of (small) networks I want to run, and that many people might run using wireguard, this is perfectly fine.
If I understand correctly, such an fpRPF approach works only if you both learn all possible paths and use all of them in a multi-nexthop route. But in the Internet, for example, with its advanced BGP announcement policies, that is not true at all.
Right, to deploy fpRPF on a large scale you really need some kind of support from the routing protocol. AFAIK there's nothing like that for BGP yet? I don't think it's completely inapplicable either though; it might still work for iBGP with appropriately designed routing policy. My interest lies mostly with doing this using babel though.
So from my point of view it is good to split the topology definition (ingress decapsulation) from the chosen paths (egress routing), because they are related but still different processes. That way the system can be more flexible, although we need to repeat common things and keep ingress and egress consistent/synced.
To me flexibility is only desirable insofar as it doesn't conflict with system security. Source address authenticity is an important property I wouldn't want to give up here. If it's easier to ignore source address filtering than it is to implement it nobody is going to do it (cf. the internet) and I think that's the crux of the problem with "encap". Wireguard gifted us this amazing state of source filtering being the easy default and I want to keep it that way.
my point is that RPF (with its variations too) has its bounds and cannot be a universal solution, there is no silver bullet here.
No, ofc. nothing we do can possibly "fix everything for everyone" but that's no reason not to try a new approach for a particular problem in a particular use-case :) --Daniel
participants (5)
- Alexander Zubkov
- Daniel Gröber
- Erin Shepherd
- Juliusz Chroboczek
- Maria Matejka