PATCH: IO: Avoid calling SO_BINDTODEVICE if not needed
Since Linux 5.7 (see linux/c427bfec18f21) non-root users are allowed to bind a socket using SO_BINDTODEVICE as long as the socket is not already bound. When using BGP with VRFs, BIRD correctly binds the listening socket to the VRF but also re-binds the accept()'d socket to the same VRF. This is not needed as the interface bind is inherited in this case, and indeed this redundant bind causes an -EPERM if BIRD is running as non-root making BIRD close the connection and reject the peer. We change the behaviour of the generic sk_setup to first query the socket and see if the socket is already correctly bound, and call setsockopt(SO_BINDTODEVICE) iff it is truly needed. In addition, since the getsockopt(SO_BINDTODEVICE) was implemented in Linux 3.8 or otherwise might be blocked in existing installations, we quietly fall back to the previous behavior if the getsockopt call fails. Test case: Run BIRD as a non-root user (and no extra capabilities) using passive BGP inside a VRF. Before the patch observe the error: "<ERR> SOCK: Incoming connection: SO_BINDTODEVICE: Operation not permitted" protocol bgp AS1234_1 { [..] vrf "VrfTest"; passive on; } After the patch this works as expected. Patch is attached to this message but if it falls off it can also be found at: https://github.com/sonix-network/bird/blob/33a0ac4b5af38d3bf75c78ca62472fff1... There is also a simple utility to test the behavior of rebinding sockets at: https://github.com/sonix-network/bird/blob/33a0ac4b5af38d3bf75c78ca62472fff1... Thanks for your consideration,
Hi Christian and all! I wonder if it is necessary at all to set a vrf on an accepted connection? It seems to me that setting or checking vrf should be avoided instead for an accepted connection. What do you think? On Sat, Jul 27, 2024, 11:54 Christian Svensson via Bird-users < bird-users@network.cz> wrote:
Since Linux 5.7 (see linux/c427bfec18f21) non-root users are allowed to bind a socket using SO_BINDTODEVICE as long as the socket is not already bound.
When using BGP with VRFs, BIRD correctly binds the listening socket to the VRF but also re-binds the accept()'d socket to the same VRF. This is not needed as the interface bind is inherited in this case, and indeed this redundant bind causes an -EPERM if BIRD is running as non-root making BIRD close the connection and reject the peer.
We change the behaviour of the generic sk_setup to first query the socket and see if the socket is already correctly bound, and call setsockopt(SO_BINDTODEVICE) iff it is truly needed. In addition, since the getsockopt(SO_BINDTODEVICE) was implemented in Linux 3.8 or otherwise might be blocked in existing installations, we quietly fall back to the previous behavior if the getsockopt call fails.
Test case: Run BIRD as a non-root user (and no extra capabilities) using passive BGP inside a VRF. Before the patch observe the error: "<ERR> SOCK: Incoming connection: SO_BINDTODEVICE: Operation not permitted"
protocol bgp AS1234_1 { [..] vrf "VrfTest"; passive on; }
After the patch this works as expected.
Patch is attached to this message but if it falls off it can also be found at:
https://github.com/sonix-network/bird/blob/33a0ac4b5af38d3bf75c78ca62472fff1...
There is also a simple utility to test the behavior of rebinding sockets at:
https://github.com/sonix-network/bird/blob/33a0ac4b5af38d3bf75c78ca62472fff1...
Thanks for your consideration,
Hi Alexander, On Sat, Jul 27, 2024 at 2:05 PM Alexander Zubkov <green@qrator.net> wrote:
I wonder if it is necessary at all to set a vrf on an accepted connection? It seems to me that setting or checking vrf should be avoided instead for an accepted connection. What do you think?
Indeed, this is what I set out to do in the beginning and is, if you boil this patch down, the actual implication when using VRFs. The reason I chose to implement the patch as a get+set rather than a conditional set was that the existing code structure assumes that sk_setup is called on multiple types of sockets and I wasn't sure exactly how to guard for specifically sockets that are connected. In addition I tried to find a reference in the kernel to where exactly it inherits the bound interface when a new socket is created from accept() but I could not. It is evident from my experiments that it is inherited, and that is the only way accept() on a VRF bind() would make sense. Doing a get+set seems like the least risky change that I felt safe to propose. That said, if you believe it is better implemented as a conditional and are able to nudge me how you'd want a check for the particular socket type to look, I'd be happy to do a v2 patch. Regards,
I'm not a BIRD developer, so I cannot decide hare what is the right direction for you to implement the patch. But it seems to me that it is the question that should be raised. As for implementation, I'm not sure, but from a quick glance at a code, it seem that vrf binding might be moved from sk_setup somewhere else. On Sat, Jul 27, 2024 at 2:18 PM Christian Svensson <christian@cmd.nu> wrote:
Hi Alexander,
On Sat, Jul 27, 2024 at 2:05 PM Alexander Zubkov <green@qrator.net> wrote:
I wonder if it is necessary at all to set a vrf on an accepted connection? It seems to me that setting or checking vrf should be avoided instead for an accepted connection. What do you think?
Indeed, this is what I set out to do in the beginning and is, if you boil this patch down, the actual implication when using VRFs.
The reason I chose to implement the patch as a get+set rather than a conditional set was that the existing code structure assumes that sk_setup is called on multiple types of sockets and I wasn't sure exactly how to guard for specifically sockets that are connected. In addition I tried to find a reference in the kernel to where exactly it inherits the bound interface when a new socket is created from accept() but I could not. It is evident from my experiments that it is inherited, and that is the only way accept() on a VRF bind() would make sense. Doing a get+set seems like the least risky change that I felt safe to propose.
That said, if you believe it is better implemented as a conditional and are able to nudge me how you'd want a check for the particular socket type to look, I'd be happy to do a v2 patch.
Regards,
Hi Thanks for the patch. I agree that inheriting the iface-bind is the only way accept() on a VRF-bound scoket would make sense. I guess this ineriting behavior is true also for other socket properties, Therefore, sk_setup() likely does plenty of useless steps for sockets associated with incoming TCP connections. I would prefer the conditional set, you can just check (s->type == SK_TCP), (Listening socket has type SK_TCP_PASSIVE, while connecting socket has type SK_TCP_ACTIVE at sk_setup() time). Also, you noted that this is for BIRD running as non-root with no specific capability. Do you have some experiences and suggestions about running BIRD this way? I guess there are many other syscalls in BIRD that are CAP_NET_ADMIN privileged, like setsockopt(SO_DONTROUTE). On Sat, Jul 27, 2024 at 02:18:20PM +0200, Christian Svensson via Bird-users wrote:
Hi Alexander,
On Sat, Jul 27, 2024 at 2:05 PM Alexander Zubkov <green@qrator.net> wrote:
I wonder if it is necessary at all to set a vrf on an accepted connection? It seems to me that setting or checking vrf should be avoided instead for an accepted connection. What do you think?
Indeed, this is what I set out to do in the beginning and is, if you boil this patch down, the actual implication when using VRFs.
The reason I chose to implement the patch as a get+set rather than a conditional set was that the existing code structure assumes that sk_setup is called on multiple types of sockets and I wasn't sure exactly how to guard for specifically sockets that are connected. In addition I tried to find a reference in the kernel to where exactly it inherits the bound interface when a new socket is created from accept() but I could not. It is evident from my experiments that it is inherited, and that is the only way accept() on a VRF bind() would make sense. Doing a get+set seems like the least risky change that I felt safe to propose.
That said, if you believe it is better implemented as a conditional and are able to nudge me how you'd want a check for the particular socket type to look, I'd be happy to do a v2 patch.
Regards,
-- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) "To err is human -- to blame it on a computer is even more so."
Hi Ondrej, What do you think about splitting sk_setup() into different flavours? See the example patch attached. This approach may be more beneficial, because it does not count on different internal socket types, but uses direct logic of calling specific initialization function depending how the socket is obtained. Regards, Alexander On Mon, Jul 29, 2024 at 4:25 AM Ondrej Zajicek <santiago@crfreenet.org> wrote:
Hi
Thanks for the patch. I agree that inheriting the iface-bind is the only way accept() on a VRF-bound scoket would make sense. I guess this ineriting behavior is true also for other socket properties, Therefore, sk_setup() likely does plenty of useless steps for sockets associated with incoming TCP connections.
I would prefer the conditional set, you can just check (s->type == SK_TCP), (Listening socket has type SK_TCP_PASSIVE, while connecting socket has type SK_TCP_ACTIVE at sk_setup() time).
Also, you noted that this is for BIRD running as non-root with no specific capability. Do you have some experiences and suggestions about running BIRD this way? I guess there are many other syscalls in BIRD that are CAP_NET_ADMIN privileged, like setsockopt(SO_DONTROUTE).
On Sat, Jul 27, 2024 at 02:18:20PM +0200, Christian Svensson via Bird-users wrote:
Hi Alexander,
On Sat, Jul 27, 2024 at 2:05 PM Alexander Zubkov <green@qrator.net> wrote:
I wonder if it is necessary at all to set a vrf on an accepted connection? It seems to me that setting or checking vrf should be avoided instead for an accepted connection. What do you think?
Indeed, this is what I set out to do in the beginning and is, if you boil this patch down, the actual implication when using VRFs.
The reason I chose to implement the patch as a get+set rather than a conditional set was that the existing code structure assumes that sk_setup is called on multiple types of sockets and I wasn't sure exactly how to guard for specifically sockets that are connected. In addition I tried to find a reference in the kernel to where exactly it inherits the bound interface when a new socket is created from accept() but I could not. It is evident from my experiments that it is inherited, and that is the only way accept() on a VRF bind() would make sense. Doing a get+set seems like the least risky change that I felt safe to propose.
That said, if you believe it is better implemented as a conditional and are able to nudge me how you'd want a check for the particular socket type to look, I'd be happy to do a v2 patch.
Regards,
-- Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) "To err is human -- to blame it on a computer is even more so."
On Tue, Jul 30, 2024 at 12:33:30AM +0200, Alexander Zubkov wrote:
Hi Ondrej,
What do you think about splitting sk_setup() into different flavours? See the example patch attached. This approach may be more beneficial, because it does not count on different internal socket types, but uses direct logic of calling specific initialization function depending how the socket is obtained.
Hi That is a good point, but in some way sk_setup() is already the common function, while the specific code (like bind() to address and associated calls) are directly in sk_open(). Adding another layer of indirection would complicate it even more and would force reordering of syscalls if some of these are moved from generic to specific. I thought about just adding an argument to sk_setup() based on how the socked is obtained, so it can do the conditional operations internally, but finally i chose the minimal change: https://gitlab.nic.cz/labs/bird/-/commit/df22b3140cad6bd8742dce16e6a1b342d4a... (I do not like how this code is structured anyways, especially split of syscalls between sk_open() and sk_setup(), so perhaps this will be restructured later in some more sensible manner.) -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) "To err is human -- to blame it on a computer is even more so."
Hi, Yes, you are right, it makes another unneeded level of indirection. Because sk_setup() is called from only 2 paces now - sk_open() and sk_passive_connected(). Those are already specific initialization functions. And there is no reason to add additional layer, when VRF code could be moved directly to sk_open(). But the fix is there already. And in some sence the bird's socket type is a non-explicit argument to sk_setup(), that idetifies how socket is obtained. I only have doubts if SK_UNIX should be there too, because it is also obtained by accept(). But those are fs-bound, and I'm not sure that VRFs work for them at all somehow. On Tue, Jul 30, 2024 at 5:03 PM Ondrej Zajicek <santiago@crfreenet.org> wrote:
On Tue, Jul 30, 2024 at 12:33:30AM +0200, Alexander Zubkov wrote:
Hi Ondrej,
What do you think about splitting sk_setup() into different flavours? See the example patch attached. This approach may be more beneficial, because it does not count on different internal socket types, but uses direct logic of calling specific initialization function depending how the socket is obtained.
Hi
That is a good point, but in some way sk_setup() is already the common function, while the specific code (like bind() to address and associated calls) are directly in sk_open(). Adding another layer of indirection would complicate it even more and would force reordering of syscalls if some of these are moved from generic to specific.
I thought about just adding an argument to sk_setup() based on how the socked is obtained, so it can do the conditional operations internally, but finally i chose the minimal change:
https://gitlab.nic.cz/labs/bird/-/commit/df22b3140cad6bd8742dce16e6a1b342d4a...
(I do not like how this code is structured anyways, especially split of syscalls between sk_open() and sk_setup(), so perhaps this will be restructured later in some more sensible manner.)
-- Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) "To err is human -- to blame it on a computer is even more so."
participants (3)
-
Alexander Zubkov -
Christian Svensson -
Ondrej Zajicek