[PATCH] krt: Dump routing tables separately on Linux to avoid congestion
When dumping the routing table, BIRD currently doesn't set the rtm_table netlink field to select a particular table but instead requests all tables at once. This can be problematic when multiple routing daemons are running on a system, as the kernel's route modification performance drops drastically (by a factor of 20-200ish) when a table is modified while it's being dumped.

To avoid this situation, we make BIRD do dumps on a per-kernel-table basis. This allows the administrator to have multiple routing daemons use different kernel tables, which sidesteps the problem.

See also this discussion on the babel-users mailing list:
https://alioth-lists.debian.net/pipermail/babel-users/2022-April/003902.html

```diff
---
 sysdep/cf/linux.h      | 2 +-
 sysdep/linux/netlink.c | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/sysdep/cf/linux.h b/sysdep/cf/linux.h
index 047d3764..d6211d16 100644
--- a/sysdep/cf/linux.h
+++ b/sysdep/cf/linux.h
@@ -9,7 +9,7 @@
 #define CONFIG_AUTO_ROUTES
 #define CONFIG_SELF_CONSCIOUS
 #define CONFIG_MULTIPLE_TABLES
-#define CONFIG_ALL_TABLES_AT_ONCE
+#undef CONFIG_ALL_TABLES_AT_ONCE
 #define CONFIG_IP6_SADR_KERNEL
 
 #define CONFIG_MC_PROPER_SRC
diff --git a/sysdep/linux/netlink.c b/sysdep/linux/netlink.c
index 29b744cb..003b7939 100644
--- a/sysdep/linux/netlink.c
+++ b/sysdep/linux/netlink.c
@@ -256,7 +256,7 @@ nl_request_dump_addr(int af)
 }
 
 static void
-nl_request_dump_route(int af)
+nl_request_dump_route(int af, int table_id)
 {
   struct {
     struct nlmsghdr nh;
@@ -267,6 +267,7 @@ nl_request_dump_route(int af)
     .nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
     .nh.nlmsg_seq = ++(nl_scan.seq),
     .rtm.rtm_family = af,
+    .rtm.rtm_table = table_id,
   };
 
   send(nl_scan.fd, &req, sizeof(req), 0);
@@ -1976,13 +1977,13 @@ nl_parse_route(struct nl_parse_state *s, struct nlmsghdr *h)
 }
 
 void
-krt_do_scan(struct krt_proto *p UNUSED) /* CONFIG_ALL_TABLES_AT_ONCE => p is NULL */
+krt_do_scan(struct krt_proto *p)
 {
   struct nlmsghdr *h;
   struct nl_parse_state s;
 
   nl_parse_begin(&s, 1);
-  nl_request_dump_route(AF_UNSPEC);
+  nl_request_dump_route(AF_UNSPEC, KRT_CF->sys.table_id);
   while (h = nl_get_scan())
     if (h->nlmsg_type == RTM_NEWROUTE || h->nlmsg_type == RTM_DELROUTE)
       nl_parse_route(&s, h);
-- 
2.30.2
```
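For experimenting outside BIRD, the request that the patched nl_request_dump_route() sends can be reproduced as a small self-contained C sketch. The names route_dump_req and fill_route_dump_req() are hypothetical, not BIRD's; only the netlink types and macros come from the Linux UAPI headers. The sketch also makes visible that rtm_table is an 8-bit field, so it can only name tables 0..255.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Same layout as the anonymous struct in nl_request_dump_route(). */
struct route_dump_req {
  struct nlmsghdr nh;
  struct rtmsg rtm;
};

/* Build an RTM_GETROUTE dump request that asks for a single kernel table.
 * rtm_table is an unsigned char, so only table IDs 0..255 fit here;
 * 0 (RT_TABLE_UNSPEC) means "no table filter". */
static void fill_route_dump_req(struct route_dump_req *req, unsigned char af,
                                unsigned char table_id, uint32_t seq)
{
  assert(req != NULL);
  memset(req, 0, sizeof(*req));
  req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
  req->nh.nlmsg_type = RTM_GETROUTE;
  req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
  req->nh.nlmsg_seq = seq;
  req->rtm.rtm_family = af;
  req->rtm.rtm_table = table_id;
}
```

Passing RT_TABLE_UNSPEC as table_id reproduces the old, unfiltered dump.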
Daniel Gröber <dxld@darkboxed.org> writes:
> [...]
Note that this only works when strict netlink filter checking is enabled (i.e., if the setsockopt for NETLINK_GET_STRICT_CHK succeeds). BIRD currently doesn't check this at runtime at all, so just applying this patch as-is will not work correctly on older kernels (<4.20).

One option could be to add a compile-time check for the presence of the NETLINK_GET_STRICT_CHK definition in system headers, and set CONFIG_ALL_TABLES_AT_ONCE based on whether it is present. This should do the right thing in most cases, and even if it fails (i.e., if BIRD is compiled on a newer system than it will run on), this will only lead to redundant scans, not missed routes, so this could be an acceptable trade-off?

Otherwise, CONFIG_ALL_TABLES_AT_ONCE would have to be made a run-time parameter set based on whether the setsockopt call succeeds...

-Toke
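The two options above can be combined: guard the code on the NETLINK_GET_STRICT_CHK definition at compile time, then trust only the setsockopt result at run time. A minimal sketch, assuming a hypothetical helper enable_strict_chk() that is called on the same netlink socket later used for dumps, since the option is per-socket:

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: try to enable strict checking of dump requests on a
 * NETLINK_ROUTE socket. Returns 1 if the kernel accepted the option (table
 * filters in dump requests will be honored), 0 otherwise: the headers lack
 * the option, the running kernel rejected it (< 4.20), or fd is invalid. */
static int enable_strict_chk(int fd)
{
#if defined(SOL_NETLINK) && defined(NETLINK_GET_STRICT_CHK)
  int one = 1;
  return setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
                    &one, sizeof(one)) == 0;
#else
  (void) fd; /* built against pre-4.20 headers: cannot even ask */
  return 0;
#endif
}
```

A return value of 0 would mean falling back to the all-tables-at-once scan; a binary built on new headers but run on an old kernel then still behaves correctly.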
On Sat, Apr 16, 2022 at 07:28:59PM +0200, Toke Høiland-Jørgensen wrote:
> Daniel Gröber <dxld@darkboxed.org> writes:
>> [...]
> Note that this only works when strict netlink filter checking is enabled (i.e., if the setsockopt for NETLINK_GET_STRICT_CHK succeeds). BIRD currently doesn't check this at runtime at all, so just applying this patch as-is will not work correctly on older kernels (<4.20).
Hi,

I like the idea of scanning tables independently, but I suspected there would be an issue on older kernels without NETLINK_GET_STRICT_CHK. I will check how hard it would be to switch CONFIG_ALL_TABLES_AT_ONCE at run time.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
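A run-time switch could boil down to deciding, per scan round, which table IDs to request. The sketch below is hypothetical (plan_route_dumps() is not a BIRD function); it assumes table ID 0 (RT_TABLE_UNSPEC) means "no filter", and that in the all-at-once case the routes are dispatched to the right kernel protocol instance afterwards, as the "p is NULL" comment on krt_do_scan() in the patch suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/rtnetlink.h>

/* Hypothetical: fill out[] with the table IDs to dump in one scan round.
 * strict_chk is the cached result of the NETLINK_GET_STRICT_CHK setsockopt.
 * Returns the number of dump requests to send. */
static size_t plan_route_dumps(int strict_chk, const uint32_t *tables,
                               size_t n_tables, uint32_t *out, size_t out_cap)
{
  if (!strict_chk) {
    /* Old kernel: a table filter would be ignored, so behave like
     * CONFIG_ALL_TABLES_AT_ONCE and dump everything once. */
    assert(out_cap >= 1);
    out[0] = RT_TABLE_UNSPEC;
    return 1;
  }

  /* Strict checking available: one filtered dump per configured table. */
  assert(out_cap >= n_tables);
  for (size_t i = 0; i < n_tables; i++)
    out[i] = tables[i];
  return n_tables;
}
```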
Ondrej Zajicek <santiago@crfreenet.org> writes:
> On Sat, Apr 16, 2022 at 07:28:59PM +0200, Toke Høiland-Jørgensen wrote:
>> [...]
> Hi,
>
> I like the idea of scanning tables independently, but I suspected there would be an issue on older kernels without NETLINK_GET_STRICT_CHK. I will check how hard it would be to switch CONFIG_ALL_TABLES_AT_ONCE at run time.
The success or failure of the setsockopt for NETLINK_GET_STRICT_CHK should predict whether the filtering will work; this is from the commit message that added the flag:

> For new userspace on old kernel, the setsockopt will fail and even if new userspace sets data in the headers and appended attributes the kernel will silently ignore it. Moving forward when the setsockopt succeeds, the new userspace on old kernel means the dump request can pass an attribute the kernel does not understand. The dump will then fail as the older kernel does not understand it. New userspace on new kernel setting the socket option gets the benefit of the improved data dump.

If switching from compile time to run time turns out to be too annoying, switching unconditionally to the per-table dump would technically work (as in, it would not drop any routes); it would just be inefficient, as each "per-table" dump would return results from all tables...

-Toke
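The "would not drop any routes" argument relies on userspace discarding routes from tables it did not ask for. A sketch of such a check, assuming the route message layout from the Linux UAPI headers: the kernel reports tables above 255 as RT_TABLE_COMPAT in rtm_table and carries the full ID in an RTA_TABLE attribute, so the attribute takes precedence. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: return the kernel table a route message belongs to.
 * rtm_table is only 8 bits wide (tables above 255 show up as RT_TABLE_COMPAT),
 * so an RTA_TABLE attribute, when present, overrides it. */
static uint32_t route_msg_table(const struct nlmsghdr *h)
{
  assert(h != NULL);
  if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct rtmsg)))
    return RT_TABLE_UNSPEC; /* truncated message */

  const struct rtmsg *rtm = NLMSG_DATA(h);
  uint32_t table = rtm->rtm_table;
  int len = (int) RTM_PAYLOAD(h);

  for (const struct rtattr *a = RTM_RTA(rtm); RTA_OK(a, len); a = RTA_NEXT(a, len))
    if (a->rta_type == RTA_TABLE && RTA_PAYLOAD(a) >= (int) sizeof(uint32_t))
      memcpy(&table, RTA_DATA(a), sizeof(table));

  return table;
}

/* Keep a route only if it belongs to the table this scan asked for. On a
 * kernel that ignores the dump filter, this drops the extra routes from
 * other tables, so the only cost is the redundant transfer. */
static int route_msg_wanted(const struct nlmsghdr *h, uint32_t want_table)
{
  return route_msg_table(h) == want_table;
}
```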
On Tue, Apr 19, 2022 at 04:06:57PM +0200, Toke Høiland-Jørgensen wrote:
> Ondrej Zajicek <santiago@crfreenet.org> writes:
>> On Sat, Apr 16, 2022 at 07:28:59PM +0200, Toke Høiland-Jørgensen wrote:
>>> [...]
>> I like the idea of scanning tables independently, but I suspected there would be an issue on older kernels without NETLINK_GET_STRICT_CHK. I will check how hard it would be to switch CONFIG_ALL_TABLES_AT_ONCE at run time.
>
> The success or failure of the setsockopt for NETLINK_GET_STRICT_CHK should predict whether the filtering will work; this is from the commit message that added the flag:
> [...]
Hello,

Finally I got around to finishing it:

https://gitlab.nic.cz/labs/bird/-/commit/534d0a4b44aa193da785ae180475a448f57...

It switches CONFIG_ALL_TABLES_AT_ONCE behavior at run time based on the success or failure of the setsockopt for NETLINK_GET_STRICT_CHK. It seems to work correctly on both newer and older systems.

There were also some minor issues in the original patch, like not handling table_id > 255 and not setting the proper address family for the scan.
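One way to handle table_id > 255, sketched here independently of the linked commit (whose code is not reproduced), is to send the full 32-bit ID in an RTA_TABLE attribute, which the kernel's strict dump-request validation for IPv4/IPv6 accepts as a table filter. The struct and function names are hypothetical; the address family is taken as a parameter so it can match the kernel protocol instance (AF_INET or AF_INET6).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, rtmsg, and one u32 RTA_TABLE attribute. */
struct table_dump_req {
  struct nlmsghdr nh;
  struct rtmsg rtm;
  char attrbuf[RTA_SPACE(sizeof(uint32_t))];
};

/* Build an RTM_GETROUTE dump request for one address family and one table.
 * The full 32-bit ID goes into RTA_TABLE; rtm_table gets the ID only when it
 * fits in 8 bits, RT_TABLE_COMPAT otherwise. The kernel honors the filter
 * only if NETLINK_GET_STRICT_CHK was enabled on the sending socket. */
static void fill_table_dump_req(struct table_dump_req *req, unsigned char af,
                                uint32_t table_id, uint32_t seq)
{
  assert(req != NULL);
  memset(req, 0, sizeof(*req));

  req->nh.nlmsg_type = RTM_GETROUTE;
  req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
  req->nh.nlmsg_seq = seq;
  req->rtm.rtm_family = af;
  req->rtm.rtm_table = table_id < 256 ? (unsigned char) table_id : RT_TABLE_COMPAT;

  /* attrbuf starts right after the (already 4-byte aligned) rtmsg. */
  struct rtattr *a = (struct rtattr *) req->attrbuf;
  a->rta_type = RTA_TABLE;
  a->rta_len = RTA_LENGTH(sizeof(uint32_t));
  memcpy(RTA_DATA(a), &table_id, sizeof(table_id));

  req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)) + RTA_SPACE(sizeof(uint32_t));
}
```

Without strict checking the kernel ignores both rtm_table and the attribute, which yields the redundant-but-safe full dump described earlier in the thread.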
Participants (3):
- Daniel Gröber
- Ondrej Zajicek
- Toke Høiland-Jørgensen