[PATCH] babel: Keep separate auth PC counters for unicast and multicast
The babel protocol normally sends all its messages as multicast packets, but the protocol specification allows most messages to be sent as either unicast or multicast, and the two can be mixed freely. In particular, the babeld implementation can be configured to unicast updates to all peers instead of sending them as unicast. Daniel discovered that this can cause problems with the packet counter checks in the MAC extension due to packet reordering. This happens on WiFi networks where clients have power save enabled (which is quite common in infrastructure networks): in this case, the access point will buffer all multicast traffic and only send it out along with its beacons, leading to a maximum buffering in default Linux-based access point configuration of up to 200 ms. This means that a Babel sender that mixes unicast and multicast messages can have the unicast messages overtake the multicast messages because of this buffering; when authentication is enabled, this causes the receiver to discard the multicast message when it does arrive because it now has a packet counter value less than the unicast message that arrived before it. Daniel observed that this happens frequently enough that Babel ceases to work entirely when runner over a WiFi network. The issue has been described in draft-ietf-babel-mac-relaxed, which is currently pending RFC publication. That also describes two mitigation mechanisms: Keeping separate PC counters for unicast and multicast, and using a reorder window for PC values. This patch implements the former as that is the simplest, and resolves the particular issue seen on WiFi. Reported-by: Daniel Gröber <dxld@darkboxed.org> Tested-by: Daniel Gröber <dxld@darkboxed.org> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> --- proto/babel/babel.c | 28 ++++++++++++++++++++++------ proto/babel/babel.h | 4 +++- proto/babel/packets.c | 1 + 3 files changed, 26 insertions(+), 7 deletions(-) diff --git a/proto/babel/babel.c b/proto/babel/babel.c index ff8b6b52ef4a..499b45b5f18f 100644 --- a/proto/babel/babel.c +++ b/proto/babel/babel.c @@ -1543,7 +1543,8 @@ babel_auth_check_pc(struct babel_iface *ifa, struct babel_msg_auth *msg) n->auth_index_len = msg->index_len; memcpy(n->auth_index, msg->index, msg->index_len); - n->auth_pc = msg->pc; + n->auth_pc_unicast = msg->pc; + n->auth_pc_multicast = msg->pc; n->auth_passed = 1; return 1; @@ -1562,16 +1563,31 @@ babel_auth_check_pc(struct babel_iface *ifa, struct babel_msg_auth *msg) return 0; } - /* (6) Index matches; only accept if PC is greater than last */ - if (n->auth_pc >= msg->pc) + /* + * (6) Index matches; only accept if PC is greater than last. We keep separate + * counters for unicast and multicast because multicast packets can be delayed + * significantly on wireless networks (enough to be received out of order). + * Separate counters are safe because the packet destination address is part + * of the MAC pseudo-header (so unicast packets can't be replayed as multicast + * and vice versa). + */ + if ((msg->unicast && n->auth_pc_unicast >= msg->pc) || + (!msg->unicast && n->auth_pc_multicast >= msg->pc)) { LOG_PKT_AUTH("Authentication failed for %I on %s - " - "lower packet counter (rcv %u, old %u)", - msg->sender, ifa->ifname, msg->pc, n->auth_pc); + "lower %s packet counter (rcv %u, old %u)", + msg->sender, ifa->ifname, + msg->unicast ? "unicast" : "multicast", + msg->pc, + msg->unicast ? n->auth_pc_unicast : n->auth_pc_multicast); return 0; } - n->auth_pc = msg->pc; + if (msg->unicast) + n->auth_pc_unicast = msg->pc; + else + n->auth_pc_multicast = msg->pc; + n->auth_passed = 1; return 1; diff --git a/proto/babel/babel.h b/proto/babel/babel.h index da8386b3b76a..c76c82a63ef1 100644 --- a/proto/babel/babel.h +++ b/proto/babel/babel.h @@ -227,7 +227,8 @@ struct babel_neighbor { u16 next_hello_seqno; uint last_hello_int; - u32 auth_pc; + u32 auth_pc_unicast; + u32 auth_pc_multicast; u8 auth_passed; u8 auth_index_len; u8 auth_index[BABEL_AUTH_INDEX_LEN]; @@ -405,6 +406,7 @@ struct babel_msg_auth { u8 challenge_seen; u8 challenge_len; u8 challenge[BABEL_AUTH_MAX_NONCE_LEN]; + u8 unicast; }; static inline int babel_sadr_enabled(struct babel_proto *p) diff --git a/proto/babel/packets.c b/proto/babel/packets.c index d4acc17042d2..027da4429e4d 100644 --- a/proto/babel/packets.c +++ b/proto/babel/packets.c @@ -1655,6 +1655,7 @@ babel_read_pc(struct babel_tlv *hdr, union babel_msg *m UNUSED, state->auth.pc_seen = 1; state->auth.index_len = index_len; state->auth.index = tlv->index; + state->auth.unicast = state->is_unicast; state->current_tlv_endpos += index_len; return PARSE_SUCCESS; -- 2.39.1
On Tue, Jan 24, 2023 at 12:12:19AM +0100, Toke Høiland-Jørgensen via Bird-users wrote:
The issue has been described in draft-ietf-babel-mac-relaxed, which is currently pending RFC publication. That also describes two mitigation mechanisms: Keeping separate PC counters for unicast and multicast, and using a reorder window for PC values. This patch implements the former as that is the simplest, and resolves the particular issue seen on WiFi.
Hi Is that sufficient? In general, one should not assume anything about link frame ordering. Even two unicast (or two multicast) packets can be reordered due to e.g. frame retransmission. I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Ondrej Zajicek <santiago@crfreenet.org> writes:
On Tue, Jan 24, 2023 at 12:12:19AM +0100, Toke Høiland-Jørgensen via Bird-users wrote:
The issue has been described in draft-ietf-babel-mac-relaxed, which is currently pending RFC publication. That also describes two mitigation mechanisms: Keeping separate PC counters for unicast and multicast, and using a reorder window for PC values. This patch implements the former as that is the simplest, and resolves the particular issue seen on WiFi.
Hi
Is that sufficient? In general, one should not assume anything about link frame ordering. Even two unicast (or two multicast) packets can be reordered due to e.g. frame retransmission.
You're right, in principle. As noted in the paragraph you quoted above there are two possible mechanisms (unicast/multicast split and a reorder window). I initially implemented both, but some testers reported issues with the reorder window code. So I split it into two patches, with this one being the simplest.
I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel.
Well, in practice it seems at least the Linux stack is pretty good at not reordering packets across a single hop, which means that this patch seems to fix the original issue that sparked this whole discussion. So I figured I'd send this one first to get the immediate issue on WiFi resolved, and follow up with the window tracking stuff later, once I've figured out why it breaks stuff... -Toke
On Tue, Jan 24, 2023 at 08:05:41PM +0100, Toke Høiland-Jørgensen wrote:
I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel.
Well, in practice it seems at least the Linux stack is pretty good at not reordering packets across a single hop, which means that this patch seems to fix the original issue that sparked this whole discussion. So I figured I'd send this one first to get the immediate issue on WiFi resolved, and follow up with the window tracking stuff later, once I've figured out why it breaks stuff...
OK, i see that fixing the immediate issue is a good idea, will merge. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Ondrej Zajicek <santiago@crfreenet.org> writes:
On Tue, Jan 24, 2023 at 08:05:41PM +0100, Toke Høiland-Jørgensen wrote:
I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel.
Well, in practice it seems at least the Linux stack is pretty good at not reordering packets across a single hop, which means that this patch seems to fix the original issue that sparked this whole discussion. So I figured I'd send this one first to get the immediate issue on WiFi resolved, and follow up with the window tracking stuff later, once I've figured out why it breaks stuff...
OK, i see that fixing the immediate issue is a good idea, will merge.
Great, thanks! -Toke
On Thu, Jan 26, 2023 at 11:42:33AM +0100, Toke Høiland-Jørgensen wrote:
Ondrej Zajicek <santiago@crfreenet.org> writes:
On Tue, Jan 24, 2023 at 08:05:41PM +0100, Toke Høiland-Jørgensen wrote:
I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel.
Well, in practice it seems at least the Linux stack is pretty good at not reordering packets across a single hop, which means that this patch seems to fix the original issue that sparked this whole discussion. So I figured I'd send this one first to get the immediate issue on WiFi resolved, and follow up with the window tracking stuff later, once I've figured out why it breaks stuff...
OK, i see that fixing the immediate issue is a good idea, will merge.
Great, thanks!
Hi Forgot about that, merged now: https://gitlab.nic.cz/labs/bird/-/commit/ee919658948772105d0bd3b4535ba288834... -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Ondrej Zajicek <santiago@crfreenet.org> writes:
On Thu, Jan 26, 2023 at 11:42:33AM +0100, Toke Høiland-Jørgensen wrote:
Ondrej Zajicek <santiago@crfreenet.org> writes:
On Tue, Jan 24, 2023 at 08:05:41PM +0100, Toke Høiland-Jørgensen wrote:
I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel.
Well, in practice it seems at least the Linux stack is pretty good at not reordering packets across a single hop, which means that this patch seems to fix the original issue that sparked this whole discussion. So I figured I'd send this one first to get the immediate issue on WiFi resolved, and follow up with the window tracking stuff later, once I've figured out why it breaks stuff...
OK, i see that fixing the immediate issue is a good idea, will merge.
Great, thanks!
Hi
Forgot about that, merged now:
https://gitlab.nic.cz/labs/bird/-/commit/ee919658948772105d0bd3b4535ba288834...
Great, thanks! -Toke
The issue has been described in draft-ietf-babel-mac-relaxed, which is currently pending RFC publication. That also describes two mitigation mechanisms: Keeping separate PC counters for unicast and multicast, and using a reorder window for PC values. This patch implements the former as that is the simplest, and resolves the particular issue seen on WiFi.
Is that sufficient?
Babel is designed to work well even with very high rates of packet loss. If the link routinely reorders packets, the receiver will drop the reordered packets, but Babel will recover. Obviously, though, Babel cannot handle systematic packet loss (where the same packet is systematically lost even if it gets resent). The case that this mechanism aims to solve is the case where multicast packets are systematically delayed, and therefore dropped by the receiver, even if they get resent.
In general, one should not assume anything about link frame ordering. Even two unicast (or two multicast) packets can be reordered due to e.g. frame retransmission.
The packet that was retransmitted at the link layer will be dropped by the receiving node, then resent by the application layer.
I think that simple sequence numbers work in two cases - if there is sufficient interval between packets, or there is only one packet flying (e.g. LSREQ-LSUPD ping-pong in OSPF). That is approach used in OSPFv2 and OSPFv3, but that is not true in Babel.
Agreed: the occasional dropped packets may slow down reconvergence, but they won't prevent Babel from reconverging in the end. Unlike the systematic issue that this patch solves, and which may prevent convergence. If that's okay with you, Ondřej, I would recommend applying this patch, which is necessary for correctness, and worry about efficiency at some later time. -- Juliusz
participants (3)
-
Juliusz Chroboczek -
Ondrej Zajicek -
Toke Høiland-Jørgensen