Hi folks,

I’m running bird6 1.4.5-1 (from Debian jessie) on my home network. Quite regularly I see floods of "Received bad lsa checksum" messages in my syslog on the two routers that run BIRD, which cause connectivity interruptions.

The network is made up of a variety of devices running different OSPF stacks. It is fully dual-stacked, with the same devices running OSPFv2 and OSPFv3 with a mostly identical configuration. The network contains two Quagga (0.99.23.1-1 from Debian jessie) routers, one Dell PowerConnect 8024F (fw 5.1.7.5), and several MikroTik RouterOS (6.28) routers.

The problem *only* occurs on OSPFv3; OSPFv2 is unaffected.

The messages always follow a very specific pattern. Here are the last 10 received on one of my routers:

Apr 25 16:17:36 clemens bird6: Received bad lsa checksum from fe80::290:7fff:fe3c:502a: de00 deff
Apr 25 16:17:37 clemens bird6: Received bad lsa checksum from fe80::5e26:aff:feb8:34ad: de00 deff
Apr 25 16:17:41 clemens bird6: Received bad lsa checksum from fe80::290:7fff:fe3c:502a: de00 deff
Apr 25 16:17:42 clemens bird6: Received bad lsa checksum from fe80::5e26:aff:feb8:34ad: de00 deff
Apr 25 16:17:46 clemens bird6: Received bad lsa checksum from fe80::290:7fff:fe3c:502a: de00 deff
Apr 25 16:17:47 clemens bird6: Received bad lsa checksum from fe80::5e26:aff:feb8:34ad: de00 deff
Apr 25 16:17:51 clemens bird6: Received bad lsa checksum from fe80::290:7fff:fe3c:502a: de00 deff
Apr 25 16:17:52 clemens bird6: Received bad lsa checksum from fe80::5e26:aff:feb8:34ad: de00 deff
Apr 25 16:17:56 clemens bird6: Received bad lsa checksum from fe80::290:7fff:fe3c:502a: de00 deff
Apr 25 16:18:01 clemens bird6: Received bad lsa checksum from fe80::290:7fff:fe3c:502a: de00 deff

If I grep my logs looking for unique entries, I see the following:

Received bad lsa checksum from $PEER: 1800 18ff
Received bad lsa checksum from $PEER: 30 ff30
Received bad lsa checksum from $PEER: 3500 35ff
Received bad lsa checksum from $PEER: 40 ff40
Received bad lsa checksum from $PEER: 5000 50ff
Received bad lsa checksum from $PEER: 6600 66ff
Received bad lsa checksum from $PEER: 6e00 6eff
Received bad lsa checksum from $PEER: 86 ff86
Received bad lsa checksum from $PEER: 9800 98ff
Received bad lsa checksum from $PEER: 9c00 9cff
Received bad lsa checksum from $PEER: 9e ff9e
Received bad lsa checksum from $PEER: 9f ff9f
Received bad lsa checksum from $PEER: bc ffbc
Received bad lsa checksum from $PEER: cc ffcc
Received bad lsa checksum from $PEER: cd ffcd
Received bad lsa checksum from $PEER: cf00 cfff
Received bad lsa checksum from $PEER: dc ffdc
Received bad lsa checksum from $PEER: dd ffdd
Received bad lsa checksum from $PEER: de00 deff
Received bad lsa checksum from $PEER: e600 e6ff
Received bad lsa checksum from $PEER: ee00 eeff
Received bad lsa checksum from $PEER: fe fffe

Thus, the problem occurs when (AIUI) the checksum sent by the peer has a zero octet, which doesn’t agree with how lsasum_check() calculates the checksum, where that octet is always 0xff instead. This happens either when the first octet is zero or the second octet is zero. Looking through my logs, it seems I have yet to run into a situation where the values are reversed, i.e. the checksum in the LSA always includes a zero octet and BIRD always calculates 0xff, never the other way around.

I have yet to look in much detail whether the problem is likely to be in BIRD’s lsasum_check() implementation or in the checksum generated by one of my other devices, and I don’t yet know which device(s) generate the LSAs that BIRD complains about.

Has anyone here run into this before?

Thanks,
Chris

--
Chris Boot
bootc@bootc.net
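The pattern in these log entries can be stated compactly: in every logged pair, the two values agree once each octet is reduced modulo 255, which maps both 0x00 and 0xff to 0. Below is a minimal C sketch of that comparison. The names octet_mod255() and same_mod255() are hypothetical helpers made up for illustration; they are not BIRD functions.

```c
#include <stdint.h>

/* Reduce one octet modulo 255, so that 0x00 and 0xff both map to 0. */
static unsigned octet_mod255(unsigned octet)
{
  return octet % 255;
}

/*
 * Hypothetical helper: returns 1 if two 16-bit checksum values, as printed
 * in the log, agree octet by octet after reduction modulo 255.
 */
static int same_mod255(uint16_t a, uint16_t b)
{
  return octet_mod255(a >> 8) == octet_mod255(b >> 8)
      && octet_mod255(a & 0xff) == octet_mod255(b & 0xff);
}
```

For example, same_mod255(0xde00, 0xdeff) and same_mod255(0x0030, 0xff30) both hold, whereas two values that differ in any other way, such as 0xde00 and 0xdf00, do not match.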
Hi folks,

On 25 Apr 2015, at 16:35, Chris Boot <bootc@bootc.net> wrote:
> Hi folks,
>
> I’m running bird6 1.4.5-1 (from Debian jessie) on my home network. Quite regularly I see floods of "Received bad lsa checksum" messages in my syslog on the two routers that run BIRD, which cause connectivity interruptions.
So I did some of my own debugging and worked out what’s happening. Here is a captured LSA from my network:

00 4f 20 02 00 00 01 15 51 bb 37 66 80 00 01 36
00 c3 00 28 00 00 00 13 51 bb 37 66 0a 00 01 80
51 bb e5 a4 51 bb e5 a5

The checksum in the LSA is 00c3, which is at offsets 16 and 17. This LSA comes from my Dell PowerConnect 8024F. I’m not sure if this is valid or not; my reading of RFC905 Annex B suggests that it’s valid for either of the octets to be zero, but not both. I *think* that it’s an implementation detail shared by both Quagga and BIRD that the value 0 can never be generated for ‘x’, which comes from the RFC1008 reference implementation.

The problem happens because BIRD doesn’t verify the checksum by running Fletcher against the received packet and testing for zero; rather it:

- sets the checksum in the packet to 0
- runs the checksum calculation on the LSA data
- updates the checksum in the packet (and comes up with a different result)
- complains when the result is not the same as what was in the packet

I believe the correct way to verify the checksum would be to run the algorithm against the unchanged packet (with the checksum intact) and verify that the result is zero.

Would a patch that changes this behaviour be welcome? I’ve written something up against the current git master, but haven’t yet tested it.

Cheers,
Chris

--
Chris Boot
bootc@bootc.net
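The verification approach proposed above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not BIRD's lsasum_check(): the name lsa_fletcher_ok() is hypothetical, the sums are reduced modulo 255 on every octet for clarity rather than speed, and the first two octets (the Age field) are skipped because they are not covered by the checksum.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: run the Fletcher sums over the LSA exactly as
 * received (checksum field left intact), skipping the 2-octet Age field.
 * The LSA passes if both running sums end up as 0 modulo 255.
 */
static int lsa_fletcher_ok(const uint8_t *lsa, size_t len)
{
  unsigned c0 = 0, c1 = 0;

  for (size_t i = 2; i < len; i++) {
    c0 = (c0 + lsa[i]) % 255;
    c1 = (c1 + c0) % 255;
  }
  return c0 == 0 && c1 == 0;
}

/* The LSA captured from the Dell PowerConnect 8024F, as quoted above. */
static const uint8_t captured_lsa[40] = {
  0x00, 0x4f, 0x20, 0x02, 0x00, 0x00, 0x01, 0x15,
  0x51, 0xbb, 0x37, 0x66, 0x80, 0x00, 0x01, 0x36,
  0x00, 0xc3, 0x00, 0x28, 0x00, 0x00, 0x00, 0x13,
  0x51, 0xbb, 0x37, 0x66, 0x0a, 0x00, 0x01, 0x80,
  0x51, 0xbb, 0xe5, 0xa4, 0x51, 0xbb, 0xe5, 0xa5,
};
```

Calling lsa_fletcher_ok(captured_lsa, sizeof captured_lsa) returns 1: the octets after the Age field sum to 2805 and the running sums total 38760, both exact multiples of 255. So the captured LSA, with its 00c3 checksum, is valid under the test-for-zero rule.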
On Sat, Apr 25, 2015 at 08:54:13PM +0100, Chris Boot wrote:
> The problem happens because BIRD doesn’t verify the checksum by running Fletcher against the received packet and testing for zero; rather it:
>
> - sets the checksum in the packet to 0
> - runs the checksum calculation on the LSA data
> - updates the checksum in the packet (and comes up with a different result)
> - complains when the result is not the same as what was in the packet
>
> I believe the correct way to verify the checksum would be to run the algorithm against the unchanged packet (with the checksum intact) and verify that the result is zero.
>
> Would a patch that changes this behaviour be welcome? I’ve written something up against the current git master, but haven’t yet tested it.
Hello

You are most likely right. As I understand it, these checksums are computed modulo 255; therefore, in some cases they are not unique, so simply computing a new value and comparing it with the received one does not work.

A patch would definitely be welcome.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
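To make the non-uniqueness concrete, here is a hedged C sketch that regenerates the check bytes for the LSA captured earlier in this thread, using the convention that a zero result is mapped to 255, and then shows that both encodings pass the test-for-zero check. The names fletcher_sums(), fletcher_fill() and fletcher_ok() are hypothetical, and the offsets are taken from the LSA header layout (2-octet Age field, checksum at octets 16 and 17) rather than from BIRD's source.

```c
#include <stddef.h>
#include <stdint.h>

enum { AGE_LEN = 2, CKSUM_OFF = 16 }; /* assumed LSA header layout */

/* Fletcher running sums over everything after the Age field. */
static void fletcher_sums(const uint8_t *lsa, size_t len, int *c0, int *c1)
{
  *c0 = *c1 = 0;
  for (size_t i = AGE_LEN; i < len; i++) {
    *c0 = (*c0 + lsa[i]) % 255;
    *c1 = (*c1 + *c0) % 255;
  }
}

/* Test-for-zero validation on the unchanged LSA. */
static int fletcher_ok(const uint8_t *lsa, size_t len)
{
  int c0, c1;
  fletcher_sums(lsa, len, &c0, &c1);
  return c0 == 0 && c1 == 0;
}

/* Generate check bytes in the range 1..255, so 0x00 is never produced. */
static void fletcher_fill(uint8_t *lsa, size_t len)
{
  int c0, c1, x, y;
  /* Number of summed octets that follow the first check byte. */
  int k = (int) (len - CKSUM_OFF) - 1;

  lsa[CKSUM_OFF] = lsa[CKSUM_OFF + 1] = 0;
  fletcher_sums(lsa, len, &c0, &c1);

  x = (k * c0 - c1) % 255;
  if (x <= 0)
    x += 255;
  y = 510 - c0 - x;
  if (y > 255)
    y -= 255;

  lsa[CKSUM_OFF] = (uint8_t) x;
  lsa[CKSUM_OFF + 1] = (uint8_t) y;
}

/* The captured LSA from earlier in the thread (checksum 00 c3). */
static const uint8_t captured_lsa[40] = {
  0x00, 0x4f, 0x20, 0x02, 0x00, 0x00, 0x01, 0x15,
  0x51, 0xbb, 0x37, 0x66, 0x80, 0x00, 0x01, 0x36,
  0x00, 0xc3, 0x00, 0x28, 0x00, 0x00, 0x00, 0x13,
  0x51, 0xbb, 0x37, 0x66, 0x0a, 0x00, 0x01, 0x80,
  0x51, 0xbb, 0xe5, 0xa4, 0x51, 0xbb, 0xe5, 0xa5,
};
```

Running fletcher_fill() on a copy of captured_lsa writes ff c3 into the checksum field, whereas the Dell sent 00 c3. Both the original and the regenerated copy satisfy fletcher_ok(), so a byte-for-byte comparison rejects an LSA that the test-for-zero rule accepts.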
Prior to this patch, BIRD validates the OSPF LSA checksum by calculating a new
checksum and comparing it with the checksum in the header. Due to the specifics
of the Fletcher checksum used in OSPF, this is not necessarily correct as the
checkbytes in the header may be calculated via a different means and end up
with a different value that is nonetheless still correct.

The documented means of validating the checksum as specified in RFC 905 B.4 is
to calculate c0 and c1 from the unchanged contents of the packet, which must
result in a zero value to be considered valid.

Signed-off-by: Chris Boot <bootc@bootc.net>
---
 proto/ospf/lsalib.c | 28 +++++++++++++++++++++++-----
 proto/ospf/lsalib.h |  2 +-
 proto/ospf/lsupd.c  |  2 +-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/proto/ospf/lsalib.c b/proto/ospf/lsalib.c
index 579d13e88770..ce6fb17813e1 100644
--- a/proto/ospf/lsalib.c
+++ b/proto/ospf/lsalib.c
@@ -205,7 +205,7 @@ lsasum_calculate(struct ospf_lsa_header *h, void *body)
   buf_dump("CALC", buf, length);
   */
 
-  (void) lsasum_check(h, body);
+  (void) lsasum_check(h, body, 1);
 
   // log(L_WARN "Checksum result %4x", h->checksum);
 
@@ -214,11 +214,21 @@ lsasum_calculate(struct ospf_lsa_header *h, void *body)
 }
 
 /*
- * Note, that this function expects that LSA is in big endianity
- * It also returns value in big endian
+ * Calculates the Fletcher checksum of an OSPF LSA.
+ *
+ * If 'update' is non-zero, the checkbytes (X and Y in RFC905) are calculated
+ * and the checksum field in the header is updated. The return value is the
+ * checksum as placed in the header (in network byte order).
+ *
+ * If 'update' is zero, only C0 and C1 are calculated and the header is kept
+ * intact. The return value is a combination of C0 and C1; if the return value
+ * is exactly zero the checksum is considered valid, any non-zero value is
+ * invalid.
+ *
+ * Note that this function expects the input LSA to be in network byte order.
  */
 u16
-lsasum_check(struct ospf_lsa_header *h, void *body)
+lsasum_check(struct ospf_lsa_header *h, void *body, int update)
 {
   u8 *sp, *ep, *p, *q, *b;
   int c0 = 0, c1 = 0;
@@ -229,7 +239,7 @@ lsasum_check(struct ospf_lsa_header *h, void *body)
   sp = (char *) h;
   sp += 2; /* Skip Age field */
   length = ntohs(h->length) - 2;
-  h->checksum = 0;
+  if (update) h->checksum = 0;
 
   for (ep = sp + length; sp < ep; sp = q)
   { /* Actually MODX is very large, do we need the for-cyclus? */
@@ -259,6 +269,14 @@ lsasum_check(struct ospf_lsa_header *h, void *body)
     c1 %= 255;
   }
 
+  if (!update) {
+    /*
+     * When testing the checksum, we don't need to calculate x and y. The
+     * checksum passes if c0 and c1 are both 0.
+     */
+    return (c0 << 8) | (c1 & 0xff);
+  }
+
   x = (int)((length - LSA_CHECKSUM_OFFSET) * c0 - c1) % 255;
   if (x <= 0)
     x += 255;
diff --git a/proto/ospf/lsalib.h b/proto/ospf/lsalib.h
index d9e1a610dbda..4ad770e84812 100644
--- a/proto/ospf/lsalib.h
+++ b/proto/ospf/lsalib.h
@@ -48,7 +48,7 @@ static inline u32 lsa_get_etype(struct ospf_lsa_header *h, struct ospf_proto *p)
 
 int lsa_flooding_allowed(u32 type, u32 domain, struct ospf_iface *ifa);
 void lsasum_calculate(struct ospf_lsa_header *header, void *body);
-u16 lsasum_check(struct ospf_lsa_header *h, void *body);
+u16 lsasum_check(struct ospf_lsa_header *h, void *body, int update);
 #define CMP_NEWER 1
 #define CMP_SAME 0
 #define CMP_OLDER -1
diff --git a/proto/ospf/lsupd.c b/proto/ospf/lsupd.c
index ea32923ad536..9e7dc64b359d 100644
--- a/proto/ospf/lsupd.c
+++ b/proto/ospf/lsupd.c
@@ -531,7 +531,7 @@ ospf_receive_lsupd(struct ospf_packet *pkt, struct ospf_iface *ifa,
 	lsa_type, lsa.id, lsa.rt, lsa.sn, lsa.age, lsa.checksum);
 
     /* RFC 2328 13. (1) - validate LSA checksum */
-    if (lsa_n->checksum != lsasum_check(lsa_n, NULL))
+    if (lsa_n->checksum == 0 || lsasum_check(lsa_n, NULL, 0) != 0)
       SKIP("invalid checksum");
 
     /* RFC 2328 13. (2) */
--
2.1.4
On Sun, Apr 26, 2015 at 05:10:11PM +0100, Chris Boot wrote:
> Prior to this patch, BIRD validates the OSPF LSA checksum by calculating a new checksum and comparing it with the checksum in the header. Due to the specifics of the Fletcher checksum used in OSPF, this is not necessarily correct as the checkbytes in the header may be calculated via a different means and end up with a different value that is nonetheless still correct.
>
> The documented means of validating the checksum as specified in RFC 905 B.4 is to calculate c0 and c1 from the unchanged contents of the packet, which must result in a zero value to be considered valid.
>
> Signed-off-by: Chris Boot <bootc@bootc.net>
Thanks, merged.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."