Hello list!

Currently BIRD performs premature aging of LSAs in a very strange way, which sometimes upsets Quagga (up to SIGSEGV) and Cisco. Every time, aging is done by setting the sequence number to LSA_MAXSEQNO.

RFC 2328, on the contrary, shows only 2 cases where MaxSequenceNumber is used in outgoing LSAs:
1) Originating a self LSA with that seq number (a very rare event)
2) [Premature] aging of the LSA from 1) because of a) a disappeared net and b) a seq number increase (12.1.6)

For all other cases the seq number should be either increased or left intact.

BIRD, however, insists on setting LSA_MAXSEQNO:
1) When receiving a self-originated LSA which is not in the local LSDB ( ospf_lsupd_receive ), which is not a special case according to 14.1
2) In all other places, like handling if-down, external route disappearance, etc. ( ospf_lsupd_flush_nlsa )

It takes noticeable time (sometimes up to 5 seconds) for a Cisco 65XX with ~40 neighbors to age such an LSA (it seems that the problem is reproducible on ASR9k, too). And while such an LSA is in the database, every new update is ignored due to section 13, step (8). (I've got full logs and a pcap for this situation.)

Can we please consider applying the attached patch to make BIRD more compliant?
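The effect of step (8) can be illustrated with a minimal sketch of the "which instance is more recent" rule from RFC 2328, section 13.1. This is not BIRD code: `LsaHeader`, `compare_lsa` and `step8_discards` are hypothetical names chosen for illustration, and sequence numbers are modelled as signed 32-bit values held in Python integers.

```python
from dataclasses import dataclass

# Architectural constants from RFC 2328.
MAX_AGE = 3600                # seconds
MAX_AGE_DIFF = 900            # seconds
INITIAL_SEQNO = -0x7FFFFFFF   # 0x80000001 read as a signed 32-bit integer
MAX_SEQNO = 0x7FFFFFFF


@dataclass(frozen=True)
class LsaHeader:              # hypothetical, reduced LSA header
    seqno: int                # signed 32-bit LS sequence number
    checksum: int             # 16-bit LS checksum
    age: int                  # LS age in seconds


def compare_lsa(a, b):
    """Return 1 if a is more recent, -1 if b is, 0 if they are the same instance."""
    if a.seqno != b.seqno:
        return 1 if a.seqno > b.seqno else -1
    if a.checksum != b.checksum:
        return 1 if a.checksum > b.checksum else -1
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return 1 if a.age == MAX_AGE else -1
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return 1 if a.age < b.age else -1
    return 0


def step8_discards(db_copy, received):
    """True if the received LSA is dropped without acknowledgment: the database
    copy is more recent and carries both MaxAge and MaxSequenceNumber."""
    return (compare_lsa(db_copy, received) > 0
            and db_copy.age == MAX_AGE
            and db_copy.seqno == MAX_SEQNO)


# A copy flushed with LSA_MAXSEQNO versus a freshly originated instance.
flushed = LsaHeader(seqno=MAX_SEQNO, checksum=0x1234, age=MAX_AGE)
fresh = LsaHeader(seqno=INITIAL_SEQNO, checksum=0xBEEF, age=0)
print(step8_discards(flushed, fresh))
```

As long as the flushed copy stays in a neighbor's database, any instance with a lower sequence number loses the comparison, which matches the ignored updates described above.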
On Wed, Aug 14, 2013 at 03:35:07AM +0400, Alexander V. Chernikov wrote:
> Hello list!
>
> Currently BIRD performs premature aging of LSAs in a very strange way, which sometimes upsets Quagga (up to SIGSEGV) and Cisco. Every time, aging is done by setting the sequence number to LSA_MAXSEQNO.
>
> RFC 2328, on the contrary, shows only 2 cases where MaxSequenceNumber is used in outgoing LSAs:
> 1) Originating a self LSA with that seq number (a very rare event)
> 2) [Premature] aging of the LSA from 1) because of a) a disappeared net and b) a seq number increase (12.1.6)
>
> For all other cases the seq number should be either increased or left intact.
>
> BIRD, however, insists on setting LSA_MAXSEQNO:
> 1) When receiving a self-originated LSA which is not in the local LSDB ( ospf_lsupd_receive ), which is not a special case according to 14.1
> 2) In all other places, like handling if-down, external route disappearance, etc. ( ospf_lsupd_flush_nlsa )
You are right that BIRD uses LSA_MAXSEQNO in more cases than prescribed by RFC 2328 (essentially in most cases when a local LSA is flushed), but these changes were introduced mostly as a reaction to problems with the usual flushing (just premature aging), where BIRD sends the LSA with MaxAge, then forgets the LSA, and when the LSA is reintroduced later, it starts with InitSeqNum. In cases like external route flaps (and similar events for other kinds of LSAs), this meant that the newer LSA met the MaxAge LSA during flooding, was considered older, and was removed (and combined with other minor inconsistencies in router behaviors, this would lead to all kinds of strange events).

The idea behind the behavior is that you should either keep the old seqnum for your LSA when it is flushed (and continue from it when you reintroduce it), or use LSA_MAXSEQNO to ensure that the old seqnum is eliminated from the OSPF domain, so that you can forget the seqnum and later start from InitSeqNum. The behavior is not according to RFC 2328, but it is compatible with it.

And I have work-in-progress code that fixes several issues in the flooding code together with this.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
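The race between a reintroduced LSA and a lingering MaxAge copy can be sketched with sequence numbers alone. The function names below are hypothetical, not taken from BIRD, and the sketch assumes the two instances have different sequence numbers, so the checksum and age tie-breakers of RFC 2328, section 13.1 never come into play.

```python
INITIAL_SEQNO = -0x7FFFFFFF   # 0x80000001 read as a signed 32-bit integer
MAX_SEQNO = 0x7FFFFFFF


def reoriginate_seqno(remembered_seqno):
    """Sequence number for a reintroduced LSA.

    None models a router that forgot the LSA after flushing it.
    Wrapping at MAX_SEQNO is deliberately left out of this sketch.
    """
    if remembered_seqno is None:
        return INITIAL_SEQNO
    return remembered_seqno + 1


def survives_flood(new_seqno, lingering_seqno):
    """A strictly larger sequence number beats a copy still lingering in the domain."""
    return new_seqno > lingering_seqno


# A MaxAge copy flushed by plain premature aging, still present on some router.
lingering = INITIAL_SEQNO + 7

# Forgetting the seqnum: the new instance looks older than the flushed one.
print(survives_flood(reoriginate_seqno(None), lingering))

# Keeping the seqnum: the new instance is one past the flushed one and wins.
print(survives_flood(reoriginate_seqno(lingering), lingering))
```

The second option described above, flushing with LSA_MAXSEQNO, avoids the race in a different way: no older copy can outrank the flush, at the price of new instances being rejected until the flushed copy is gone everywhere.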
On 14.08.2013 14:18, Ondrej Zajicek wrote:
> You are right that BIRD uses LSA_MAXSEQNO in more cases than prescribed by RFC 2328 (essentially in most cases when a local LSA is flushed), but these changes were introduced mostly as a reaction to problems with the usual flushing (just premature aging), where BIRD sends the LSA with MaxAge, then forgets the LSA, and when the LSA is reintroduced later, it starts with InitSeqNum. In cases like external route flaps (and similar events for other kinds of LSAs), this meant that the newer LSA met the MaxAge LSA during flooding, was considered older, and was removed (and combined with other

Yes, but shouldn't the "new" MaxAge LSA then be sent back to the neighbor, and then on to the other neighbors until it reaches the originating router? (After that, the originating router should generate a seq+1 LSA and push it back, finishing this process.) That sounds a bit more deterministic.

> minor inconsistencies in router behaviors, this would lead to all kinds of strange events).
>
> The idea behind the behavior is that you should either keep the old seqnum for your LSA when it is flushed (and continue from it when you reintroduce it), or use LSA_MAXSEQNO to ensure that the old seqnum is eliminated from the OSPF domain, so that you can forget the seqnum and later start from InitSeqNum. The behavior is not according to RFC 2328, but it is compatible with it.
>
> And I have work-in-progress code that fixes several issues in the flooding code together with this.

That sounds great. Do you plan to implement keeping sequence numbers for all seen routes?
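The "seq+1" reaction mentioned in this reply corresponds to the handling of received self-originated LSAs in RFC 2328, section 13.4. A minimal sketch follows, assuming the received instance is newer than anything in the local database; the `Lsa` type and `handle_self_originated` are hypothetical names, and small integers stand in for real sequence numbers.

```python
from dataclasses import dataclass, replace

MAX_AGE = 3600           # seconds
MAX_SEQNO = 0x7FFFFFFF


@dataclass(frozen=True)
class Lsa:               # hypothetical, reduced LSA
    seqno: int
    age: int
    body: str


def handle_self_originated(received, wanted_body):
    """Return the LSA to flood in response to a newer self-originated LSA.

    wanted_body is what the router currently wants to advertise under this
    LSA, or None if it no longer wants to originate it at all.
    """
    if wanted_body is None:
        # Flush: set LS age to MaxAge and leave the sequence number intact.
        return replace(received, age=MAX_AGE)
    if received.seqno == MAX_SEQNO:
        # Sequence wrap (RFC 2328, 12.1.6) is outside this sketch.
        raise NotImplementedError("sequence number wrap is not handled here")
    # Re-originate one past the received sequence number.
    return Lsa(seqno=received.seqno + 1, age=0, body=wanted_body)


stale = Lsa(seqno=5, age=120, body="10.0.0.0/24 via old path")
print(handle_self_originated(stale, None))
print(handle_self_originated(stale, "10.0.0.0/24 via new path"))
```

In both branches the sequence number is either left intact or increased by one, and never jumps to MaxSequenceNumber.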
On 14.08.2013 14:18, Ondrej Zajicek wrote:
> You are right that BIRD uses LSA_MAXSEQNO in more cases than prescribed by RFC 2328 (essentially in most cases when a local LSA is flushed), but these changes were introduced mostly as a reaction to problems with the usual flushing (just premature aging), where BIRD sends the LSA with MaxAge, then forgets the LSA, and when the LSA is reintroduced later, it starts with InitSeqNum. In cases like external route flaps (and similar events for other kinds of LSAs), this meant that the newer LSA met the MaxAge LSA during flooding, was considered older, and was removed (and combined with other minor inconsistencies in router behaviors, this would lead to all kinds of strange events).

Let me return to this old, but still relevant topic :)

Two patches are attached. They unify the handling of the sn/age LSA fields and add sequence number history keeping.

The first one replaces most of the initial LSA header setup with a single fill_lsa_header() function. It is also responsible for (unique) LSA ID generation.

The second one handles sequence number keeping for all LSA types, using a per-area FIB for most LSAs and a per-proto FIB for AS-boundary ones (type5/7/0x2007/0x4005). I'd prefer to use fib2_init() for lookups, but fib_init() seems to be sufficient, too.

We definitely have to convert ipa_to_rid() to something stateful, at least in the IPv6 case, but that's the next step. We also need to deal more accurately with LSA_MAXSEQ.

> The idea behind the behavior is that you should either keep the old seqnum for your LSA when it is flushed (and continue from it when you reintroduce it), or use LSA_MAXSEQNO to ensure that the old seqnum is eliminated from the OSPF domain, so that you can forget the seqnum and later start from InitSeqNum. The behavior is not according to RFC 2328, but it is compatible with it.
>
> And I have work-in-progress code that fixes several issues in the flooding code together with this.
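The general idea of sequence number keeping can be sketched as a small table that outlives the LSA itself. This is only an illustration and not the attached patch: `SeqnoStore`, its methods, and the key layout are assumptions, and a plain dictionary stands in for the per-area and per-proto FIBs.

```python
INITIAL_SEQNO = -0x7FFFFFFF   # 0x80000001 read as a signed 32-bit integer
MAX_SEQNO = 0x7FFFFFFF


class SeqnoStore:
    """Remembers the last sequence number used per LSA key (hypothetical)."""

    def __init__(self):
        self._last = {}

    def next_seqno(self, key):
        """Sequence number for the next instance originated under key."""
        last = self._last.get(key)
        if last is None:
            seqno = INITIAL_SEQNO
        elif last == MAX_SEQNO:
            # RFC 2328, 12.1.6: the MaxSequenceNumber instance has to be
            # flushed before the sequence space can be reused.
            raise OverflowError("flush the MaxSequenceNumber instance first")
        else:
            seqno = last + 1
        self._last[key] = seqno
        return seqno

    def forget(self, key):
        """Drop the history, e.g. once a MaxSequenceNumber flush has completed."""
        self._last.pop(key, None)


store = SeqnoStore()
key = (5, "192.0.2.0")              # assumed key layout: (LSA type, link-state ID)
first = store.next_seqno(key)       # initial origination
# ... the route disappears and the LSA is flushed, but the entry is kept ...
second = store.next_seqno(key)      # reintroduced with the next number
print(hex(first & 0xFFFFFFFF), hex(second & 0xFFFFFFFF))
```

Because the entry survives the flush, a flapping route is reintroduced one past its previous number instead of restarting from InitSeqNum.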
participants (2)
- Alexander V. Chernikov
- Ondrej Zajicek