LSA ID collision issue is seen with OSPF stub-networks configured and BIRD is acting as ABR
Hi,

*Issue Description*: An LSA ID collision is seen when OSPF *stub-networks* are configured and BIRD is acting as an ABR.

*BIRD OSPF configuration*:
· BIRD is part of 2 areas (say, area-0 and area-100)
· Stub-networks are configured in area-0

*Code Flow*:
1. Stubnets are advertised to area-0 by generating a router LSA (LSA_T_RT).
   o ospf_disp() -> ospf_update_topology() -> ospf_originate_rt_lsa() -> ospf_originate_lsa()
2. Now the stubnets are added to the top graph table (p->gr) with LSA type LSA_T_RT.
3. When the routing table is created, these stubnets are added to the FIB (p->rtf).
   o ospf_disp() -> ospf_rt_spf() -> ospf_rt_spfa() -> spfa_process_rt() -> add_network() -> ri_install_net()
4. When BIRD is acting as an ABR, it walks through the FIB (p->rtf) and sends a summary LSA (LSA_T_SUM_NET) with LSA mode LSA_M_RTCALC. Also, the nf pointer (which stores the FIB node address) is set in top_hash_entry.
   o ospf_disp() -> ospf_rt_abr2() -> check_sum_net_lsa() -> ospf_originate_sum_net_lsa() -> ospf_originate_lsa()
5. Now the stubnets are added to the top graph table (p->gr) with LSA type LSA_T_SUM_NET.
6. The stubnets are removed from the FIB (p->rtf).
   o ospf_disp() -> rt_sync() -> fib_delete()
7. Now the stubnet entries are still present in the top graph table. (Note: fib_delete() doesn't free the node, it just moves the FIB node to the free pool, so the fib_node pointer is still valid.)
8. With any change in the network, ospf_rt_spf() is called. In ospf_rt_reset(), the LSA mode is updated from LSA_M_RTCALC to LSA_M_STALE.
9. Steps 3-6 are repeated. In step 3 the FIB node pointer changes, and in step 4 the FIB node pointer comparison in ospf_originate_lsa() fails, which leads to the LSA ID collision.

*The issue is resolved with the code change below.* Please review the change and provide your comments. Also, please let me know if any other information is required.
diff -rupN bird/proto/ospf/topology.c bird_modified/proto/ospf/topology.c
--- bird/proto/ospf/topology.c 2016-11-28 05:14:19.582761752 -0800
+++ bird_modified/proto/ospf/topology.c 2016-11-28 05:14:01.270761752 -0800
@@ -278,7 +278,7 @@ ospf_originate_lsa(struct ospf_proto *p,
   if (!SNODE_VALID(en))
     s_add_tail(&p->lsal, SNODE en);
-  if (!en->nf || !en->lsa_body)
+  if (!en->nf || !en->lsa_body || en->mode == LSA_M_STALE)
     en->nf = lsa->nf;

Thanks,
Naveen
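The pointer comparison described in step 9 can be modelled with a small, self-contained sketch. Everything prefixed with toy_ is a hypothetical stand-in and not BIRD code; the check that en->nf must equal the new node is an assumption based on the description in the report, and the accept_stale switch mimics the extra condition from the proposed diff.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; only the two mode names are taken from the thread. */
enum toy_lsa_mode { LSA_M_RTCALC, LSA_M_STALE };

struct toy_fib_node { unsigned prefix; };

struct toy_lsa_entry {
  const struct toy_fib_node *nf; /* FIB node this LSA ID is bound to */
  const void *lsa_body;          /* non-NULL once the LSA was originated */
  enum toy_lsa_mode mode;
};

/* Returns 1 if origination is accepted, 0 if it is treated as an ID collision.
 * accept_stale = 1 models the extra "mode == LSA_M_STALE" test from the diff. */
static int toy_originate(struct toy_lsa_entry *en,
                         const struct toy_fib_node *nf, int accept_stale)
{
  if (!en->nf || !en->lsa_body || (accept_stale && en->mode == LSA_M_STALE))
    en->nf = nf;

  if (en->nf != nf)
    return 0;                    /* assumed collision check: bound to another node */

  en->lsa_body = en;             /* any non-NULL value marks the LSA as live */
  en->mode = LSA_M_RTCALC;
  return 1;
}

/* Replays steps 3-9: originate, mark stale, originate again from a new node. */
int toy_collision_demo(int accept_stale)
{
  struct toy_fib_node first = { 0x0a000100 };
  struct toy_fib_node second = { 0x0a000100 }; /* same prefix, different node */
  struct toy_lsa_entry en = { NULL, NULL, LSA_M_STALE };

  int first_ok = toy_originate(&en, &first, accept_stale);
  assert(first_ok);

  en.mode = LSA_M_STALE;         /* what ospf_rt_reset() is reported to do */
  return toy_originate(&en, &second, accept_stale);
}
```

In this model, toy_collision_demo(0) reports a collision on the second origination, while toy_collision_demo(1) rebinds the entry to the new node. The sketch only reproduces the symptom; it does not address the fact that en->nf refers to a deleted node between the two runs.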
On Wed, Nov 30, 2016 at 10:35:45AM +0530, naveen chowdary Yerramneni wrote:
> Hi,
>
> *Issue Description*: An LSA ID collision is seen when OSPF *stub-networks* are configured and BIRD is acting as an ABR.

Hi, thanks for the bug report and analysis. My comments and questions are below.

> *Code Flow*:
> 1. Stubnets are advertised to area-0 by generating a router LSA (LSA_T_RT).
>    o ospf_disp() -> ospf_update_topology() -> ospf_originate_rt_lsa() -> ospf_originate_lsa()
> 2. Now the stubnets are added to the top graph table (p->gr) with LSA type LSA_T_RT.
> 3. When the routing table is created, these stubnets are added to the FIB (p->rtf).
>    o ospf_disp() -> ospf_rt_spf() -> ospf_rt_spfa() -> spfa_process_rt() -> add_network() -> ri_install_net()
> 4. When BIRD is acting as an ABR, it walks through the FIB (p->rtf) and sends a summary LSA (LSA_T_SUM_NET) with LSA mode LSA_M_RTCALC. Also, the nf pointer (which stores the FIB node address) is set in top_hash_entry.
>    o ospf_disp() -> ospf_rt_abr2() -> check_sum_net_lsa() -> ospf_originate_sum_net_lsa() -> ospf_originate_lsa()
> 5. Now the stubnets are added to the top graph table (p->gr) with LSA type LSA_T_SUM_NET.

Up to this point it looks OK.

> 6. The stubnets are removed from the FIB (p->rtf).
>    o ospf_disp() -> rt_sync() -> fib_delete()

This seems strange. If these networks are part of the topology, they should be in the FIB (p->rtf). Do you mean that the corresponding record in the FIB is removed automatically as a consequence of this code sequence, or is there some external change (like stubnet removal) that causes it to be removed?

> 7. Now the stubnet entries are still present in the top graph table. (Note: fib_delete() doesn't free the node, it just moves the FIB node to the free pool, so the fib_node pointer is still valid.)

That looks like a dangerous bug. Although fib_delete() just moves the FIB node back to the slab pool, the pointer is considered freed and invalid (the slab pool is just an optimization).

> 8. With any change in the network, ospf_rt_spf() is called. In ospf_rt_reset(), the LSA mode is updated from LSA_M_RTCALC to LSA_M_STALE.
> 9. Steps 3-6 are repeated. In step 3 the FIB node pointer changes, and in step 4 the FIB node pointer comparison in ospf_originate_lsa() fails, which leads to the LSA ID collision.
>
> *The issue is resolved with the code change below.* Please review the change and provide your comments. Also, please let me know if any other information is required.

Well, it seems to me that the real cause of the bug is in step 6.

BTW, are your stubnets explicitly configured ones, or are they based on prefixes of stub interfaces?

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
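The point about pooled memory can be illustrated with a minimal sketch: a node returned to a pool stays addressable, but the next allocation may hand the same slot out for a different key, so any retained pointer silently starts referring to something else. The pool below, its LIFO free list, and all toy_ names are hypothetical; this is not BIRD's slab allocator.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 4

struct toy_node {
  unsigned key;                 /* stands in for the network prefix */
  struct toy_node *next_free;   /* free-list link, used only while the slot is free */
};

struct toy_pool {
  struct toy_node slots[TOY_POOL_SIZE];
  struct toy_node *free_list;
};

/* Put every slot on the free list. */
static void toy_pool_init(struct toy_pool *pool)
{
  pool->free_list = NULL;
  for (int i = TOY_POOL_SIZE - 1; i >= 0; i--) {
    pool->slots[i].key = 0;
    pool->slots[i].next_free = pool->free_list;
    pool->free_list = &pool->slots[i];
  }
}

/* Pop a slot from the free list and label it with the given key. */
static struct toy_node *toy_alloc(struct toy_pool *pool, unsigned key)
{
  struct toy_node *n = pool->free_list;
  if (!n)
    return NULL;
  pool->free_list = n->next_free;
  n->key = key;
  n->next_free = NULL;
  return n;
}

/* "Delete" a node: nothing is released, the slot just goes back to the pool. */
static void toy_free(struct toy_pool *pool, struct toy_node *n)
{
  n->next_free = pool->free_list;
  pool->free_list = n;
}

/* Returns 1 when a retained pointer ends up aliasing a node for another key. */
int toy_stale_pointer_demo(void)
{
  struct toy_pool pool;
  toy_pool_init(&pool);

  struct toy_node *stub = toy_alloc(&pool, 100);
  assert(stub != NULL);
  struct toy_node *retained = stub;   /* like the nf pointer kept by an LSA entry */
  toy_free(&pool, stub);

  struct toy_node *other = toy_alloc(&pool, 200);
  return other == retained && retained->key == 200;
}
```

With this LIFO free list the freed slot is reused immediately, so the retained pointer now describes key 200 instead of key 100. A real allocator gives no guarantee either way, which is why such a pointer has to be treated as invalid once the node is deleted.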
Regarding #6, all stub networks are explicitly removed from the FIB in rt_sync(), not due to any external event. I have pasted the code snippet below for reference. I configured the stub networks in the OSPF config using the "stubnet" option.

static void
rt_sync(struct ospf_proto *p)
{
  struct top_hash_entry *en;
  struct fib_iterator fit;
  struct fib *fib = &p->rtf;
  ort *nf;
  struct ospf_area *oa;

  /* This is used for forced reload of routes */
  int reload = (p->calcrt == 2);

  OSPF_TRACE(D_EVENTS, "Starting routing table synchronisation");

  DBG("Now syncing my rt table with nest's\n");
  FIB_ITERATE_INIT(&fit, fib);
again1:
  FIB_ITERATE_START(fib, &fit, nftmp)
  {
    nf = (ort *) nftmp;

    /* Sanity check of next-hop addresses, failure should not happen */
    if (nf->n.type)
    {
      struct mpnh *nh;

      for (nh = nf->n.nhs; nh; nh = nh->next)
        if (ipa_nonzero(nh->gw))
        {
          neighbor *ng = neigh_find2(&p->p, &nh->gw, nh->iface, 0);
          if (!ng || (ng->scope == SCOPE_HOST))
            { reset_ri(nf); break; }
        }
    }

    /* Remove configured stubnets */
    if (!nf->n.nhs)
      reset_ri(nf);

    . . . . .

    /* Remove unused rt entry, some special entries are persistent */
    if (!nf->n.type && !nf->external_rte && !nf->area_net)
    {
      FIB_ITERATE_PUT(&fit, nftmp);
      fib_delete(fib, nftmp);
      goto again1;
    }
  }
  FIB_ITERATE_END(nftmp);

  . . .
}
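The two conditions in the rt_sync() excerpt can be reduced to a small predicate to show why a configured stubnet always ends up deleted. This is a sketch with hypothetical toy_ names; the behaviour of toy_reset_ri() (clearing the route type) is an assumption about what reset_ri() does, since its body is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened view of the fields the excerpt tests. */
struct toy_ort {
  int type;          /* stands in for nf->n.type; 0 means "no route" */
  int has_nexthops;  /* stands in for nf->n.nhs != NULL */
  int external_rte;
  int area_net;
};

/* Assumption: resetting the routing info clears the route type. */
static void toy_reset_ri(struct toy_ort *nf)
{
  nf->type = 0;
  nf->has_nexthops = 0;
}

/* Returns 1 if the entry would reach fib_delete() in the excerpt. */
int toy_rt_sync_deletes(struct toy_ort *nf)
{
  assert(nf != NULL);

  /* "Remove configured stubnets": entries without next hops are reset */
  if (!nf->has_nexthops)
    toy_reset_ri(nf);

  /* "Remove unused rt entry": nothing marks the entry as persistent */
  return !nf->type && !nf->external_rte && !nf->area_net;
}
```

Under that assumption, an entry with a route type but no next hops (a configured stubnet) is reset and then matches the "unused" test in the same pass, whereas an entry with next hops survives.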
On Wed, Nov 30, 2016 at 08:25:32PM +0530, naveen chowdary Yerramneni wrote:
> Regarding #6, all stub networks are explicitly removed from the FIB in rt_sync(), not due to any external event. I have pasted the code snippet below for reference.

Hi,

The attached patch should fix the bug. With it, the FIB entry is not removed in this case, so the FIB node remains valid and is reused in the next recalculation.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
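The attached patch itself is not reproduced in this thread, so the following is only a sketch of the general idea stated above: an entry that something still refers to is exempted from removal, so its node address stays stable across recalculations. The referenced field and all toy_ names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_entry {
  int type;          /* 0 means the entry currently carries no route */
  int external_rte;
  int area_net;
  int referenced;    /* hypothetical: set while something still points at this entry */
};

/* Returns 1 if the entry may be removed from the table. */
int toy_should_delete(const struct toy_entry *nf)
{
  assert(nf != NULL);
  return !nf->type && !nf->external_rte && !nf->area_net && !nf->referenced;
}
```

In this model an entry with no route but with referenced set is kept, so a later recalculation finds the same node again instead of allocating a new one.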
participants (2)
- naveen chowdary Yerramneni
- Ondrej Zajicek