Hello Team,

I encountered a weird problem with OSPF. I attach scheme.png.

R1 and R2 have area 0 configured via vlan1000; both also have area 1 (NSSA) to R3. In short:

-------------------------
config R1:

    protocol ospf CORE_OSPF {
        tick 1;
        ipv4 {
            import all;
            export none;
        };
        area 0.0.0.0 {
            interface "vlan1000" { cost 2; type ptp; bfd; };
        };
        area 0.0.0.1 {
            nssa;
            interface "vlan4001" { type ptp; bfd yes; };
        };
    };

R1: Bird 2.0.10
-------------------------
config R2:

    protocol ospf CORE_OSPF {
        tick 1;
        ipv4 {
            import all;
            export none;
        };
        area 0.0.0.0 {
            interface "vlan1000" { cost 2; type ptp; bfd; };
        };
        area 0.0.0.1 {
            nssa;
            interface "vlan4011" { type ptp; bfd yes; };
        };
    };

R2: Bird 2.0.11
-------------------------
config R3:

    function allow_network()
    prefix set localnet;
    {
        localnet = [ 10.0.0.0/8{24,30} ];
        ospf_metric1 = 20;
        if net ~ localnet then return true;
        else return false;
    }

    filter out_connected {
        if allow_network() then accept;
        else reject;
    }

    protocol ospf CORE_OSPF {
        tick 1;
        ipv4 {
            import all;
            export filter out_connected;
        };
        area 0.0.0.1 {
            nssa;
            interface "vlan4001" { cost 100; type ptp; bfd yes; };
            interface "vlan4011" { cost 100; type ptp; bfd yes; };
        };
    };

Connected interface with 10.7.100.254/24:

    # ifconfig vlan91
    vlan91: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4600703<RXCSUM,TXCSUM,TSO4,TSO6,LRO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
            ether a0:36:9f:9d:4a:4c
            inet 10.7.100.254 netmask 0xffffff00 broadcast 10.7.100.255
            groups: vlan
            vlan: 91 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
            media: Ethernet autoselect
            status: active
            nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

R3: Bird 2.0.11
-------------------------

In the normal state everything looks good. R1 and R2 see the connected subnets from R3 as OSPF E1 via the local VLAN (R1 -> vlan4001, R2 -> vlan4011), and R3 received a default route (NSSA).

From R1:

    BIRD 2.0.10 ready.
    bird> show route for 10.7.100.254
    Table master4:
    10.7.100.0/24        unicast [CORE_OSPF 23:08:09.064] * E1 (150/30) [xx.xx.xx.xx]
            via xx.xx.92.141 on vlan4001
    bird>

From R2:

    BIRD 2.0.11 ready.
    bird> show route for 10.7.100.254
    Table master4:
    10.7.100.0/24        unicast [CORE_OSPF 23:21:35.334] * E1 (150/30) [xx.xx.xx.xx]
            via xx.xx.92.137 on vlan4011
    bird>

But when vlan4001 to R3 is removed on the switch (broken L2 connectivity), R1 still sees the route via vlan4001, even though all dead timers have expired (BFD is configured).

When I run ifconfig vlan4001 down on R3, the announcement disappears and R1 and R2 do not see 10.7.100.0/24.

If, in this state (vlan4001 connectivity broken, vlan4001 down on R3), I restart the bird process, the situation improves: R2 sees 10.7.100.254 via vlan4011 and R1 sees 10.7.100.254 via vlan1000 (area 0).

I rewrote the R3 config from BIRD to Quagga and the problem does not occur there. I also have R4 (MikroTik) connected in a similar way (area 2, NSSA), and it also works fine when I simulate an L2 connectivity interruption.

I can provide all the logs from the devices. All nodes run FreeBSD.

Regards,
Konrad Kręciwilk
On Thu, Jan 19, 2023 at 11:45:01PM +0100, Konrad Kręciwilk via Bird-users wrote:
> Hello Team,

Hello

> But when vlan4001 to R3 is removed on the switch (broken L2 connectivity), R1 still sees the route via vlan4001, even though all dead timers have expired (BFD is configured).

So even if the OSPF neighbor is down/removed, the E1 route still points to the vlan4001 iface?

> When I run ifconfig vlan4001 down on R3, the announcement disappears and R1 and R2 do not see 10.7.100.0/24. If, in this state (vlan4001 connectivity broken, vlan4001 down on R3), I restart the bird process, the situation improves: R2 sees 10.7.100.254 via vlan4011 and R1 sees 10.7.100.254 via vlan1000 (area 0).

That might be a bug in the translation from NSSA-LSA to Ext-LSA.

Could you check whether some regular prefix on R3 (e.g. a stub network on another iface, not an external route exported to OSPF) behaves correctly, or behaves like the 10.7.100.0/24 prefix?

Could you send 'show ospf state', 'show ospf neighbors' and 'show route' from R1 / R2 / R3 for each of these steps (initial, after vlan4001 removal, after ifconfig, after restarting R3)?

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 2023-01-21 01:01, Ondrej Zajicek wrote:
> On Thu, Jan 19, 2023 at 11:45:01PM +0100, Konrad Kręciwilk via Bird-users wrote:
> > Hello Team,
>
> Hello
>
> > But when vlan4001 to R3 is removed on the switch (broken L2 connectivity), R1 still sees the route via vlan4001, even though all dead timers have expired (BFD is configured).
>
> So even if the OSPF neighbor is down/removed, the E1 route still points to the vlan4001 iface?
>
> > When I run ifconfig vlan4001 down on R3, the announcement disappears and R1 and R2 do not see 10.7.100.0/24. If, in this state (vlan4001 connectivity broken, vlan4001 down on R3), I restart the bird process, the situation improves: R2 sees 10.7.100.254 via vlan4011 and R1 sees 10.7.100.254 via vlan1000 (area 0).
If I change the export filter to none and add interface vlan91 (which has 10.7.100.254) as a stub, as below:

    protocol ospf CORE_OSPF {
        tick 1;
        ipv4 {
            import all;
            export none;
        };
        area 0.0.0.1 {
            nssa;
            interface "vlan4001" { cost 100; type ptp; bfd yes; };
            interface "vlan4011" { cost 100; type ptp; bfd yes; };
            # vlan91 with 10.7.100.254/24
            interface "vlan91" { stub; };
        };
    };

everything works well while simulating an L2 connection interruption.

I captured outputs for you with the NSSA-LSA to Ext-LSA translation configured:

normal - all VLANs work
broken - L2 connectivity (vlan4001) from R1 to R3 is broken

As you can see, show ospf state on R1 has external 10.7.100.0/24 via 212.127.92.29 (the local link via vlan4001), which is interrupted at L2. It looks like R3 does not update the database (the via address) when the neighbor on vlan4001 is lost:

    area 0.0.0.0

        router 212.127.92.1
            distance 2
            router 212.127.92.2 metric 2
            stubnet 212.127.92.0/30 metric 2
            xnetwork 212.127.92.28/30 metric 110
            xnetwork 212.127.92.128/30 metric 10
            external 10.7.100.0/24 metric 20 via 212.127.92.29

When I run on R3:

    ifconfig vlan4001 down
    ifconfig vlan91 down
    ifconfig vlan91 up

the database is refreshed and show ospf state looks like this:

    area 0.0.0.0

        router 212.127.92.1
            distance 2
            router 212.127.92.2 metric 2
            stubnet 212.127.92.0/30 metric 2
            xnetwork 212.127.92.128/30 metric 10
            external 10.7.100.0/24 metric 20 via 212.127.92.129

Here 212.127.92.129 is on vlan4011 between R2 and R3, and from R1 the prefix 10.7.100.0/24 is visible via R2 over vlan1000 and then via vlan4011.

I forgot to capture outputs while I was bringing the interfaces down and up. If you need them, I will do it.

Regards,
Konrad Kręciwilk
> That might be a bug in the translation from NSSA-LSA to Ext-LSA.
>
> Could you check whether some regular prefix on R3 (e.g. a stub network on another iface, not an external route exported to OSPF) behaves correctly, or behaves like the 10.7.100.0/24 prefix?
>
> Could you send 'show ospf state', 'show ospf neighbors' and 'show route' from R1 / R2 / R3 for each of these steps (initial, after vlan4001 removal, after ifconfig, after restarting R3)?
On Sat, Jan 21, 2023 at 04:18:47PM +0100, Konrad Kręciwilk wrote:
> As you can see, show ospf state on R1 has external 10.7.100.0/24 via 212.127.92.29 (the local link via vlan4001), which is interrupted at L2. It looks like R3 does not update the database (the via address) when the neighbor on vlan4001 is lost:
>
>     area 0.0.0.0
>
>         router 212.127.92.1
>             distance 2
>             router 212.127.92.2 metric 2
>             stubnet 212.127.92.0/30 metric 2
>             xnetwork 212.127.92.28/30 metric 110
>             xnetwork 212.127.92.128/30 metric 10
>             external 10.7.100.0/24 metric 20 via 212.127.92.29
I think I understand the issue. An NSSA-LSA must contain a forwarding address. The route exported to OSPF on R3 was a direct route, so it does not have one. BIRD has to choose one from the interfaces that are part of the OSPF domain, i.e. "vlan4001" (212.127.92.29) and "vlan4011" (212.127.92.129).

It chose the first one and announced the NSSA-LSA with that IP address. When the R1-R3 link broke, there was no need to choose a different one, as 212.127.92.29 is still a valid IP for R3.

Now R1 sees the Ext-LSA with forwarding address 212.127.92.29 (translated by R2 from the NSSA-LSA), and it considers 212.127.92.29 reachable (due to the local network 212.127.92.28/30 on vlan4001). That is a kind of blind spot for OSPF: when a stub network is announced, all addresses on that network are considered reachable, even if the network is really split.

If the iface vlan4001 on R3 is disabled, R3 has to announce the NSSA-LSA with a different forwarding address, so it will work eventually. But that is not really a bug: if the iface is up, its IP address is valid to be used as a forwarding address.

I see two ways to fix it:

1) Configuration fix - you should have some loopback/stub IP (with a /32 mask) on R3 in the OSPF domain. In that case BIRD would prefer such an address as the forwarding address for originated NSSA-LSAs.

2) I think that OSPF routers should announce all their local addresses as /32 (in addition to the real prefixes), which would mitigate the blind spot. Or at least the ones that are used as forwarding addresses in LSAs. If R3 announced a 212.127.92.29/32 stub, then R2 would translate it to the backbone and R1 would use the path to 212.127.92.29/32 via R2, even if it has the direct 212.127.92.28/30.

Just curious, how is this situation handled by Quagga and MikroTik? If you bring them into a similar situation, what is the output of 'show ospf state' on R1/R2?
--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 2023-01-21 18:23, Ondrej Zajicek wrote:
> I see two ways to fix it:
>
> 1) Configuration fix - you should have some loopback/stub IP (with a /32 mask) on R3 in the OSPF domain. In that case BIRD would prefer such an address as the forwarding address for originated NSSA-LSAs.

I added an interface with a /32 address as a stub, and now the /32 becomes the forwarding address for the originated NSSA-LSAs. It works well now. Thank you!

> 2) I think that OSPF routers should announce all their local addresses as /32 (in addition to the real prefixes), which would mitigate the blind spot. Or at least the ones that are used as forwarding addresses in LSAs. If R3 announced a 212.127.92.29/32 stub, then R2 would translate it to the backbone and R1 would use the path to 212.127.92.29/32 via R2, even if it has the direct 212.127.92.28/30.
>
> Just curious, how is this situation handled by Quagga and MikroTik? If you bring them into a similar situation, what is the output of 'show ospf state' on R1/R2?

Sorry, that was my mistake. Quagga, FRR and MikroTik behave the same way.
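
The configuration fix from point 1 could look roughly like the sketch below on R3. This is only an illustration built from the R3 config posted earlier in the thread: the interface name "lo1" is hypothetical (the thread does not say which interface or /32 address was actually used), and it assumes that interface already carries a /32 address at the OS level.

```
# R3, sketch only: "lo1" is a hypothetical interface name, not taken from the thread
protocol ospf CORE_OSPF {
    tick 1;
    ipv4 {
        import all;
        export filter out_connected;
    };
    area 0.0.0.1 {
        nssa;
        interface "vlan4001" { cost 100; type ptp; bfd yes; };
        interface "vlan4011" { cost 100; type ptp; bfd yes; };
        # Assumed loopback with a /32 address, announced as a stub network
        # so that it can serve as the forwarding address in NSSA-LSAs
        interface "lo1" { stub yes; };
    };
};
```

If the behaviour matches what is described above, 'show ospf state' on R1 should then list the external 10.7.100.0/24 with the /32 address after "via" instead of 212.127.92.29, and that address stays reachable through R2 when the R1-R3 link is cut.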
On 22.01.2023 at 20:59, Konrad Kręciwilk via Bird-users wrote:
> Sorry, that was my mistake. Quagga, FRR and MikroTik behave the same way.
Additional info: MikroTik does not always elect the /32 stub as the forwarding address. I don't know why the /32 becomes the forwarding address in one setup but not in another. The choice is not obvious, but since v7.x it is possible to set the forwarding address using filters (set ospf-ext-fwd):

    rule: if (protocol connected && dst in OSPF-ANNOUNCE) { set ospf-ext-type type1; set ospf-ext-fwd 10.81.254.3; accept }

and then it works predictably :)

--
Regards,
Konrad Kręciwilk
Network engineer
GSM +48 883 131 165
tel.: +48 71 735 15 31
e-mail: konrad.kreciwilk@korbank.pl
participants (2)
- Konrad Kręciwilk
- Ondrej Zajicek