OSPF: incorrect path computation for v2.0.5+?
Upgraded from 2.0.4 to 2.0.7 and observed strange OSPF path computation. Re-tested against v2.0.5, same result. Any known issues in this area?

Setup when observing issue; 5 nodes connected in a ring with PPP links (some redundant links).

       /----------------------------------------------------------------\
,------------.   ,------------.   ,------------.   ,------------.   ,------------.
|10.210.139.1|   |10.210.139.2|   |10.210.139.3|   |10.210.139.4|   |10.210.139.5|
|            |---|            |---|            |---|            |---|            |
|            |---|            |---|            |   |            |   |            |
`------------'   `------------'   `------------'   `------------'   `------------'

With 2.0.4 node 10.210.139.1 finds shortest path to 10.210.139.5 via directly connected interface. With 2.0.5+ the OSPF computation results via hop-by-hop thru 10.210.139.2.

Result with 2.0.4
-----------------

node_1 ~ # birdc
BIRD v2.0.4-16-g1528cec5 ready.
bird> show ospf interface ospfv2_1
ospfv2_1:
Interface p1-1-3-1-4 (peer 10.210.139.5)
        Type: ptp
        Area: 0.0.0.0 (0)
        State: PtP
        Priority: 0
        Cost: 10
        Hello timer: 10
        Wait timer: 40
        Dead timer: 40
        Retransmit timer: 5
Interface p1-1-1-1-2 (peer 10.210.139.2)
        Type: ptp
        Area: 0.0.0.0 (0)
        State: PtP
        Priority: 0
        Cost: 10
        Hello timer: 10
        Wait timer: 40
        Dead timer: 40
        Retransmit timer: 5
Interface p1-1-5-1-6 (peer 10.210.139.2)
        Type: ptp
        Area: 0.0.0.0 (0)
        State: PtP
        Priority: 0
        Cost: 10
        Hello timer: 10
        Wait timer: 40
        Dead timer: 40
        Retransmit timer: 5
bird> show ospf topology ospfv2_1

area 0.0.0.0

        router 0.0.139.1
                distance 0
                router 0.0.139.5 metric 10
                router 0.0.139.2 metric 10
                router 0.0.139.2 metric 10

        router 0.0.139.2
                distance 10
                router 0.0.139.3 metric 10
                router 0.0.139.3 metric 10
                router 0.0.139.1 metric 10
                router 0.0.139.1 metric 10

        router 0.0.139.3
                distance 20
                router 0.0.139.4 metric 10
                router 0.0.139.2 metric 10
                router 0.0.139.2 metric 10

        router 0.0.139.4
                distance 20
                router 0.0.139.3 metric 10
                router 0.0.139.5 metric 10

        router 0.0.139.5
                distance 10
                router 0.0.139.4 metric 10
                router 0.0.139.1 metric 10

bird> show route table master4
Table master4:
0.0.0.0/0            unicast [static1 14:11:45.720] * (254)
        via 10.210.129.1 on eth1
10.210.139.2/32      unicast [direct1 14:15:57.283] * (255)
        dev p1-1-1-1-2
                     unicast [ospfv2_1 14:16:12.888] E1 (145/10) [0.0.139.2]
        via 10.210.139.2 on p1-1-1-1-2
                     unicast [direct1 14:16:01.143] (255)
        dev p1-1-5-1-6
10.210.139.5/32      unicast [direct1 14:18:20.315] * (255)
        dev p1-1-3-1-4
                     unicast [ospfv2_1 14:18:34.888] E1 (145/10) [0.0.139.5]
        via 10.210.139.5 on p1-1-3-1-4
10.210.139.1/32      unicast [direct1 14:11:37.896] * (255)
        dev lo
10.210.139.4/32      unicast [ospfv2_1 14:18:34.888] * E1 (145/20) [0.0.139.4]
        via 10.210.139.5 on p1-1-3-1-4
10.210.129.0/24      unicast [direct1 14:11:45.720] * (255)
        dev eth1
10.210.139.3/32      unicast [ospfv2_1 14:17:58.888] * E1 (145/20) [0.0.139.3]
        via 10.210.139.2 on p1-1-1-1-2
10.0.0.0/20          unicast [direct1 14:11:21.625] * (255)
        dev eth0
bird> show route
Table master4:
0.0.0.0/0            unicast [static1 14:23:07.805] * (254)
        via 10.210.129.1 on eth1
10.210.139.2/32      unicast [direct1 14:23:07.763] * (255)
        dev p1-1-1-1-2
                     unicast [ospfv2_1 14:23:22.813] E1 (145/10) [0.0.139.2]
        via 10.210.139.2 on p1-1-1-1-2
10.210.139.5/32      unicast [direct1 14:23:08.613] * (255)
        dev p1-1-3-1-4
                     unicast [ospfv2_1 14:23:22.813] E1 (145/40) [0.0.139.5]
        via 10.210.139.2 on p1-1-1-1-2
10.210.139.1/32      unicast [direct1 14:23:02.820] * (255)
        dev lo
10.210.139.4/32      unicast [ospfv2_1 14:23:22.813] * E1 (145/30) [0.0.139.4]
        via 10.210.139.2 on p1-1-1-1-2
10.210.129.0/24      unicast [direct1 14:23:07.805] * (255)
        dev eth1
10.210.139.3/32      unicast [ospfv2_1 14:23:22.813] * E1 (145/20) [0.0.139.3]
        via 10.210.139.2 on p1-1-1-1-2
10.0.0.0/20          unicast [direct1 14:22:38.675] * (255)
        dev eth0

Result with 2.0.5 and 2.0.7
---------------------------

node_1 ~ # birdc
BIRD v2.0.5-1-gd383d5ba ready.
bird> show ospf interface ospfv2_1
ospfv2_1:
Interface p1-1-1-1-2 (peer 10.210.139.2)
        Type: ptp
        Area: 0.0.0.0 (0)
        State: PtP
        Priority: 0
        Cost: 10
        Hello timer: 10
        Wait timer: 40
        Dead timer: 40
        Retransmit timer: 5
Interface p1-1-5-1-6 (peer 10.210.139.2)
        Type: ptp
        Area: 0.0.0.0 (0)
        State: PtP
        Priority: 0
        Cost: 10
        Hello timer: 10
        Wait timer: 40
        Dead timer: 40
        Retransmit timer: 5
Interface p1-1-3-1-4 (peer 10.210.139.5)
        Type: ptp
        Area: 0.0.0.0 (0)
        State: PtP
        Priority: 0
        Cost: 10
        Hello timer: 10
        Wait timer: 40
        Dead timer: 40
        Retransmit timer: 5
bird>
bird> show ospf topology ospfv2_1

area 0.0.0.0

        router 0.0.139.1
                distance 0
                router 0.0.139.2 metric 10
                router 0.0.139.2 metric 10
                router 0.0.139.5 metric 10

        router 0.0.139.2
                distance 10
                router 0.0.139.3 metric 10
                router 0.0.139.3 metric 10
                router 0.0.139.1 metric 10
                router 0.0.139.1 metric 10

        router 0.0.139.3
                distance 20
                router 0.0.139.4 metric 10
                router 0.0.139.2 metric 10
                router 0.0.139.2 metric 10

        router 0.0.139.4
                distance 30
                router 0.0.139.3 metric 10
                router 0.0.139.5 metric 10

        router 0.0.139.5
                distance 40
                router 0.0.139.4 metric 10
                router 0.0.139.1 metric 10

bird>
bird> show route table master4
Table master4:
0.0.0.0/0            unicast [static1 14:41:12.689] * (254)
        via 10.210.129.1 on eth1
10.210.139.2/32      unicast [direct1 14:41:06.967] * (255)
        dev p1-1-1-1-2
                     unicast [ospfv2_1 14:41:22.924] E1 (145/10) [0.0.139.2]
        via 10.210.139.2 on p1-1-1-1-2
                     unicast [direct1 14:41:08.739] (255)
        dev p1-1-5-1-6
10.210.139.5/32      unicast [direct1 14:41:41.577] * (255)
        dev p1-1-3-1-4
                     unicast [ospfv2_1 14:41:22.924] E1 (145/40) [0.0.139.5]
        via 10.210.139.2 on p1-1-1-1-2
10.210.139.1/32      unicast [direct1 14:41:04.950] * (255)
        dev lo
10.210.139.4/32      unicast [ospfv2_1 14:41:22.924] E1 (145/30) [0.0.139.4]
        via 10.210.139.2 on p1-1-1-1-2
10.210.129.0/24      unicast [direct1 14:41:12.689] * (255)
        dev eth1
10.210.139.3/32      unicast [ospfv2_1 14:41:22.924] E1 (145/20) [0.0.139.3]
        via 10.210.139.2 on p1-1-1-1-2
10.0.0.0/20          unicast [direct1 14:40:48.586] * (255)
        dev eth0
bird>
bird> show ospf neighbors ospfv2_1
ospfv2_1:
Router ID       Pri          State      DTime   Interface    Router IP
0.0.139.2         0     Full/PtP        38.373  p1-1-1-1-2   10.210.139.2
0.0.139.2         0     Full/PtP        30.144  p1-1-5-1-6   10.210.139.2
0.0.139.5         0     Full/PtP        32.983  p1-1-3-1-4   10.210.139.5
bird>
On Wed, May 20, 2020 at 02:47:58PM +0000, Kenth Eriksson wrote:
Upgraded from 2.0.4 to 2.0.7 and observed strange OSPF path computation. Re-tested against v2.0.5, same result. Any known issues in this area?
Hi,

It likely got broken by the commit that introduced OSPF graceful restart (commit 1a2ad348f660b150265f6df759a07de8a2b6de2f). Could you verify that by checking before and after that commit?

There is a tricky part in the OSPF code, responsible for the link-back calculation, that had to be tweaked for OSPF-GR. Perhaps it failed for the link between 10.210.139.1 and 10.210.139.5, so that link was not used for the routing table computation.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
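To make the suspected failure mode concrete, here is a minimal sketch (not BIRD's code) of an SPF run over the five-node ring from the report. The link metrics are taken from the `show ospf topology` output; the `usable` hook is a hypothetical stand-in for whatever check rejects a link during the calculation. Rejecting only the 0.0.139.1 to 0.0.139.5 link reproduces the distances 10/20/30/40 seen with 2.0.5+.

```python
import heapq

# Router-LSA adjacencies (last octet of router id -> [(neighbor, metric)]),
# transcribed from "show ospf topology"; parallel links appear twice.
LSDB = {
    1: [(2, 10), (2, 10), (5, 10)],
    2: [(1, 10), (1, 10), (3, 10), (3, 10)],
    3: [(2, 10), (2, 10), (4, 10)],
    4: [(3, 10), (5, 10)],
    5: [(4, 10), (1, 10)],
}


def spf(lsdb, root, usable=lambda src, dst: True):
    """Dijkstra with a link-back check; `usable` is a hypothetical filter."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, metric in lsdb[node]:
            # Link-back check: the neighbor must also list a link to us.
            has_back = any(back == node for back, _ in lsdb[nbr])
            if not has_back or not usable(node, nbr):
                continue
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


healthy = spf(LSDB, 1)
# Assumption for illustration: only the direct 1 -> 5 link is rejected.
broken = spf(LSDB, 1, usable=lambda s, t: (s, t) != (1, 5))
```

With every link accepted, node 5 is at distance 10 and node 4 at distance 20, as in the 2.0.4 output; with the direct link rejected, node 5 is only reachable the long way round at distance 40.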
On Wed, 2020-05-20 at 18:14 +0200, Ondrej Zajicek wrote:
Hi
It likely got broken by the commit that introduced OSPF graceful restart (commit 1a2ad348f660b150265f6df759a07de8a2b6de2f). Could you verify that by checking before and after that commit?
Will do, but unfortunately OSPF GR is not a single commit that can be reverted...
On Wed, May 20, 2020 at 08:51:20PM +0000, Kenth Eriksson wrote:
On Wed, 2020-05-20 at 18:14 +0200, Ondrej Zajicek wrote:
Hi
It likely got broken by the commit that introduced OSPF graceful restart (commit 1a2ad348f660b150265f6df759a07de8a2b6de2f). Could you verify that by checking before and after that commit?
Will do, but unfortunately OSPF GR is not a single commit that can be reverted...
You do not need to revert it, you can just try the commit above and the previous one (8a68316eb96be1fecf91ca395f3321aa99997ad2).
On Wed, May 20, 2020 at 02:47:58PM +0000, Kenth Eriksson wrote:
Upgraded from 2.0.4 to 2.0.7 and observed strange OSPF path computation. Re-tested against v2.0.5, same result. Any known issues in this area?
Also, are all of these nodes running 2.0.5-7, or is it broken only in a heterogeneous setting?
Setup when observing issue; 5 nodes connected in a ring with PPP links (some redundant links).
       /----------------------------------------------------------------\
,------------.   ,------------.   ,------------.   ,------------.   ,------------.
|10.210.139.1|   |10.210.139.2|   |10.210.139.3|   |10.210.139.4|   |10.210.139.5|
|            |---|            |---|            |---|            |---|            |
|            |---|            |---|            |   |            |   |            |
`------------'   `------------'   `------------'   `------------'   `------------'
On Wed, 2020-05-20 at 18:23 +0200, Ondrej Zajicek wrote:
Also, are all of these nodes running 2.0.5-7, or is it broken only in a heterogeneous setting?
Tested with all 5 nodes running 2.0.7. Also tested with only one of them running 2.0.5 or 2.0.7. So it is not caused by compatibility issues between 2.0.4 and newer versions.
On Wed, May 20, 2020 at 02:47:58PM +0000, Kenth Eriksson wrote:
Upgraded from 2.0.4 to 2.0.7 and observed strange OSPF path computation. Re-tested against v2.0.5, same result. Any known issues in this area?
This patch should fix the issue, could you try it?
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.

We definitely need this fix in the pending 2.0.8 :-)

node_1 ~ # birdc
BIRD v2.0.7-11-g16b30256 ready.
bird> show route table master4
Table master4:
0.0.0.0/0            unicast [static1 20:39:15.729] * (254)
        via 10.210.129.1 on eth1
10.210.139.2/32      unicast [direct1 20:44:42.238] * (255)
        dev p1-1-5-1-6
                     unicast [ospfv2_1 20:44:57.045] E1 (145/10) [0.0.139.2]
        via 10.210.139.2 on p1-1-5-1-6
                     unicast [direct1 20:44:42.866] (255)
        dev p1-1-1-1-2
10.210.139.5/32      unicast [direct1 20:59:37.975] * (255)
        dev p1-1-3-1-4
                     unicast [ospfv2_1 20:59:54.045] E1 (145/10) [0.0.139.5]
        via 10.210.139.5 on p1-1-3-1-4
10.210.139.1/32      unicast [direct1 20:39:08.049] * (255)
        dev lo
10.210.139.4/32      unicast [ospfv2_1 21:00:23.044] * E1 (145/20) [0.0.139.4]
        via 10.210.139.5 on p1-1-3-1-4
10.210.129.0/24      unicast [direct1 20:39:15.728] * (255)
        dev eth1
10.210.139.3/32      unicast [ospfv2_1 20:50:04.045] * E1 (145/20) [0.0.139.3]
        via 10.210.139.2 on p1-1-5-1-6
10.0.0.0/20          unicast [direct1 20:38:51.728] * (255)
        dev eth0
bird> show ospf topology ospfv2_1

area 0.0.0.0

        router 0.0.139.1
                distance 0
                router 0.0.139.2 metric 10
                router 0.0.139.2 metric 10
                router 0.0.139.5 metric 10

        router 0.0.139.2
                distance 10
                router 0.0.139.1 metric 10
                router 0.0.139.1 metric 10
                router 0.0.139.3 metric 10
                router 0.0.139.3 metric 10

        router 0.0.139.3
                distance 20
                router 0.0.139.2 metric 10
                router 0.0.139.2 metric 10
                router 0.0.139.4 metric 10

        router 0.0.139.4
                distance 20
                router 0.0.139.3 metric 10
                router 0.0.139.5 metric 10

        router 0.0.139.5
                distance 10
                router 0.0.139.1 metric 10
                router 0.0.139.4 metric 10

bird>
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed data field for unnumbered PtP links from iface id (specified by RFC) to IP address based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.

Then with OSPF graceful restart implementation, we found that we can no longer use out-of-band information, and we need to use only LSAdb info for routing table calculation, but I forgot to finish handling of this case, so multiple unnumbered PtPs with the same local IP addresses were broken.

This patch returned back iface id to data field for unnumbered PtP links (i.e. reverted back the change from 2012), while doing computation just from LSAdb info. It fixed your case (multiple unnumbered PtPs with the same local IP address) and is correct per RFC, but it may trigger bugs with other implementations (like the one that led to the 2012 change).

Hopefully Quagga/FRR already fixed that issue and perhaps we should add an option to revert back to the old behavior in case someone noticed a compatibility issue.

It would be useful if anybody could try OSPFv2 unnumbered PtP links between BIRD with this patch and other implementations.
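The ambiguity described above can be sketched in a few lines. This is an illustration only: the interface names come from the report, while the shared local address (assumed to be borrowed from `lo`) and the interface ids are hypothetical values chosen for the example.

```python
from collections import defaultdict

# Hypothetical view of node 1's OSPF interfaces:
# (name, interface id, local IP, neighbor router id).
# Assumption: all unnumbered PtP links borrow the loopback address.
IFACES = [
    ("p1-1-1-1-2", 11, "10.210.139.1", "0.0.139.2"),
    ("p1-1-5-1-6", 12, "10.210.139.1", "0.0.139.2"),
    ("p1-1-3-1-4", 13, "10.210.139.1", "0.0.139.5"),
]


def index_by(key_fn):
    """Group interface names by the value that would go in the data field."""
    table = defaultdict(list)
    for iface in IFACES:
        table[key_fn(iface)].append(iface[0])
    return dict(table)


by_ip = index_by(lambda i: i[2])       # data field = local IP address
by_iface_id = index_by(lambda i: i[1])  # data field = interface id (per RFC)
```

Looking up a link record by IP address returns all three interfaces, so something else is needed to tell them apart; looking it up by interface id returns exactly one.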
On Fri, May 22, 2020 at 10:59:52PM +0200, Ondrej Zajicek wrote:
This issue has a long history. In 2012, we changed data field for unnumbered PtP links from iface id (specified by RFC) to IP address based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then with OSPF graceful restart implementation, we found that we can no longer use out-of-band information, and we need to use only LSAdb info for routing table calculation, but I forgot to finish handling of this case, so multiple unnumbered PtPs with the same local IP addresses were broken.
This patch returned back iface id to data field for unnumbered PtP links (i.e. reverted back the change from 2012), while doing computation just from LSAdb info. It fixed your case (multiple unnumbered PtPs with the same local IP address) and is correct per RFC, but it may trigger bugs with other implementations (like the one that led to the 2012 change).
Hopefully Quagga/FRR already fixed that issue and perhaps we should add an option to revert back to the old behavior in case someone noticed a compatibility issue.
Unfortunately, it seems that at least recent Mikrotik is broken w.r.t. unnumbered PtP links, so we cannot reasonably use this patch and we would need to find another approach.
On Sat, 2020-05-23 at 01:54 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 10:59:52PM +0200, Ondrej Zajicek wrote:
This issue has a long history. In 2012, we changed data field for unnumbered PtP links from iface id (specified by RFC) to IP address based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address. Then with OSPF graceful restart implementation, we found that we can no longer use out-of-band information, and we need to use only LSAdb info for routing table calculation, but I forgot to finish handling of this case, so multiple unnumbered PtPs with the same local IP addresses were broken. This patch returned back iface id to data field for unnumbered PtP links (i.e. reverted back the change from 2012), while doing computation just from LSAdb info. It fixed your case (multiple unnumbered PtPs with the same local IP address) and is correct per RFC, but it may trigger bugs with other implementations (like the one that led to the 2012 change). Hopefully Quagga/FRR already fixed that issue and perhaps we should add an option to revert back to the old behavior in case someone noticed a compatibility issue.
I fixed this particular problem in Quagga before suggesting the same change in Bird, so Quagga should be fine, as should FRR as it is derived from Quagga. This is the commit: http://git.savannah.gnu.org/cgit/quagga.git/commit/?id=c81ee5c94f5b34375f3ef...
Unfortunately, it seems that at least recent Mikrotik is broken w.r.t.
unnumbered PtP links, so we cannot reasonably use this patch and we
would need to find another approach.
Well, Mikrotik is broken to begin with, and if you struggle with keeping a workaround in place I would just drop Mikrotik compatibility.

Jocke
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed data field for unnumbered PtP links from iface id (specified by RFC) to IP address based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then with OSPF graceful restart implementation, we found that we can no longer use out-of-band information, and we need to use only LSAdb info for routing table calculation, but I forgot to finish handling of this case, so multiple unnumbered PtPs with the same local IP addresses were broken.
This patch returned back iface id to data field for unnumbered PtP links (i.e. reverted back the change from 2012), while doing computation just from LSAdb info. It fixed your case (multiple unnumbered PtPs with the same local IP address) and is correct per RFC, but it may trigger bugs with other implementations (like the one that led to the 2012 change).
Not sure I follow here, have you done away with rt_pos_to_ifa() and friends now and gone back to the old way? The old way had several drawbacks, one of them was this dependency on interface ID. Does current impl. depend on a well behaved neighbor too? Is it compatible with any other Bird release? Jocke
On Sat, May 23, 2020 at 10:43:52AM +0000, Joakim Tjernlund wrote:
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed data field for unnumbered PtP links from iface id (specified by RFC) to IP address based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then with OSPF graceful restart implementation, we found that we can no longer use out-of-band information, and we need to use only LSAdb info for routing table calculation, but I forgot to finish handling of this case, so multiple unnumbered PtPs with the same local IP addresses were broken.
This patch returned back iface id to data field for unnumbered PtP links (i.e. reverted back the change from 2012), while doing computation just from LSAdb info. It fixed your case (multiple unnumbered PtPs with the same local IP address) and is correct per RFC, but it may trigger bugs with other implementations (like the one that led to the 2012 change).
Not sure I follow here, have you done away with rt_pos_to_ifa() and friends now and gone back to the old way?
Yes, it does not use rt_pos_to_ifa(). The approach with rt_pos_to_ifa() does not work with graceful restart - after restart, the router learns its own LSAs (generated by the previous run) and needs to do routing table calculation without stored pos info. And it is probably a bad idea to have different route calculation algorithms in these cases.
The old way had several drawbacks, one of them was this dependency on interface ID. Does current impl. depend on a well behaved neighbor too?
The current (2.0.7) is broken with regard to multiple unnumbered PtPs with the same local IP address (as it uses only IP address in the data field), but does not depend on well behaved neighbors. The offered patch uses interface IDs, like described in RFC. That patch is reliable (I do not see any issue with using interface IDs, what do you mean?), but depends on well behaved neighbors. And it seems that there are significant badly behaved ones.
Is it compatible with any other Bird release?
Yes, that is not an issue. BIRD (at least post-2012) does not use 'data' field of PtP links from neighbor LSAs. This field is only relevant for the router who originated that LSA. It would work (for BIRD peers) even if we put some random numbers here.

My current idea how to make it work without interface-ids and without stored pos info: The problem is to match Router-LSA records with OSPF ifaces that generated them. Instead of using just 'data' field, we can use all fields ('data' to match local IP address, router-id to see if there is established neighbor with that router-id, and matching configured cost).

And for case with two parallel equal links that are described in Router-LSA by two equal records, we would have flag (in ospf_iface) that ensures one OSPF iface is matched with at most one record, so the first record is matched with the first matching iface and the second record is matched with the second matching iface.

I would be glad to hear any comments to this idea or suggestions of other ideas how to solve it.
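The matching idea proposed above could look roughly like the sketch below. It is an illustration under assumptions, not BIRD code: `Iface` is a hypothetical stand-in for `ospf_iface`, `LinkRecord` for one PtP record of the router's own Router-LSA, and the third parallel interface name is invented to show that the per-iface flag is not limited to two links.

```python
from dataclasses import dataclass


@dataclass
class Iface:
    """Hypothetical stand-in for BIRD's ospf_iface."""
    name: str
    local_ip: str
    nbr_rid: str          # router id of the established neighbor
    cost: int
    matched: bool = False  # the proposed "already used" flag


@dataclass(frozen=True)
class LinkRecord:
    """One PtP link record from our own Router-LSA."""
    nbr_rid: str   # link id field
    data: str      # data field (local IP address)
    metric: int


def match_records(records, ifaces):
    """Match each record to the first unmatched iface agreeing on all fields."""
    for ifa in ifaces:
        ifa.matched = False
    result = []
    for rec in records:
        hit = next(
            (i for i in ifaces
             if not i.matched
             and i.local_ip == rec.data
             and i.nbr_rid == rec.nbr_rid
             and i.cost == rec.metric),
            None,
        )
        if hit is not None:
            hit.matched = True
        result.append(hit.name if hit is not None else None)
    return result


ip = "10.210.139.1"
ifaces = [
    Iface("p1-1-1-1-2", ip, "0.0.139.2", 10),
    Iface("p1-1-5-1-6", ip, "0.0.139.2", 10),
    Iface("p1-1-7-1-8", ip, "0.0.139.2", 10),  # hypothetical third link
    Iface("p1-1-3-1-4", ip, "0.0.139.5", 10),
]
records = [LinkRecord("0.0.139.2", ip, 10)] * 3 + [LinkRecord("0.0.139.5", ip, 10)]
names = match_records(records, ifaces)
```

Each of the three identical records ends up on a different interface. Which identical record lands on which parallel interface is arbitrary in this sketch; whether that matters for the real calculation is a question the sketch does not answer.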
On Sat, 2020-05-23 at 13:57 +0200, Ondrej Zajicek wrote:
On Sat, May 23, 2020 at 10:43:52AM +0000, Joakim Tjernlund wrote:
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed data field for unnumbered PtP links from iface id (specified by RFC) to IP address based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then with OSPF graceful restart implementation, we found that we can no longer use out-of-band information, and we need to use only LSAdb info for routing table calculation, but I forgot to finish handling of this case, so multiple unnumbered PtPs with the same local IP addresses were broken.
This patch returned back iface id to data field for unnumbered PtP links (i.e. reverted back the change from 2012), while doing computation just from LSAdb info. It fixed your case (multiple unnumbered PtPs with the same local IP address) and is correct per RFC, but it may trigger bugs with other implementations (like the one that led to the 2012 change).
Not sure I follow here, have you done away with rt_pos_to_ifa() and friends now and gone back to the old way?
Yes, it does not use rt_pos_to_ifa(). The approach with rt_pos_to_ifa() does not work with graceful restart - after restart, the router learns its own LSAs (generated by the previous run) and needs to do routing table calculation without stored pos info. And it is probably a bad idea to have different route calculation algorithms in these cases.
Hmm, I wonder how other impl. do this. I recall that some reference impl. also used something like the pos method?
The old way had several drawbacks, one of them was this dependency on interface ID. Does current impl. depend on a well behaved neighbor too?
The current (2.0.7) is broken with regard to multiple unnumbered PtPs with the same local IP address (as it uses only IP address in the data field), but does not depend on well behaved neighbors.
The offered patch uses interface IDs, like described in RFC. That patch is reliable (I do not see any issue with using interface IDs, what do you mean?), but depends on well behaved neighbors. And it seems that there are significant badly behaved ones.
When I developed this pos idea on Quagga I found that Q was dependent on the remote neighbour also sending interface ID in its LSA for unnumbered interfaces. Using the pos method, all this dependency went away. I suspect Bird does not have this dependency, it will work regardless of IP address or Interface ID in remote hosts?
Is it compatible with any other Bird release?
Yes, that is not an issue. BIRD (at least post-2012) does not use 'data' field of PtP links from neighbor LSAs. This field is only relevant for the router who originated that LSA. It would work (for BIRD peers) even if we put some random numbers here.
My current idea how to make it work without interface-ids and without stored pos info: The problem is to match Router-LSA records with OSPF ifaces that generated them. Instead of using just 'data' field, we can use all fields ('data' to match local IP address, router-id to see if there is established neighbor with that router-id, and matching configured cost).
I think this will not work, all my efforts to match the correct interface just using the LSAs failed for multiple unnumbered I/Fs to the same remote host. That led to the pos method in the end. Now, this was many years ago so maybe memory fails me.
And for case with two parallel equal links that are described in Router-LSA by two equal records, we would have flag (in ospf_iface) that ensures one OSPF iface is matched with at most one record, so the first record is matched with the first matching iface and the second record is matched with the second matching iface.
There can be more than 2 PtP links; we can have many more than 2.
I would be glad to hear any comments to this idea or suggestions of other ideas how to solve it.
On Sat, 2020-05-23 at 15:28 +0000, Joakim Tjernlund wrote:
On Sat, 2020-05-23 at 13:57 +0200, Ondrej Zajicek wrote:
I would be glad to hear any comments to this idea or suggestions of other ideas how to solve it.
Thinking some more on this ..

The pos method depends only on your own Router LSA. If I recall correctly (mind you, this was a long time ago), the R-LSA only depends on your own interfaces wanting to use OSPF, so you could always calculate your own R-LSA before starting any graceful restart (or so I hope) instead of relying on someone else's idea of your own R-LSA (it would be safer to only trust yourself).

Jocke
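For readers unfamiliar with the "pos method" discussed in this thread, the following is a rough sketch of the idea as described here, not the actual rt_pos_to_ifa() implementation: while originating its own Router-LSA, the router records which interface produced the record at each position, and later resolves a record by position rather than by content. All names and the record layout are assumptions made for the example.

```python
def originate_router_lsa(ifaces):
    """Build our own Router-LSA and remember which iface made each record."""
    records, pos_to_iface = [], {}
    for name, nbr_rid, local_ip, cost in ifaces:
        pos_to_iface[len(records)] = name  # out-of-band side table
        records.append({"id": nbr_rid, "data": local_ip, "metric": cost})
    return records, pos_to_iface


def iface_for_record(pos, pos_to_iface):
    """Resolve a link record by its position, ignoring its contents."""
    return pos_to_iface.get(pos)


IFACES = [
    ("p1-1-1-1-2", "0.0.139.2", "10.210.139.1", 10),
    ("p1-1-5-1-6", "0.0.139.2", "10.210.139.1", 10),
    ("p1-1-3-1-4", "0.0.139.5", "10.210.139.1", 10),
]
lsa, table = originate_router_lsa(IFACES)
second = iface_for_record(1, table)

# After a restart the LSA may be re-learned from neighbors, but the side
# table built by the previous run is gone, so the lookup has no answer.
after_restart = iface_for_record(1, {})
```

The first two records are identical, yet position 1 still resolves to the second interface. The empty table at the end models the objection raised earlier in the thread: the mapping exists only in the process that originated the LSA.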
On Sat, May 23, 2020 at 03:55:16PM +0000, Joakim Tjernlund wrote:
I would be glad to hear any comments to this idea or suggestions of other ideas how to solve it.
Thinking some more on this ..
The pos method depends only on your own Router LSA. If I recall correctly (mind you, this was a long time ago), the R-LSA only depends on your own interfaces wanting to use OSPF, so you could always calculate your own R-LSA before starting any graceful restart (or so I hope) instead of relying on someone else's idea of your own R-LSA (it would be safer to only trust yourself).
Well, the graceful restart is based on the idea that your FIB (or kernel) still has the previous routing table, so you should not use your current adjacency state (as some adjacencies may not be established yet). You need to parse your old R-LSA, wait until all adjacencies described there are newly established, and then resume normal operation.

Now I am not sure if route recalculation is done during GR, or if the R-LSA record matching is necessary just for this parsing; I will check that.
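The "parse the old R-LSA and wait" step described above might be sketched as follows. This is a deliberate simplification under assumptions: the record layout is invented, only PtP records are considered, and the real exit conditions for graceful restart (see RFC 3623) involve more than this check.

```python
from collections import Counter


def adjacencies_in(old_records):
    """Count PtP adjacencies per neighbor named by the pre-restart Router-LSA."""
    return Counter(rec["id"] for rec in old_records if rec["type"] == "ptp")


def may_exit_graceful_restart(old_records, full_neighbors):
    """True once every adjacency listed in the old Router-LSA is Full again.

    `full_neighbors` lists one router id per currently Full adjacency, so
    parallel links to the same neighbor appear more than once.
    """
    needed = adjacencies_in(old_records)
    have = Counter(full_neighbors)
    return all(have[rid] >= count for rid, count in needed.items())


# Hypothetical pre-restart Router-LSA of node 1 (two links to .2, one to .5).
OLD_LSA = [
    {"type": "ptp", "id": "0.0.139.2"},
    {"type": "ptp", "id": "0.0.139.2"},
    {"type": "ptp", "id": "0.0.139.5"},
    {"type": "stub", "id": "10.210.139.1"},
]

early = may_exit_graceful_restart(OLD_LSA, ["0.0.139.2", "0.0.139.5"])
done = may_exit_graceful_restart(OLD_LSA, ["0.0.139.2", "0.0.139.2", "0.0.139.5"])
```

Counting adjacencies rather than collecting a set of neighbor ids is what keeps parallel links from being declared complete after only one of them has come back.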
On Sat, 2020-05-23 at 18:19 +0200, Ondrej Zajicek wrote:
On Sat, May 23, 2020 at 03:55:16PM +0000, Joakim Tjernlund wrote:
I would be glad to hear any comments to this idea or suggestions of other ideas how to solve it.
Thinking some more on this ..
The pos method depends only on your own Router-LSA. If I recall correctly (mind you, this was a long time ago), the R-LSA depends only on your own interfaces that want to use OSPF, so you could always calculate your own R-LSA before starting any graceful restart (or so I hope) instead of relying on someone else's idea of your own R-LSA (it would be safer to trust only yourself).
Well, graceful restart is based on the idea that your FIB (or kernel) still holds the previous routing table, so you should not use your current adjacency state (as some adjacencies may not yet be established).
You need to parse your old R-LSA, wait until all adjacencies described there are newly established, and then resume normal operation.
To be sure all adjacencies are there, you would need to compare it against your own newly calculated R-LSA? Once you are "happy", you can continue with your own R-LSA and still use the pos method.
Now I am not sure whether route recalculation is done during GR, or whether the R-LSA record matching is necessary just for this parsing; I will check that.
On Sat, May 23, 2020 at 05:49:35PM +0000, Joakim Tjernlund wrote:
On Sat, 2020-05-23 at 18:19 +0200, Ondrej Zajicek wrote:
On Sat, May 23, 2020 at 03:55:16PM +0000, Joakim Tjernlund wrote:
I would be glad to hear any comments to this idea or suggestions of other ideas how to solve it.
Thinking some more on this ..
The pos method depends only on your own Router-LSA. If I recall correctly (mind you, this was a long time ago), the R-LSA depends only on your own interfaces that want to use OSPF, so you could always calculate your own R-LSA before starting any graceful restart (or so I hope) instead of relying on someone else's idea of your own R-LSA (it would be safer to trust only yourself).
Well, graceful restart is based on the idea that your FIB (or kernel) still holds the previous routing table, so you should not use your current adjacency state (as some adjacencies may not yet be established).
You need to parse your old R-LSA, wait until all adjacencies described there are newly established, and then resume normal operation.
To be sure all adjacencies are there, you would need to compare it against your own newly calculated R-LSA? Once you are "happy", you can continue with your own R-LSA and still use the pos method.
I just checked that: a router is supposed to do routing table calculations even during graceful restart (but not use the resulting routes for forwarding). AFAIK, this is necessary just for vlinks, which are established only if the other side is reachable in the routing table computation.
--
Ondrej 'Santiago' Zajicek
On Sat, May 23, 2020 at 03:28:00PM +0000, Joakim Tjernlund wrote:
On Sat, 2020-05-23 at 13:57 +0200, Ondrej Zajicek wrote:
On Sat, May 23, 2020 at 10:43:52AM +0000, Joakim Tjernlund wrote:
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed the data field for unnumbered PtP links from iface id (as specified by the RFC) to IP address, based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then, with the OSPF graceful restart implementation, we found that we can no longer use out-of-band information and need to use only LSAdb info for routing table calculation, but I forgot to finish handling this case, so multiple unnumbered PtPs with the same local IP address were broken.
This patch returns the iface id to the data field for unnumbered PtP links (i.e. it reverts the change from 2012), while doing the computation just from LSAdb info. It fixes your case (multiple unnumbered PtPs with the same local IP address) and is correct per the RFC, but it may trigger bugs in other implementations (like the one that led to the 2012 change).
Not sure I follow here, have you done away with rt_pos_to_ifa() and friends now and gone back to the old way?
Yes, it does not use rt_pos_to_ifa(). The approach with rt_pos_to_ifa() does not work with graceful restart: after a restart, the router learns its own LSAs (generated by the previous run) and needs to do the routing table calculation without stored pos info. And it is probably a bad idea to have different route calculation algorithms in these cases.
Hmm, I wonder how other implementations do this. I recall that some reference implementation also used something like the pos method?
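To illustrate why stored position info does not survive a restart, here is a minimal Python sketch. It is not the BIRD implementation: `build_router_lsa` and the `pos_to_iface` map are hypothetical stand-ins for the idea behind rt_pos_to_ifa(), and the local address 10.210.139.1 is assumed for all three unnumbered links.

```python
def build_router_lsa(ifaces):
    """Build PtP link records and remember which interface produced
    the record at each position (the out-of-band 'pos' info)."""
    records = []
    pos_to_iface = {}
    for name, neighbor_id, local_ip, cost in ifaces:
        pos_to_iface[len(records)] = name
        records.append((neighbor_id, local_ip, cost))
    return records, pos_to_iface


# Interface names and router IDs as in the 5-node setup; the IP is an assumption.
IFACES = [
    ("p1-1-3-1-4", "0.0.139.5", "10.210.139.1", 10),
    ("p1-1-1-1-2", "0.0.139.2", "10.210.139.1", 10),
    ("p1-1-5-1-6", "0.0.139.2", "10.210.139.1", 10),
]

records, pos_to_iface = build_router_lsa(IFACES)

# Records 1 and 2 are byte-for-byte equal, yet the position tells them apart.
same_content = records[1] == records[2]
iface_for_second_parallel_link = pos_to_iface[2]

# After a restart the router relearns `records` from its neighbors,
# but the in-memory position map from the previous run is gone.
pos_after_restart = {}
lookup_after_restart = pos_after_restart.get(2)
```

In this model the LSA content alone cannot distinguish the two parallel links, which is the gap that either interface IDs in the data field or a record-matching heuristic has to fill.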
The old way had several drawbacks, one of them was this dependency on interface ID. Does current impl. depend on a well behaved neighbor too?
The current (2.0.7) is broken with regard to multiple unnumbered PtPs with the same local IP address (as it uses only IP address in the data field), but does not depend on well behaved neighbors.
The offered patch uses interface IDs, as described in the RFC. That patch is reliable (I do not see any issue with using interface IDs, what do you mean?), but depends on well-behaved neighbors. And it seems that there are significant badly behaved ones.
When I developed this pos idea on Quagga I found that Q was dependant on the remote neighbour also sending interface ID in its LSA for unnumbered interfaces. Using the pos method, all this dependency went away. I suspect Bird does not have this dependency, it will work regardless of IP address or Interface ID in remote hosts?
Perhaps it was other way (Q dependent on remote neighbor sending IP)? See http://trubka.network.cz/pipermail/bird-users/2012-August/007880.html
Is it compatible with any other Bird release?
Yes, that is not an issue. BIRD (at least post-2012) does not use 'data' field of PtP links from neighbor LSAs. This field is only relevant for the router who originated that LSA. It would work (for BIRD peers) even if we put some random numbers here.
My current idea for making it work without interface IDs and without stored pos info: the problem is to match Router-LSA records with the OSPF ifaces that generated them. Instead of using just the 'data' field, we can use all fields ('data' to match the local IP address, router-id to see if there is an established neighbor with that router-id, and the matching configured cost).
I think this will not work; all my efforts to match the correct interface using just the LSAs failed for multiple unnumbered I/Fs to the same remote host. That led to the pos method in the end. Now, this was many years ago, so maybe memory fails me.
And for case with two parallel equal links that are described in Router-LSA by two equal records, we would have flag (in ospf_iface) that ensures one OSPF iface is matched with at most one record, so the first record is matched with the first matching iface and the second record is matched with the second matching iface.
There can be more than 2 ptop links, we can have many more than 2.
It would work with any number of links: the flag would mean 'already consumed for PtP link', so the next identical PtP description would not match it and would use the next available matching interface. It probably would not work for unnumbered PtMP interfaces, but these are not supposed to exist per RFC 2328 anyway.
--
Ondrej 'Santiago' Zajicek
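The matching idea with a per-interface 'consumed' flag can be sketched as follows. This is only an illustration of the proposal in this thread, not code from BIRD; the `Iface` class, its field names, and `match_records` are hypothetical, and the local IP address is assumed.

```python
from dataclasses import dataclass


@dataclass
class Iface:
    name: str
    local_ip: str
    cost: int
    full_neighbors: set      # router IDs with an established adjacency
    consumed: bool = False   # 'already consumed for PtP link'


def match_records(records, ifaces):
    """Map each (neighbor_id, data, metric) record to an interface name.

    A record matches an interface when the data field equals the local IP,
    the metric equals the configured cost, and the neighbor is established
    on that interface. Each interface is used for at most one record.
    """
    for iface in ifaces:
        iface.consumed = False
    result = []
    for neighbor_id, data, metric in records:
        hit = None
        for iface in ifaces:
            if (not iface.consumed and iface.local_ip == data
                    and iface.cost == metric
                    and neighbor_id in iface.full_neighbors):
                iface.consumed = True
                hit = iface.name
                break
        result.append(hit)
    return result


ifaces = [
    Iface("p1-1-3-1-4", "10.210.139.1", 10, {"0.0.139.5"}),
    Iface("p1-1-1-1-2", "10.210.139.1", 10, {"0.0.139.2"}),
    Iface("p1-1-5-1-6", "10.210.139.1", 10, {"0.0.139.2"}),
]
records = [
    ("0.0.139.5", "10.210.139.1", 10),
    ("0.0.139.2", "10.210.139.1", 10),
    ("0.0.139.2", "10.210.139.1", 10),
]
matched = match_records(records, ifaces)
```

With two identical records, which of the two parallel interfaces each one lands on is arbitrary, but since the records are equal in every field the resulting set of next hops is the same either way.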
On Sat, 2020-05-23 at 18:04 +0200, Ondrej Zajicek wrote:
On Sat, May 23, 2020 at 03:28:00PM +0000, Joakim Tjernlund wrote:
On Sat, 2020-05-23 at 13:57 +0200, Ondrej Zajicek wrote:
On Sat, May 23, 2020 at 10:43:52AM +0000, Joakim Tjernlund wrote:
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed the data field for unnumbered PtP links from iface id (as specified by the RFC) to IP address, based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then, with the OSPF graceful restart implementation, we found that we can no longer use out-of-band information and need to use only LSAdb info for routing table calculation, but I forgot to finish handling this case, so multiple unnumbered PtPs with the same local IP address were broken.
This patch returns the iface id to the data field for unnumbered PtP links (i.e. it reverts the change from 2012), while doing the computation just from LSAdb info. It fixes your case (multiple unnumbered PtPs with the same local IP address) and is correct per the RFC, but it may trigger bugs in other implementations (like the one that led to the 2012 change).
Not sure I follow here, have you done away with rt_pos_to_ifa() and friends now and gone back to the old way?
Yes, it does not use rt_pos_to_ifa(). The approach with rt_pos_to_ifa() does not work with graceful restart: after a restart, the router learns its own LSAs (generated by the previous run) and needs to do the routing table calculation without stored pos info. And it is probably a bad idea to have different route calculation algorithms in these cases.
Hmm, I wonder how other implementations do this. I recall that some reference implementation also used something like the pos method?
The old way had several drawbacks, one of them was this dependency on interface ID. Does current impl. depend on a well behaved neighbor too?
The current (2.0.7) is broken with regard to multiple unnumbered PtPs with the same local IP address (as it uses only IP address in the data field), but does not depend on well behaved neighbors.
The offered patch uses interface IDs, as described in the RFC. That patch is reliable (I do not see any issue with using interface IDs, what do you mean?), but depends on well-behaved neighbors. And it seems that there are significant badly behaved ones.
When I developed this pos idea on Quagga I found that Q was dependant on the remote neighbour also sending interface ID in its LSA for unnumbered interfaces. Using the pos method, all this dependency went away. I suspect Bird does not have this dependency, it will work regardless of IP address or Interface ID in remote hosts?
Perhaps it was other way (Q dependent on remote neighbor sending IP)?
See http://trubka.network.cz/pipermail/bird-users/2012-August/007880.html
Not sure. I do recall Q had a problem using unnumbered on one end of the link and numbered on the other; a hint of this can be seen in the old commit message:
This has the following advantages:
- Multiple PtP interfaces with the same IP address between two routers.
- Use Unnumbered PtP on just one end of the link. <<<<-----
- Faster OI lookup for the OSPF interface and only done once for PtoP links.
This should all be history by now, as Q no longer uses the old way.
Is it compatible with any other Bird release?
Yes, that is not an issue. BIRD (at least post-2012) does not use 'data' field of PtP links from neighbor LSAs. This field is only relevant for the router who originated that LSA. It would work (for BIRD peers) even if we put some random numbers here.
My current idea for making it work without interface IDs and without stored pos info: the problem is to match Router-LSA records with the OSPF ifaces that generated them. Instead of using just the 'data' field, we can use all fields ('data' to match the local IP address, router-id to see if there is an established neighbor with that router-id, and the matching configured cost).
I think this will not work; all my efforts to match the correct interface using just the LSAs failed for multiple unnumbered I/Fs to the same remote host. That led to the pos method in the end. Now, this was many years ago, so maybe memory fails me.
And for case with two parallel equal links that are described in Router-LSA by two equal records, we would have flag (in ospf_iface) that ensures one OSPF iface is matched with at most one record, so the first record is matched with the first matching iface and the second record is matched with the second matching iface.
There can be more than 2 ptop links, we can have many more than 2.
It would work with any number of links - the flag would mean 'already consumed for PtP link', so next same PtP description would not match it and use next available matching interface.
It probably would not work for unnumbered PtMP interfaces, but these are not supposed to exist by RFC 2328 anyways.
I see, I cannot say if this has any drawbacks though. It feels like some middle ground between the old method and the pos method. Jocke
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed the data field for unnumbered PtP links from iface id (as specified by the RFC) to IP address, based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then, with the OSPF graceful restart implementation, we found that we can no longer use out-of-band information and need to use only LSAdb info for routing table calculation, but I forgot to finish handling this case, so multiple unnumbered PtPs with the same local IP address were broken.
This patch returns the iface id to the data field for unnumbered PtP links (i.e. it reverts the change from 2012), while doing the computation just from LSAdb info. It fixes your case (multiple unnumbered PtPs with the same local IP address) and is correct per the RFC, but it may trigger bugs in other implementations (like the one that led to the 2012 change).
Hopefully Quagga/FRR already fixed that issue and perhaps we should add an option to revert back to the old behavior in case someone noticed a compatibility issue.
It would be useful if anybody could try OSPFv2 unnumbered PtP links between BIRD with this patch and other implementations.
Tried a small interop by loading bird 2.0.4 on node 2 and Quagga 0.99.11 on node 3 (the Quagga build has the unnumbered patch mentioned by Jocke). Results look correct as far as I can tell.
Seems like that patch is also part of FRR: https://github.com/FRRouting/frr/commit/c81ee5c94f
And it still looks like at least the functions are there in the latest FRR 7.3.1: https://github.com/FRRouting/frr/blob/frr-7.3.1/ospfd/ospf_interface.c#L393
Haven't actually tested whether this interops with bird.
The RFC states that unnumbered PtP links shall use ifIndex, whereas numbered PtP links shall use the IP interface address. Any reason not to follow the RFC? https://tools.ietf.org/html/rfc2328#page-130
Ondrej, what are your plans for the patch provided? Good to go for master?
On Mon, May 25, 2020 at 06:04:34PM +0000, Kenth Eriksson wrote:
On Fri, 2020-05-22 at 22:59 +0200, Ondrej Zajicek wrote:
On Fri, May 22, 2020 at 07:14:44PM +0000, Kenth Eriksson wrote:
On Thu, 2020-05-21 at 12:43 +0200, Ondrej Zajicek wrote:
This patch should fix the issue, could you try it?
Looks promising, applied on top of 2.0.7, and a quick test on the 5 node setup looks correct. Will do some more testing.
We definitely need this fix in the pending 2.0.8 :-)
This issue has a long history. In 2012, we changed the data field for unnumbered PtP links from iface id (as specified by the RFC) to IP address, based on reports of bugs in Quagga that required it, and we used out-of-band information to distinguish unnumbered PtPs with the same local IP address.
Then, with the OSPF graceful restart implementation, we found that we can no longer use out-of-band information and need to use only LSAdb info for routing table calculation, but I forgot to finish handling this case, so multiple unnumbered PtPs with the same local IP address were broken.
This patch returns the iface id to the data field for unnumbered PtP links (i.e. it reverts the change from 2012), while doing the computation just from LSAdb info. It fixes your case (multiple unnumbered PtPs with the same local IP address) and is correct per the RFC, but it may trigger bugs in other implementations (like the one that led to the 2012 change).
Hopefully Quagga/FRR already fixed that issue and perhaps we should add an option to revert back to the old behavior in case someone noticed a compatibility issue.
It would be useful if anybody could try OSPFv2 unnumbered PtP links between BIRD with this patch and other implementations.
Tried a small interop by loading bird 2.0.4 on node 2 and Quagga 0.99.11 on node 3 (the Quagga build has the unnumbered patch mentioned by Jocke). Results look correct as far as I can tell.
Seems like that patch is also part of FRR; https://github.com/FRRouting/frr/commit/c81ee5c94f
And it still looks like at least the functions are there in latest FRR 7.3.1. https://github.com/FRRouting/frr/blob/frr-7.3.1/ospfd/ospf_interface.c#L393
Hi,
I checked that. It seems FRR uses an approach similar to BIRD 2.0.4, so that is OK. It also seems that FRR does not implement OSPF graceful restart, so they have not (yet) hit the same issue with Jocke's patch as we did in 2.0.5.
Haven't actually tested if this actually interops with bird.
The RFC states that unnumbered PtP links shall use ifIndex, whereas numbered PtP links shall use the IP interface address. Any reason not to follow the RFC? https://tools.ietf.org/html/rfc2328#page-130
Well, I generally prefer not to make intentional changes that break existing setups, and switching to this (as done by the patch I sent) would break Mikrotik compatibility for unnumbered PtP links (due to Mikrotik's broken SPF calculation).
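For reference, the two ways of filling the link data field discussed here can be contrasted in a small Python sketch. The function name `ptp_link_data` and the `rfc_mode` switch are hypothetical and exist only for this illustration; the ifIndex values are made up.

```python
import ipaddress


def ptp_link_data(local_ip, ifindex, unnumbered, rfc_mode=True):
    """Return the 32-bit link data value for a PtP link record.

    rfc_mode=True:  RFC 2328 rule, ifIndex for unnumbered links and
                    the IP interface address for numbered links.
    rfc_mode=False: the post-2012 behavior described in this thread,
                    always the local IP address.
    """
    if unnumbered and rfc_mode:
        return ifindex
    return int(ipaddress.IPv4Address(local_ip))


# Two unnumbered links sharing one local address (assumed 10.210.139.1).
rfc_values = [ptp_link_data("10.210.139.1", idx, True) for idx in (7, 9)]
ip_values = [ptp_link_data("10.210.139.1", idx, True, rfc_mode=False) for idx in (7, 9)]
```

With the RFC rule the two records stay distinguishable by their data field; with the IP-address rule both carry the same value, which is why extra information is needed to tell parallel unnumbered links apart.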
Ondrej, what are your plans for the patch provided? Good to go for master?
It seems to me that perhaps the least painful solution is to use the 2.0.4 approach (position based) for regular OSPF, and to switch to the ifIndex/data based approach (like the patch) when OSPF graceful restart is enabled. So the plan is to make a new/different patch for master.
--
Ondrej 'Santiago' Zajicek
On Mon, 2020-05-25 at 23:04 +0200, Ondrej Zajicek wrote:
Tried a small interop by loading bird 2.0.4 on node 2 and Quagga 0.99.11 on node 3 (the Quagga build has the unnumbered patch mentioned by Jocke). Results look correct as far as I can tell.
Seems like that patch is also part of FRR; https://github.com/FRRouting/frr/commit/c81ee5c94f
And it still looks like at least the functions are there in latest FRR 7.3.1. https://github.com/FRRouting/frr/blob/frr-7.3.1/ospfd/ospf_interface.c#L393
Hi
I checked that. It seems FRR uses an approach similar to BIRD 2.0.4, so that is OK. It also seems that FRR does not implement OSPF graceful restart, so they have not (yet) hit the same issue with Jocke's patch as we did in 2.0.5.
The development of OSPF graceful restart in bird is a good first step. But I believe most use cases would need unplanned GR. If you are limited to a planned GR, you probably have a service window anyway.
Haven't actually tested if this actually interops with bird.
The RFC states that unnumbered PtP links shall use ifIndex, whereas numbered PtP links shall use the IP interface address. Any reason not to follow the RFC? https://tools.ietf.org/html/rfc2328#page-130
Well, I generally prefer not to make intentional changes that break existing setups, and switching to this (as done by the patch I sent) would break Mikrotik compatibility for unnumbered PtP links (due to Mikrotik's broken SPF calculation).
Not sure I agree the alternative is better. Violating standard to maintain interoperability with a broken Mikrotik implementation. That only makes sense if the Mikrotik way of doing it was de facto standard. If not, drop compatibility.
Ondrej, what are your plans for the patch provided? Good to go for master?
It seems to me that perhaps the least painful solution is to use the 2.0.4 approach (position based) for regular OSPF, and to switch to the ifIndex/data based approach (like the patch) when OSPF graceful restart is enabled.
So does that mean that there is a bird interop issue for nodes running with and without GR activated?
So the plan is to make a new/different patch for master.
On Tue, May 26, 2020 at 07:58:57AM +0000, Kenth Eriksson wrote:
On Mon, 2020-05-25 at 23:04 +0200, Ondrej Zajicek wrote:
Haven't actually tested if this actually interops with bird.
The RFC states that unnumbered PtP links shall use ifIndex, whereas numbered PtP links shall use the IP interface address. Any reason not to follow the RFC?
Well, I generally prefer not to make intentional changes that break existing setups, and switching to this (as done by the patch I sent) would break Mikrotik compatibility for unnumbered PtP links (due to Mikrotik's broken SPF calculation).
Not sure I agree the alternative is better. Violating standard to maintain interoperability with a broken Mikrotik implementation. That only makes sense if the Mikrotik way of doing it was de facto standard. If not, drop compatibility.
Well, we will change it in the next major release, while keeping compatibility across minor releases. The latest patch also allows this to be controlled as a per-interface setting.
Ondrej, what are your plans for the patch provided? Good to go for master?
It seems to me that perhaps the least painful solution is to use the 2.0.4 approach (position based) for regular OSPF, and to switch to the ifIndex/data based approach (like the patch) when OSPF graceful restart is enabled.
So does that mean that there is a bird interop issue for nodes running with and without GR activated?
No, this is a local issue for each node; it should not lead to interoperability issues with BIRD using different settings or with different implementations.
So the plan is to make a new/different patch for master.
I made a new patch that went to master, could you try it? https://gitlab.labs.nic.cz/labs/bird/-/commit/c1632ad0f39f7221d649a9e469cacc...
--
Ondrej 'Santiago' Zajicek
On Tue, 2020-05-26 at 18:47 +0200, Ondrej Zajicek wrote:
So the plan is to make a new/different patch for master.
I made a new patch that went to master, could you try it?
https://gitlab.labs.nic.cz/labs/bird/-/commit/c1632ad0f39f7221d649a9e469cacc...
Tested quickly, didn't see any issues.
participants (3): Joakim Tjernlund, Kenth Eriksson, Ondrej Zajicek