OSPF performance/SPF calculations

Joakim Tjernlund

21 Apr 2010 21 Apr '10

7:41 a.m.

I am using Quagga ATM but I had a quick look at BIRD and I got a few observations. The LSA/checksum code seem very inefficient. LSAs are built allocating/reallocing bits of memory. This is slow and will case memory fragmentation. The fletcher checksum is very slow as it has extra tests in the hot path and also flips endian in the LSA back and forth. I can't work out how the SPF next hop works(calc_next_hop). I tried to compare it with Quagga's ospf_nexthop_calculation() but the structure is too different. The reason for me looking into this was to see how much work it would be to add unnumbered ppp links but as I can't work out how nexthop is working I didn't get very far. I have impl. unnumbered ppp link for Quagga that works really well but this work hasn't been accepted into Quagga yet since Quagga development has stalled. I could show you how I did it Quagga though. Jocke

Show replies by date

Ondrej Zajicek

21 Apr 21 Apr

6:15 p.m.

On Wed, Apr 21, 2010 at 09:41:47AM +0200, Joakim Tjernlund wrote:

...

I am using Quagga ATM but I had a quick look at BIRD and I got a few observations.

Hello. Thank you for your tips and notes.

...

The LSA/checksum code seem very inefficient. LSAs are built allocating/reallocing bits of memory. This is slow and will case memory fragmentation.

You mean lsab_alloc() in originate_rt_lsa_body()? This allocation is just a sequential allocation in a persistent memory buffer, therefore it is very efficient (in most cases just increase of lsab_used counter) and there si no memory fragmentation (all is done inside a persistent memory buffer).

...

The fletcher checksum is very slow as it has extra tests in the hot path and also flips endian in the LSA back and forth.

Do you have some suggestions for a better algorithm, or just a better implementation? Yes, the problem with flipping endianity is known. I do not studied this part of the code yet (the checksum algorithm).

...

I can't work out how the SPF next hop works(calc_next_hop). I tried to compare it with Quagga's ospf_nexthop_calculation() but the structure is too different. The reason for me looking into this was to see how much work it would be to add unnumbered ppp links but as I can't work out how nexthop is working I didn't get very far.

The current code is different from what RFC says. For direct neighbors (on both ptp and non-ptp networks), we just use IP address taken from the source address of the HELLO packet of the neighbor (stored in neighbor structure returned by find_neigh_noifa()). This works well [*] for both broadcast and ptp links (numbered or unnumbered), but is broken on NBMA networks. I have some not-yet-commited changes to calc_next_hop() that on broadcast and NBMA networks uses the standard way (taking IP address from the router LSA) and the source address from HELLO packet as a nexthop is used just for PTP links. Such behavior is also suggested by RFC 5340 (for OSFPv3). [*] Regardless of how the other side is implemented.

...

I have impl. unnumbered ppp link for Quagga that works really well but this work hasn't been accepted into Quagga yet since Quagga development has stalled. I could show you how I did it Quagga though.

The development state of Quagga is sad. Do you implement it in a different way than in BIRD? I wonder whether there is any other possible way to get next hop address for unnumbered ptp links than from source address of HELLO packet. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

8:04 p.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/21 20:15:07:

...

On Wed, Apr 21, 2010 at 09:41:47AM +0200, Joakim Tjernlund wrote:

...
I am using Quagga ATM but I had a quick look at BIRD and I got a few observations.

Hello. Thank you for your tips and notes.

...
The LSA/checksum code seem very inefficient. LSAs are built allocating/reallocing bits of memory. This is slow and will case memory fragmentation.

You mean lsab_alloc() in originate_rt_lsa_body()? This allocation is just a sequential allocation in a persistent memory buffer, therefore it is very efficient (in most cases just increase of lsab_used counter) and there si no memory fragmentation (all is done inside a persistent memory buffer).

Yes, you do realloc on small amounts of memory. Also, receives LSAs seems to be impl. differently so you need handle these somewhat differently.

...

...
The fletcher checksum is very slow as it has extra tests in the hot path and also flips endian in the LSA back and forth.

Do you have some suggestions for a better algorithm, or just a better implementation? Yes, the problem with flipping endianity is known. I do not studied this part of the code yet (the checksum algorithm).

Yes, but not at the moment. The endian problem should be addressed when you build the lsa.

...

...
I can't work out how the SPF next hop works(calc_next_hop). I tried to compare it with Quagga's ospf_nexthop_calculation() but the structure is too different. The reason for me looking into this was to see how much work it would be to add unnumbered ppp links but as I can't work out how nexthop is working I didn't get very far.

The current code is different from what RFC says. For direct neighbors (on both ptp and non-ptp networks), we just use IP address taken from the source address of the HELLO packet of the neighbor (stored in neighbor structure returned by find_neigh_noifa()).

This works well [*] for both broadcast and ptp links (numbered or unnumbered), but is broken on NBMA networks. I have some not-yet-commited changes to calc_next_hop() that on broadcast and NBMA networks uses the standard way (taking IP address from the router LSA) and the source address from HELLO packet as a nexthop is used just for PTP links. Such behavior is also suggested by RFC 5340 (for OSFPv3).

[*] Regardless of how the other side is implemented.

...
I have impl. unnumbered ppp link for Quagga that works really well but this work hasn't been accepted into Quagga yet since Quagga development has stalled. I could show you how I did it Quagga though.

The development state of Quagga is sad. Do you implement it in a different way than in BIRD? I wonder whether there is any other possible way to get next hop address for unnumbered ptp links than from source address of HELLO packet.

Yep, now it gets tricky. It took me quite some time to figure out what to do. The secret is that you never use search for the interface using IP addresses in the LSA's. Instead you record what interface created what entry in in your own Router LSA. Based on the position of on entry in your own router LSA you can lookup the interface that created that entry. Once you know the interface, the reset is easy. One way to impl. this is to build an array of interface pointers when you construct the Router LSA: RLSA Array Item 0 ptr to ppp0 Item 1 ptr to ppp0 Item 2 ptr to eth0 ... Item n ptr to if n Now you can easily find the interface if you pass on the position of the the entry in the RLSA you are processing to the nexthop calculation. I have a few patches to Quagga that does this somewhere, they are all on the Quagga mailing list and a few of them are also in my personal git repo at the Quagga site. Jocke

Ondrej Zajicek

23 Apr 23 Apr

8:32 a.m.

On Wed, Apr 21, 2010 at 10:04:06PM +0200, Joakim Tjernlund wrote:

...

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/21 20:15:07:

...
On Wed, Apr 21, 2010 at 09:41:47AM +0200, Joakim Tjernlund wrote:

...
I am using Quagga ATM but I had a quick look at BIRD and I got a few observations.

Hello. Thank you for your tips and notes.

...
The LSA/checksum code seem very inefficient. LSAs are built allocating/reallocing bits of memory. This is slow and will case memory fragmentation.

You mean lsab_alloc() in originate_rt_lsa_body()? This allocation is just a sequential allocation in a persistent memory buffer, therefore it is very efficient (in most cases just increase of lsab_used counter) and there si no memory fragmentation (all is done inside a persistent memory buffer).

Yes, you do realloc on small amounts of memory. Also, receives LSAs seems to be impl. differently so you need handle these somewhat differently.

realloc() is used only for a persistent buffer to grow it sufficiently large. So it is called only small number of times during the run of the program and not in subsequent LSA originations. Therefore it is not an issue.

...

...
The development state of Quagga is sad. Do you implement it in a different way than in BIRD? I wonder whether there is any other possible way to get next hop address for unnumbered ptp links than from source address of HELLO packet.

Yep, now it gets tricky. It took me quite some time to figure out what to do. The secret is that you never use search for the interface using IP addresses in the LSA's.

That we never done for PTP ifaces.

...

Instead you record what interface created what entry in in your own Router LSA. Based on the position of on entry in your own router LSA you can lookup the interface that created that entry. Once you know the interface, the reset is easy.

Yes, i got the idea. Our algorithm (for PTP) is to search for a ptp iface with a full neighbor with given Router ID and choose the cheapest one. This would lead to the same results as your idea, but a slightly less efficient, but probably not important unless you have a hundreds of PTP ifaces on a router. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

8:50 a.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 10:32:06:

...

...
...
The development state of Quagga is sad. Do you implement it in a different way than in BIRD? I wonder whether there is any other possible way to get next hop address for unnumbered ptp links than from source address of HELLO packet.

Yep, now it gets tricky. It took me quite some time to figure out what to do. The secret is that you never use search for the interface using IP addresses in the LSA's.

That we never done for PTP ifaces.

...
Instead you record what interface created what entry in in your own Router LSA. Based on the position of on entry in your own router LSA you can lookup the interface that created that entry. Once you know the interface, the reset is easy.

Yes, i got the idea. Our algorithm (for PTP) is to search for a ptp iface with a full neighbor with given Router ID and choose the cheapest one. This would lead to the same results as your idea, but a slightly less efficient, but probably not important unless you have a hundreds of PTP ifaces on a router.

I got a lot of PtP I/Fs(some 20-30) :) But this won't fix multiple ptp I/Fs between the same two routes and I don't think it will work if one end is unnumbered and the other one is not. As you can see from the patches I sent earlier there is a simple impl. and a more complicated impl. Jocke

Joakim Tjernlund

9:22 a.m.

...

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 10:32:06:

...
...
...
The development state of Quagga is sad. Do you implement it in a different way than in BIRD? I wonder whether there is any other possible way to get next hop address for unnumbered ptp links than from source address of HELLO packet.

Yep, now it gets tricky. It took me quite some time to figure out what to do. The secret is that you never use search for the interface using IP addresses in the LSA's.

That we never done for PTP ifaces.

...
Instead you record what interface created what entry in in your own Router LSA. Based on the position of on entry in your own router LSA you can lookup the interface that created that entry. Once you know the interface, the reset is easy.

Yes, i got the idea. Our algorithm (for PTP) is to search for a ptp iface with a full neighbor with given Router ID and choose the cheapest one. This would lead to the same results as your idea, but a slightly less efficient, but probably not important unless you have a hundreds of PTP ifaces on a router.

Forgot to mention, you use my proposal for ALL I/Fs, not only for PtoP I/Fs Jocke

Ondrej Zajicek

9:52 a.m.

On Fri, Apr 23, 2010 at 10:50:38AM +0200, Joakim Tjernlund wrote:

...

...
Yes, i got the idea. Our algorithm (for PTP) is to search for a ptp iface with a full neighbor with given Router ID and choose the cheapest one. This would lead to the same results as your idea, but a slightly less efficient, but probably not important unless you have a hundreds of PTP ifaces on a router.

I got a lot of PtP I/Fs(some 20-30) :)

But this won't fix multiple ptp I/Fs between the same two routes and I don't think it will work if one end is unnumbered and the other one is not.

Our algorithm (in the git tree) works well for multiple ptp ifaces between the same two routers (because both the SPF and the calc_next_hop() chooses the same (cheapest) ptp link. Our algorithm works regardless of whether it is unnumbered or numbered, as we don't use the information from the 'Link Data' field of router-LSA for ptp networks. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

10:38 a.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 11:52:26:

...

From: Ondrej Zajicek <santiago@crfreenet.org> To: Joakim Tjernlund <joakim.tjernlund@transmode.se> Cc: bird-users@trubka.network.cz Date: 2010/04/23 11:46 Subject: Re: OSPF performance/SPF calculations

On Fri, Apr 23, 2010 at 10:50:38AM +0200, Joakim Tjernlund wrote:

...
...
Yes, i got the idea. Our algorithm (for PTP) is to search for a ptp iface with a full neighbor with given Router ID and choose the cheapest one. This would lead to the same results as your idea, but a slightly less efficient, but probably not important unless you have a hundreds of PTP ifaces on a router.

I got a lot of PtP I/Fs(some 20-30) :)

But this won't fix multiple ptp I/Fs between the same two routes and I don't think it will work if one end is unnumbered and the other one is not.

Our algorithm (in the git tree) works well for multiple ptp ifaces between the same two routers (because both the SPF and the calc_next_hop() chooses the same (cheapest) ptp link.

Even if all local PtoP I/Fs have the same IP address or no IP address? What does this comment in calc_next_hop mean? /* * Remaining cases - local neighbours. * There are two problems with this code: * 1) we use IP address from HELLO packet * and not the one from LSA (router or link). * This may break NBMA networks * 2) we use find_neigh_noifa() and does not * take into account associated iface. * This breaks neighbors connected by more links. */

...

Our algorithm works regardless of whether it is unnumbered or numbered, as we don't use the information from the 'Link Data' field of router-LSA for ptp networks.

Yes, the Link Data field is useless for unnumbered I/F as it contains the local interface ifIndex and neither of Option 1 and Option 2 in section 12.4.1.1. Describing point-to-point interfaces

Ondrej Zajicek

12:09 p.m.

On Fri, Apr 23, 2010 at 12:38:59PM +0200, Joakim Tjernlund wrote:

...

...
...
But this won't fix multiple ptp I/Fs between the same two routes and I don't think it will work if one end is unnumbered and the other one is not.

Our algorithm (in the git tree) works well for multiple ptp ifaces between the same two routers (because both the SPF and the calc_next_hop() chooses the same (cheapest) ptp link.

...

Even if all local PtoP I/Fs have the same IP address or no IP address?

Yes, for the version in the git tree. Or more precisely, it works for links having the same ptp address. (a link with ptp/peer addresses is considered unnumbered). If there is really no IP address on the link, it will not work, but for several completely different reasons - BIRD core would not accept such route.

...

What does this comment in calc_next_hop mean? /* * Remaining cases - local neighbours. * There are two problems with this code: * 1) we use IP address from HELLO packet * and not the one from LSA (router or link). * This may break NBMA networks * 2) we use find_neigh_noifa() and does not * take into account associated iface. * This breaks neighbors connected by more links. */

This is an old code, which has some bugs i fixed a two weeks ago: The current version is here: http://git.nic.cz/bird/browser/proto/ospf/rt.c -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

1:27 p.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 14:09:50:

...

On Fri, Apr 23, 2010 at 12:38:59PM +0200, Joakim Tjernlund wrote:

...
...
...
But this won't fix multiple ptp I/Fs between the same two routes and I don't think it will work if one end is unnumbered and the other one is not.

Our algorithm (in the git tree) works well for multiple ptp ifaces between the same two routers (because both the SPF and the calc_next_hop() chooses the same (cheapest) ptp link.

...
Even if all local PtoP I/Fs have the same IP address or no IP address?

Yes, for the version in the git tree.

Or more precisely, it works for links having the same ptp address. (a link with ptp/peer addresses is considered unnumbered).

If there is really no IP address on the link, it will not work, but for several completely different reasons - BIRD core would not accept such route.

...
What does this comment in calc_next_hop mean? /* * Remaining cases - local neighbours. * There are two problems with this code: * 1) we use IP address from HELLO packet * and not the one from LSA (router or link). * This may break NBMA networks * 2) we use find_neigh_noifa() and does not * take into account associated iface. * This breaks neighbors connected by more links. */

This is an old code, which has some bugs i fixed a two weeks ago:

Oh, I figured I cloned fairly recently. Must have jut missed you change. Anyhow I looked at the new code and it is an improvement but I think there is a flaw: It looks like the ptp code just find ANY interface that has a connection to the remote router, not the I/F that has created that particular RLSA entry. That may lead to asymmetric links, TX one link and RX on another. Not sure what other implications there might be but I believe OSPF was not designed to deal with such scenarios. Jocke

Ondrej Zajicek

2:01 p.m.

On Fri, Apr 23, 2010 at 03:27:10PM +0200, Joakim Tjernlund wrote:

...

...
This is an old code, which has some bugs i fixed a two weeks ago:

Oh, I figured I cloned fairly recently. Must have jut missed you change.

I posted the change to the repository just two days ago.

...

Anyhow I looked at the new code and it is an improvement but I think there is a flaw: It looks like the ptp code just find ANY interface that has a connection to the remote router, not the I/F that has created that particular RLSA entry. That may lead to asymmetric links, TX one link and RX on another.

This is not a problem because both SPF and calc_next_hop() chooses the cheapest (full) ptp link. They both uses the same (local) metrics. Even if you have asymetric metrics to have one link for one direction and the other link for the other direction, the algorithm works, because each side uses its metrics for both calculations. There is one thing when the algorithm might break - if the cheapest full ptp link is rejected by link_back() check. But such situation is just transitional and would resolve immediately by either breaking the link or receiving the new router-LSA from the neighbor. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

25 Apr 25 Apr

4:08 p.m.

Slowly getting back to SPF again .. Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 16:01:16:

...

On Fri, Apr 23, 2010 at 03:27:10PM +0200, Joakim Tjernlund wrote:

...
Anyhow I looked at the new code and it is an improvement but I think there is a flaw: It looks like the ptp code just find ANY interface that has a connection to the remote router, not the I/F that has created that particular RLSA entry. That may lead to asymmetric links, TX one link and RX on another.

This is not a problem because both SPF and calc_next_hop() chooses the cheapest (full) ptp link. They both uses the same (local) metrics.

Our ptp links typically have the same cost between the same two routers so it is not unlikely that the link will be asymentric I think. However I don't think that will be a problem anymore as BIRD don't use/support equal cost multipath, right?

...

Even if you have asymetric metrics to have one link for one direction and the other link for the other direction, the algorithm works, because each side uses its metrics for both calculations.

There is one thing when the algorithm might break - if the cheapest full ptp link is rejected by link_back() check. But such situation is just transitional and would resolve immediately by either breaking the link or receiving the new router-LSA from the neighbor.

Right, how about these statements from RFC 2328: 16.1.1 para 4. If there is at least one intervening router in the current shortest path between the destination and the root, the destination simply inherits the set of next hops from the parent. 16.1.1 para 5. ...the parent vertex is a network that directly connects the calculating router to the destination router. The list of next hops is then determined by examining the destination's router-LSA For each link in the router-LSA that points back to the parent network, the link's Link Data field provides the IP address of a next hop router. The outgoing interface to use can then be derived from the next hop IP address (or it can be inherited from the parent network). These talk about inheriting ALL next hops from the parent and I don't see the BIRD does that. Looks like BIRD just selects one next hop from its parent? Assuming it is, does that impose any restricts? Some other random questions: One think I always wanted is the possibility to add an host IP address to all areas, in my case I like to export the routers primary IP address to all areas so the router can always be reached. Is that possible with BIRD? If you pull the cable to an ethernet I/F that is currently in a OSPF domain, do BIRD delete the whole subnet from the R-LSA or do you leave a host route with the I/F's IP address in the R-LSA? Jocke

Ondrej Zajicek

9:20 p.m.

On Sun, Apr 25, 2010 at 06:08:04PM +0200, Joakim Tjernlund wrote:

...

...
This is not a problem because both SPF and calc_next_hop() chooses the cheapest (full) ptp link. They both uses the same (local) metrics.

Our ptp links typically have the same cost between the same two routers so it is not unlikely that the link will be asymentric I think. However I don't think that will be a problem anymore as BIRD don't use/support equal cost multipath, right?

Yes.

...

Right, how about these statements from RFC 2328: ... These talk about inheriting ALL next hops from the parent and I don't see the BIRD does that. Looks like BIRD just selects one next hop from its parent? Assuming it is, does that impose any restricts?

Yes, BIRD chooses one next hop. Because BIRD does not implement equal cost multipath, this is not a problem.

...

Some other random questions:

One think I always wanted is the possibility to add an host IP address to all areas, in my case I like to export the routers primary IP address to all areas so the router can always be reached. Is that possible with BIRD?

You can use 'stubnet' option to add 'virtual' stub network (in that case /32 stub) to the router-LSA. You can add this option to each area config to get this in each area. Or add /32 stub network to one area, other areas get a summary-LSA with that address.

...

If you pull the cable to an ethernet I/F that is currently in a OSPF domain, do BIRD delete the whole subnet from the R-LSA or do you leave a host route with the I/F's IP address in the R-LSA?

BIRD does not use link availability information, therefore if you pull the cable, BIRD keeps the whole subnet in the router-LSA (if it is a stub network). -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

11:04 p.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/25 23:20:33:

...

On Sun, Apr 25, 2010 at 06:08:04PM +0200, Joakim Tjernlund wrote:

...
...
This is not a problem because both SPF and calc_next_hop() chooses the cheapest (full) ptp link. They both uses the same (local) metrics.

Our ptp links typically have the same cost between the same two routers so it is not unlikely that the link will be asymentric I think. However I don't think that will be a problem anymore as BIRD don't use/support equal cost multipath, right?

Yes.

...
Right, how about these statements from RFC 2328: ... These talk about inheriting ALL next hops from the parent and I don't see the BIRD does that. Looks like BIRD just selects one next hop from its parent? Assuming it is, does that impose any restricts?

Yes, BIRD chooses one next hop. Because BIRD does not implement equal cost multipath, this is not a problem.

...
Some other random questions:

One think I always wanted is the possibility to add an host IP address to all areas, in my case I like to export the routers primary IP address to all areas so the router can always be reached. Is that possible with BIRD?

You can use 'stubnet' option to add 'virtual' stub network (in that case /32 stub) to the router-LSA. You can add this option to each area config to get this in each area.

That sounds exactly as what I need, thanks.

...

Or add /32 stub network to one area, other areas get a summary-LSA with that address.

...
If you pull the cable to an ethernet I/F that is currently in a OSPF domain, do BIRD delete the whole subnet from the R-LSA or do you leave a host route with the I/F's IP address in the R-LSA?

BIRD does not use link availability information, therefore if you pull the cable, BIRD keeps the whole subnet in the router-LSA (if it is a stub network).

If you do(any plans?), my vote is to leave a host stub in the R-LSA as it may very well be possible to reach that IP address over another link. Jocke

Ondrej Zajicek

26 Apr 26 Apr

8:01 a.m.

On Mon, Apr 26, 2010 at 01:04:44AM +0200, Joakim Tjernlund wrote:

...

...
BIRD does not use link availability information, therefore if you pull the cable, BIRD keeps the whole subnet in the router-LSA (if it is a stub network).

If you do(any plans?), my vote is to leave a host stub in the R-LSA as it may very well be possible to reach that IP address over another link.

It is in TODO list. Definitely with leaving a host stub in the router-LSA. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Andrew Lemin

8:46 a.m.

Hello. There are users out there who do desperately want multipath routing support in Bird. Please don't just ignore the need for multipath. Regards, Andy. Securepoint user. -----Original Message----- From: owner-bird-users@atrey.karlin.mff.cuni.cz [mailto:owner-bird-users@atrey.karlin.mff.cuni.cz] On Behalf Of Joakim Tjernlund Sent: 26 April 2010 00:05 To: Ondrej Zajicek Cc: bird-users@trubka.network.cz Subject: Re: OSPF performance/SPF calculations Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/25 23:20:33:

...

On Sun, Apr 25, 2010 at 06:08:04PM +0200, Joakim Tjernlund wrote:

...
...
This is not a problem because both SPF and calc_next_hop() chooses the cheapest (full) ptp link. They both uses the same (local) metrics.

Our ptp links typically have the same cost between the same two routers so it is not unlikely that the link will be asymentric I think. However I don't think that will be a problem anymore as BIRD don't use/support equal cost multipath, right?

Yes.

...
Right, how about these statements from RFC 2328: ... These talk about inheriting ALL next hops from the parent and I don't see the BIRD does that. Looks like BIRD just selects one next hop from its parent? Assuming it is, does that impose any restricts?

Yes, BIRD chooses one next hop. Because BIRD does not implement equal cost multipath, this is not a problem.

...
Some other random questions:

One think I always wanted is the possibility to add an host IP address to all areas, in my case I like to export the routers primary IP address to all areas so the router can always be reached. Is that possible with BIRD?

You can use 'stubnet' option to add 'virtual' stub network (in that case /32 stub) to the router-LSA. You can add this option to each area config to get this in each area.

That sounds exactly as what I need, thanks.

...

Or add /32 stub network to one area, other areas get a summary-LSA with that address.

...
If you pull the cable to an ethernet I/F that is currently in a OSPF domain, do BIRD delete the whole subnet from the R-LSA or do you leave a host route with the I/F's IP address in the R-LSA?

BIRD does not use link availability information, therefore if you pull the cable, BIRD keeps the whole subnet in the router-LSA (if it is a stub network).

If you do(any plans?), my vote is to leave a host stub in the R-LSA as it may very well be possible to reach that IP address over another link. Jocke Monitor Computer Systems Limited Company Registration Number: NI 17805 Registered Office: 3 Pine Crest, Holywood, North Down, Northern Ireland BT18 9ED

Joakim Tjernlund

8:49 a.m.

Andrew Lemin <andrew.lemin@monitorsoft.com> wrote on 2010/04/26 10:46:36:

...

Hello. There are users out there who do desperately want multipath routing support in Bird. Please don't just ignore the need for multipath.

The SPF work to support that should not be very hard I think, but I have no idea what other parts that need work too. Jocke

Andrew Lemin

10:14 a.m.

Hello. Yes, I believe the changes should not be that difficult. For example, by simply creating two separate 'RIP protocol' instances which listen on different external interfaces, I can get bird's internal core routing table to show all the multiple routing 'options' for common remote subnets etc. I imagine the initial fix would be quite simple as it appears to just require a change to the 'kernel protocol' export filter; Instead of updating the kernel routing table by iteratively running individual 'ip route add ....' commands for each of the bird route table entries, the kernel protocol filter should read each route entry from its bird routing table, then search for any other matching routes before finally running a single 'ip route add .... nexthop ....' command which includes each of the routeing options. Also the kernel option to enable multipath should be checked. If not enabled then no change. E.g; #birdc bird>show routing 192.168.230.0/24 via 192.168.214.1 on eth2 [EDGE1RIP 11:47] (120/4) via 192.168.215.1 on eth3 [EDGE2RIP 11:47] (120/4) Both of these two route options can be added to a multipath enabled kernel with; '# ip route add 192.168.230.0/24 nexthop via 192.168.214.1 dev eth2 weight 1 nexthop via 192.168.215.1 dev eth3 weight 1'. There is also a minor bug that I believe needs looking at and is probably associated with this subject; A kernel routing table with a multipath default route causes errors in bird. Extract of 'ip route' from our Securepoint firewall; # ip route . . . 192.168.230.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.35.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.32.0/24 via 192.168.215.1 dev eth3 proto bird 10.44.50.0/24 via 192.168.215.1 dev eth3 proto bird 10.44.51.0/24 via 192.168.215.1 dev eth3 proto bird 192.168.55.0/24 via 192.168.214.1 dev eth2 proto bird 10.46.55.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.56.0/24 via 192.168.215.1 dev eth3 proto bird 62.6.40.0/24 via 192.168.214.1 dev eth2 10.36.20.0/23 via 192.168.215.1 dev eth3 proto bird 192.168.48.0/22 via 192.168.175.2 dev eth1 default nexthop via 192.168.214.1 dev eth2 weight 1 nexthop via 192.168.215.1 dev eth3 weight 2 # causes the following error; <ERR> KRT: Mysterious route with no OIF (0.0.0.0/0) I believe that Securepoint themselves have also indicated a commercial interest in contributing to the development of multipath support in BIRD. Thanks for your time. Andy. -----Original Message----- From: Joakim Tjernlund [mailto:joakim.tjernlund@transmode.se] Sent: 26 April 2010 09:50 To: Andrew Lemin Cc: bird-users@trubka.network.cz; Ondrej Zajicek Subject: RE: OSPF performance/SPF calculations Andrew Lemin <andrew.lemin@monitorsoft.com> wrote on 2010/04/26 10:46:36:

...

Hello. There are users out there who do desperately want multipath routing support in Bird. Please don't just ignore the need for multipath.

The SPF work to support that should not be very hard I think, but I have no idea what other parts that need work too. Jocke Monitor Computer Systems Limited Company Registration Number: NI 17805 Registered Office: 3 Pine Crest, Holywood, North Down, Northern Ireland BT18 9ED

Joakim Tjernlund

12:31 p.m.

Andrew Lemin <andrew.lemin@monitorsoft.com> wrote on 2010/04/26 12:14:52:

...

Hello. Yes, I believe the changes should not be that difficult.

For example, by simply creating two separate 'RIP protocol' instances which listen on different external interfaces, I can get bird's internal core routing table to show all the multiple routing 'options' for common remote subnets etc.

I imagine the initial fix would be quite simple as it appears to just require a change to the 'kernel protocol' export filter; Instead of updating the kernel routing table by iteratively running individual 'ip route add ....' commands for each of the bird route table entries, the kernel protocol filter should read each route entry from its bird routing table, then search for any other matching routes before finally running a single 'ip route add .... nexthop ....' command which includes each of the routeing options.

BIRD is using "ip route" to configure routes? I figured BIRD used netlink directly.

...

Extract of 'ip route' from our Securepoint firewall; # ip route . . . 192.168.230.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.35.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.32.0/24 via 192.168.215.1 dev eth3 proto bird 10.44.50.0/24 via 192.168.215.1 dev eth3 proto bird 10.44.51.0/24 via 192.168.215.1 dev eth3 proto bird 192.168.55.0/24 via 192.168.214.1 dev eth2 proto bird 10.46.55.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.56.0/24 via 192.168.215.1 dev eth3 proto bird 62.6.40.0/24 via 192.168.214.1 dev eth2 10.36.20.0/23 via 192.168.215.1 dev eth3 proto bird 192.168.48.0/22 via 192.168.175.2 dev eth1 default nexthop via 192.168.214.1 dev eth2 weight 1 nexthop via 192.168.215.1 dev eth3 weight 2 #

causes the following error; <ERR> KRT: Mysterious route with no OIF (0.0.0.0/0)

I can't even find that error msg in BIRD sources.

Andrew Lemin

1:32 p.m.

Hello. Yes it quite possible that bird does use netlink directly. Regardless I think the issue is still related to the fact that a multipath route needs to be added with a single statement and not by adding separate single route statements. Regarding the error. I cannot say why it is not there, it is not a problem however. I am not a developer. Thanks again. Andy. -----Original Message----- From: Joakim Tjernlund [mailto:joakim.tjernlund@transmode.se] Sent: 26 April 2010 13:32 To: Andrew Lemin Cc: bird-users@trubka.network.cz; Ondrej Zajicek Subject: RE: OSPF performance/SPF calculations Andrew Lemin <andrew.lemin@monitorsoft.com> wrote on 2010/04/26 12:14:52:

...

Hello. Yes, I believe the changes should not be that difficult.

For example, by simply creating two separate 'RIP protocol' instances which listen on different external interfaces, I can get bird's internal core routing table to show all the multiple routing 'options' for common remote subnets etc.

I imagine the initial fix would be quite simple as it appears to just require a change to the 'kernel protocol' export filter; Instead of updating the kernel routing table by iteratively running individual 'ip route add ....' commands for each of the bird route table entries, the kernel protocol filter should read each route entry from its bird routing table, then search for any other matching routes before finally running a single 'ip route add .... nexthop ....' command which includes each of the routeing options.

BIRD is using "ip route" to configure routes? I figured BIRD used netlink directly.

...

Extract of 'ip route' from our Securepoint firewall; # ip route . . . 192.168.230.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.35.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.32.0/24 via 192.168.215.1 dev eth3 proto bird 10.44.50.0/24 via 192.168.215.1 dev eth3 proto bird 10.44.51.0/24 via 192.168.215.1 dev eth3 proto bird 192.168.55.0/24 via 192.168.214.1 dev eth2 proto bird 10.46.55.0/24 via 192.168.214.1 dev eth2 proto bird 192.168.56.0/24 via 192.168.215.1 dev eth3 proto bird 62.6.40.0/24 via 192.168.214.1 dev eth2 10.36.20.0/23 via 192.168.215.1 dev eth3 proto bird 192.168.48.0/22 via 192.168.175.2 dev eth1 default nexthop via 192.168.214.1 dev eth2 weight 1 nexthop via 192.168.215.1 dev eth3 weight 2 #

causes the following error; <ERR> KRT: Mysterious route with no OIF (0.0.0.0/0)

I can't even find that error msg in BIRD sources. Monitor Computer Systems Limited Company Registration Number: NI 17805 Registered Office: 3 Pine Crest, Holywood, North Down, Northern Ireland BT18 9ED

Joakim Tjernlund

23 Apr 23 Apr

9:37 a.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 10:32:06:

...

On Wed, Apr 21, 2010 at 10:04:06PM +0200, Joakim Tjernlund wrote:

...
Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/21 20:15:07:

...
On Wed, Apr 21, 2010 at 09:41:47AM +0200, Joakim Tjernlund wrote:

...
I am using Quagga ATM but I had a quick look at BIRD and I got a few observations.

Hello. Thank you for your tips and notes.

...
The LSA/checksum code seem very inefficient. LSAs are built allocating/reallocing bits of memory. This is slow and will case memory fragmentation.

You mean lsab_alloc() in originate_rt_lsa_body()? This allocation is just a sequential allocation in a persistent memory buffer, therefore it is very efficient (in most cases just increase of lsab_used counter) and there si no memory fragmentation (all is done inside a persistent memory buffer).

Yes, you do realloc on small amounts of memory. Also, receives LSAs seems to be impl. differently so you need handle these somewhat differently.

realloc() is used only for a persistent buffer to grow it sufficiently large. So it is called only small number of times during the run of the program and not in subsequent LSA originations. Therefore it is not an issue.

It would be better if you used the same (linear space) as you do for received LSA's. I guess the received LSA's has a MAX size? Then you could allocate space as big as the MAX received LSAs size, add data to it and, if you want to, realloc the the whole LSA to a smaller size. Jocke

Ondrej Zajicek

10:01 a.m.

On Fri, Apr 23, 2010 at 11:37:56AM +0200, Joakim Tjernlund wrote:

...

...
realloc() is used only for a persistent buffer to grow it sufficiently large. So it is called only small number of times during the run of the program and not in subsequent LSA originations. Therefore it is not an issue.

It would be better if you used the same (linear space) as you do for received LSA's. I guess the received LSA's has a MAX size? Then you could allocate space as big as the MAX received LSAs size, add data to it and, if you want to, realloc the the whole LSA to a smaller size.

Received LSAs are copied from RX buffer to the space allocated for them. Originated LSAs are prepared in the persistent buffer and then copied to the space allocated for them. In both cases the final space for the LSA is allocated once (not reallocated) for the final size of the LSA. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

10:44 a.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 12:01:29:

...

On Fri, Apr 23, 2010 at 11:37:56AM +0200, Joakim Tjernlund wrote:

...
...
realloc() is used only for a persistent buffer to grow it sufficiently large. So it is called only small number of times during the run of the program and not in subsequent LSA originations. Therefore it is not an issue.

It would be better if you used the same (linear space) as you do for received LSA's. I guess the received LSA's has a MAX size? Then you could allocate space as big as the MAX received LSAs size, add data to it and, if you want to, realloc the the whole LSA to a smaller size.

Received LSAs are copied from RX buffer to the space allocated for them. Originated LSAs are prepared in the persistent buffer and then copied to the space allocated for them. In both cases the final space for the LSA is allocated once (not reallocated) for the final size of the LSA.

I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header? Jocke

Joakim Tjernlund

11:06 a.m.

...

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 12:01:29:

...
On Fri, Apr 23, 2010 at 11:37:56AM +0200, Joakim Tjernlund wrote:

...
...
realloc() is used only for a persistent buffer to grow it sufficiently large. So it is called only small number of times during the run of the program and not in subsequent LSA originations. Therefore it is not an issue.

It would be better if you used the same (linear space) as you do for received LSA's. I guess the received LSA's has a MAX size? Then you could allocate space as big as the MAX received LSAs size, add data to it and, if you want to, realloc the the whole LSA to a smaller size.

Received LSAs are copied from RX buffer to the space allocated for them. Originated LSAs are prepared in the persistent buffer and then copied to the space allocated for them. In both cases the final space for the LSA is allocated once (not reallocated) for the final size of the LSA.

I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header?

Ahh, I am starting to get a clue I think. It is the struct top_hash_entry that has this separation of LSA header and body. Would it be feasible to move struct ospf_lsa_header lsa into void *lsa_body, that is, merge them into one so there is just one struct ospf_lsa_header *lsa instead? Jocke

Ondrej Zajicek

12:22 p.m.

On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote:

...

...
I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header?

Ahh, I am starting to get a clue I think. It is the struct top_hash_entry that has this separation of LSA header and body. Would it be feasible to move struct ospf_lsa_header lsa into void *lsa_body, that is, merge them into one so there is just one struct ospf_lsa_header *lsa instead?

Yes, LSA header and LSA body are separated and i am not sure what is a purpose of that separation, but it does not cause much problems, so it is probably pointless to change this. It probably makes slightly faster access to the header fields. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."

Joakim Tjernlund

1:09 p.m.

Ondrej Zajicek <santiago@crfreenet.org> wrote on 2010/04/23 14:22:18:

...

On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote:

...
...
I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header?

Ahh, I am starting to get a clue I think. It is the struct top_hash_entry that has this separation of LSA header and body. Would it be feasible to move struct ospf_lsa_header lsa into void *lsa_body, that is, merge them into one so there is just one struct ospf_lsa_header *lsa instead?

Yes, LSA header and LSA body are separated and i am not sure what is a purpose of that separation, but it does not cause much problems, so it is probably pointless to change this. It probably makes slightly faster access to the header fields.

But it slows down the fletcher checksum as one need to test and extra calculations because of this. Any gain by the separation is lost many times over in the fletcher checksum which could be as simple as(from Quagga with my tweaks): .... while (left != 0) { partial_len = MIN(left, MODX); left -= partial_len; do { c0 = c0 + *(++p); c1 += c0; } while (--partial_len); c0 = c0 % 255; c1 = c1 % 255; } /* The cast is important, to ensure the mod is taken as a signed value. */ x = ((int)(len - offset - 1) * c0 - c1) % 255; if (x <= 0) x += 255; y = 510 - c0 - x; if (y > 255) y -= 255; .... Jocke

Martin Mares

1:22 p.m.

Hello!

...

But it slows down the fletcher checksum as one need to test and extra calculations because of this. Any gain by the separation is lost many times over in the fletcher checksum which could be as simple as(from Quagga with my tweaks):

Again, do you have any real numbers backing this claim? Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth "It is easier to port a shell than a shell script." -- Larry Wall

Joakim Tjernlund

1:33 p.m.

Martin Mares <mj@ucw.cz> wrote on 2010/04/23 15:22:28:

...

Hello!

...
But it slows down the fletcher checksum as one need to test and extra calculations because of this. Any gain by the separation is lost many times over in the fletcher checksum which could be as simple as(from Quagga with my tweaks):

Again, do you have any real numbers backing this claim?

Nope, just common sense. I am not using BIRD yet, just trying to figure out what it can do and not and reporting stuff as I go along. One doesn't need to be Einstein to realise that the current code is slower and bigger(harder to read too) than the one I posted. Jocke

Ondrej Filip

5:05 p.m.

On 23.4.2010 14:22, Ondrej Zajicek wrote:

...

On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote:

...
...
I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header?

Ahh, I am starting to get a clue I think. It is the struct top_hash_entry that has this separation of LSA header and body. Would it be feasible to move struct ospf_lsa_header lsa into void *lsa_body, that is, merge them into one so there is just one struct ospf_lsa_header *lsa instead?

Yes, LSA header and LSA body are separated and i am not sure what is a purpose of that separation, but it does not cause much problems, so it is probably pointless to change this. It probably makes slightly faster access to the header fields.

Hmm, I wrote this part about 8 years ago. But I believe I had many reasons for that. Size of LSA body differs, but size of LSA header is fixed. Also LSA body is kept in network endianity while LSA header is in CPU endianity. I know the function lsasum_calculate is unefficient - I noted this in the related comment. But I have never felt this has been a serious issues. I believe we have some other more important things to do. I believe Martin's idea with to check BIRD with OProfile is the key one. Otherwise we would work on fixing some minor issues. Ondrej

...

Joakim Tjernlund

5:24 p.m.

Ondrej Filip <feela@network.cz> wrote on 2010/04/23 19:05:57:

...

On 23.4.2010 14:22, Ondrej Zajicek wrote:

...
On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote:

...
...
I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header?

Ahh, I am starting to get a clue I think. It is the struct top_hash_entry that has this separation of LSA header and body. Would it be feasible to move struct ospf_lsa_header lsa into void *lsa_body, that is, merge them into one so there is just one struct ospf_lsa_header *lsa instead?

Yes, LSA header and LSA body are separated and i am not sure what is a purpose of that separation, but it does not cause much problems, so it is probably pointless to change this. It probably makes slightly faster access to the header fields.

Hmm, I wrote this part about 8 years ago. But I believe I had many reasons for that. Size of LSA body differs, but size of LSA header is fixed. Also LSA body is kept in network endianity while LSA header is in CPU endianity.

hmm, don't think the body is in network order. Look at the lsasum_calculate, it swaps the whole LSA, twice. Then you must also swap the LSA once more before transmitting it.

...

I know the function lsasum_calculate is unefficient - I noted this in the related comment. But I have never felt this has been a serious issues. I believe we have some other more important things to do.

I believe Martin's idea with to check BIRD with OProfile is the key one. Otherwise we would work on fixing some minor issues.

Ok, but could at least make this endian stuff a NOP on BE CPUs?

Ondrej Filip

5:36 p.m.

On 23.4.2010 19:24, Joakim Tjernlund wrote:

...

Ondrej Filip <feela@network.cz> wrote on 2010/04/23 19:05:57:

...
On 23.4.2010 14:22, Ondrej Zajicek wrote:

...
On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote:

...
...
I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header?

Ahh, I am starting to get a clue I think. It is the struct top_hash_entry that has this separation of LSA header and body. Would it be feasible to move struct ospf_lsa_header lsa into void *lsa_body, that is, merge them into one so there is just one struct ospf_lsa_header *lsa instead?

Yes, LSA header and LSA body are separated and i am not sure what is a purpose of that separation, but it does not cause much problems, so it is probably pointless to change this. It probably makes slightly faster access to the header fields.

Hmm, I wrote this part about 8 years ago. But I believe I had many reasons for that. Size of LSA body differs, but size of LSA header is fixed. Also LSA body is kept in network endianity while LSA header is in CPU endianity.

hmm, don't think the body is in network order. Look at the lsasum_calculate, it swaps the whole LSA, twice. Then you must also swap the LSA once more before transmitting it.

True, I checked it now. So maybe that was just an original idea. And this is still in some functions like lsa_flood etc. Anyway.

...

...
I know the function lsasum_calculate is unefficient - I noted this in the related comment. But I have never felt this has been a serious issues. I believe we have some other more important things to do.

I believe Martin's idea with to check BIRD with OProfile is the key one. Otherwise we would work on fixing some minor issues.

Ok, but could at least make this endian stuff a NOP on BE CPUs?

I am open to anything that helps. Ondrej

...

5936

Age (days ago)

5941

Last active (days ago)

List overview

Download

30 comments

5 participants

participants (5)

Andrew Lemin
Joakim Tjernlund
Martin Mares
Ondrej Filip
Ondrej Zajicek