bgp soft reconfiguration inbound

Wolfgang Hennerbichler

20 Nov 2009

1:19 p.m.

Fellow Bird Users,

I thought our bird installation was ready for prime time, but one thing caught my eye which I don't like at all; hopefully you can help me out. We do bgp connections like this (only necessary stuff displayed):

    protocol bgp R1120x193_203_0_25 {
        local as myas;
        neighbor 193.203.0.25 as 1120;
        password "xyz";
        import filter bgp_in_AS1120x193_203_0_25;
        export all;
        table T1120x193_203_0_25;
        rs client;
        start delay time 55;
    }

    filter bgp_in_AS1120x193_203_0_25
    prefix set allnet;
    {
        # ...
        allnet = [ 193.171.255.32/27, 193.171.255.64/27, 192.92.125.0/24, 194.0.0.0/24 ];
        if ! (net ~ allnet) then reject;
        accept;
    }

Now I realized that if I change the definition of 'allnet' and do a 'configure soft', the routes that are added to or removed from 'allnet' are not accepted or rejected accordingly, because it seems that there is no bgp soft reconfiguration taking place (it works if I trigger a soft reconfiguration outbound on the peer router, or if I restart the protocol, which causes bgp to flap).

I know that nix.cz doesn't need this, as filtering is done in the pipe protocol, but we would really really like to filter at the very border, and if the filters change, I think a bgp soft reconfiguration inbound would be great. Is there a way to do a bgp soft reconfiguration, so that the routes are re-transmitted / re-calculated?

Thanks for any help and have a nice weekend;
Wolfgang

--
www.vix.at | www.aco.net
wh@univie.ac.at | WH844-RIPE
Vienna University Computer Center


Ondrej Zajicek

20 Nov

3:17 p.m.

On Fri, Nov 20, 2009 at 02:19:05PM +0100, Wolfgang Hennerbichler wrote:

...

Fellow Bird Users,

Now I realized that if I changed the definition for 'allnet' and do a 'configure soft' these routes that are added or removed to 'allnet' are not rejected or added, because it seems that there is no bgp soft reconfiguration taking place (it works if I trigger a soft reconfiguration outbound on the peer router, or if I restart the protocol which causes bgp to flap).

BGP soft reconfiguration is not implemented. 'configure soft' just means 'ignore changes in filters'. So you have to restart the protocol or do the filtering in the pipe protocol, like in nix.cz.

BTW, why do you want to filter routes in BGP protocols and not in pipes? I see just one problem with the pipe setup: increased memory consumption if peers send too many unwanted routes. But this might be mitigated by the 'route limit' option.

I will probably look at soft reconfiguration in the near future, but I am not sure how problematic it would be to implement.

--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

Wolfgang Hennerbichler

23 Nov

7:19 a.m.

On Nov 20, 2009, at 16:17 , Ondrej Zajicek wrote:

...

On Fri, Nov 20, 2009 at 02:19:05PM +0100, Wolfgang Hennerbichler wrote:

...
Fellow Bird Users,

Now I realized that if I changed the definition for 'allnet' and do a 'configure soft' these routes that are added or removed to 'allnet' are not rejected or added, because it seems that there is no bgp soft reconfiguration taking place (it works if I trigger a soft reconfiguration outbound on the peer router, or if I restart the protocol which causes bgp to flap).

BGP soft reconfiguration is not implemented. 'configure soft' just means 'ignore changes in filters'. So you have to restart the protocol or do the filtering in the pipe protocol, like in nix.cz.

hm. I feared this, and I have to say this is not optimal.

...

BTW, why do you want to filter routes in BGP protocols and not in pipes? I see just one problem with pipe setting - increase in memory consumption if peers would send too many unwanted routes. But this might be mitigated by 'route limit' option.

well, this is because our concept differs from the one of nic.cz. We filter at the border, and build pipes to every neighbor who has decided to peer. See this illustration I've made for the last euro-ix: http://tiny.wogri.at/PP (red is the filters, the arcs are the pipes). This means we have no "main" rib. This is good in certain ways:

1) we have a web-interface for setting up the route-server - every peering coordinator can decide who to peer with by simply clicking around like crazy
2) we don't need to fiddle with bgp communities, which pose a big question mark once 4 byte AS-Numbers are here
3) ISPs who have less understanding of the subject just need to peer with the route-servers and click around
4) we save memory by just exporting routes to the ribs that need it.

there are a couple more advantages to that concept; the main advantage, and the point that everybody likes, is the webinterface and the super-simple setup at client-side.

So one way I see to solve this problem is to build a second table per peer and apply the filter to a pipe that transports routes from one table to the other, but to be honest, I refuse to do that; this would be a bad and memory-wasting concept. The other workaround would be to bring the protocol down and up again, and rely on the second route-server to stay up and running. This puts unnecessary load on all peering devices and I think this is not great either.

The 'route limit' thing is something I am not too fond of either, because I never know in advance how many routes a peer will announce, and this needs constant manual attention.

...

I probably look at the soft reconfiguration in near future, but i am not sure how problematic would be to implement it.

I have no idea how complicated this would be, but to be honest, if you want bird to compete with other route-servers - and to make me and a whole neighbouring country, including most of the ISPs in that country, happy :) - I think you really should consider looking into it. Also, given that other Internet Exchanges are considering BIRD as a good alternative, I think your effort would soon turn out to be valuable not only to a small country like Austria, but you are on the way to conquer the whole world (well, that's a little exaggerated, but it gives you the idea :))!

I would really appreciate it if you could look into that, and be sure, you're not only doing this for us; many people will use this soon. I am willing to help you test as well as I can. Let me know what you think!

Wolfgang

...


Wolfgang Hennerbichler

11:50 a.m.

On Nov 23, 2009, at 08:19 , Wolfgang Hennerbichler wrote:

...

On Nov 20, 2009, at 16:17 , Ondrej Zajicek wrote:

...
On Fri, Nov 20, 2009 at 02:19:05PM +0100, Wolfgang Hennerbichler wrote:

...
Fellow Bird Users,

Now I realized that if I changed the definition for 'allnet' and do a 'configure soft' these routes that are added or removed to 'allnet' are not rejected or added, because it seems that there is no bgp soft reconfiguration taking place (it works if I trigger a soft reconfiguration outbound on the peer router, or if I restart the protocol which causes bgp to flap).

BGP soft reconfiguration is not implemented. 'configure soft' just means 'ignore changes in filters'.

My co-worker gave me a hint that it might not be totally clear what I'm asking for: all I am asking for is a way to send the bgp neighbor a bgp message telling him to re-send his routes (cisco-syntax: clear bgp 1.2.3.4 soft in), not a separate routing table where I can see the received routes (this is not necessary in our case and I understand that this might be hard to implement).

Wolfgang

Anton

12:41 p.m.

New subject: Please help me to unsubscribe.

Good day. Sorry for this stupid question. I cannot unsubscribe from this list. I tried sending a letter to majordomo@network.cz with the text "unsubscribe bird-users" in the body and in the subject, but with no result. How do I unsubscribe from bird-users? I give up :)

--
Anton [WARM-RIPE]
Stack ltd division head
tel. 8 (3822) 555-797

Ondrej Zajicek

5:09 p.m.

On Mon, Nov 23, 2009 at 12:50:19PM +0100, Wolfgang Hennerbichler wrote:

...

On Nov 23, 2009, at 08:19 , Wolfgang Hennerbichler wrote:

...
On Nov 20, 2009, at 16:17 , Ondrej Zajicek wrote:

...
On Fri, Nov 20, 2009 at 02:19:05PM +0100, Wolfgang Hennerbichler wrote:

...
Fellow Bird Users,

Now I realized that if I changed the definition for 'allnet' and do a 'configure soft' these routes that are added or removed to 'allnet' are not rejected or added, because it seems that there is no bgp soft reconfiguration taking place (it works if I trigger a soft reconfiguration outbound on the peer router, or if I restart the protocol which causes bgp to flap).

BGP soft reconfiguration is not implemented. 'configure soft' just means 'ignore changes in filters'.

because my co-worker gave me a hint, that it might not be totally clear what I'm asking for:

It was clear to me. I checked the route refresh specification and it seems to be pretty easy to implement (at least 'the requesting side' of it), therefore I hope we can implement it soon.

BTW, what kind of user interface would be useful for you?

- a per-protocol request for route refresh
- something like 'configure' but with route refresh instead of protocol restart? Should it ignore, report or restart peers that do not support route refresh?
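For reference, the route refresh request discussed here is a very small message on the wire. The following is a minimal Python sketch, assuming the layout described in RFC 2918 (standard BGP header, message type 5, then AFI, a reserved byte and SAFI). The helper name `build_route_refresh` is hypothetical, and this is not BIRD's implementation; it only illustrates what 'the requesting side' has to send. A real speaker may only send it to a peer that announced the route refresh capability when the session was opened.

```python
import struct

BGP_MARKER = b"\xff" * 16       # all-ones marker that starts every BGP header
BGP_TYPE_ROUTE_REFRESH = 5      # message type assigned by RFC 2918


def build_route_refresh(afi: int = 1, safi: int = 1) -> bytes:
    """Encode a ROUTE-REFRESH message for one address family (hypothetical helper)."""
    body = struct.pack("!HBB", afi, 0, safi)   # AFI, reserved byte, SAFI
    length = 16 + 2 + 1 + len(body)            # marker + length field + type + body
    header = BGP_MARKER + struct.pack("!HB", length, BGP_TYPE_ROUTE_REFRESH)
    return header + body


# IPv4 unicast (AFI 1, SAFI 1), the address family used in this thread.
msg = build_route_refresh()
print(len(msg), msg.hex())
```

The peer answers by re-advertising its routes for that address family, so the receiver can run them through its current import filter without resetting the session.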

Wolfgang Hennerbichler

24 Nov

3:03 p.m.

On Mo, 23.11.2009, 18:09, Ondrej Zajicek wrote:

...

It was clear for me. I checked route refresh specification and it seems to be pretty easy to implement (at least 'the requesting side' of it), therefore i hope we can implement it soon.

wooohoooo, great news, thank you so much!

...

BTW, what kind of user interface would be useful for you?

- per protocol request for route refresh

hm. I'm not exactly sure how you mean this - something like: bird> restart protocol R1234 soft

...

- something like 'configure' but with route refresh instead of protocol restart? Should it ignore, report or restart peers that do not support route refresh?

like

    bird> configure route-refresh

=> I'd prefer the per-protocol thing (restart protocol R1234 soft). It would be user-friendly and very scalable. If the peer doesn't support route refresh, it should notify about it, IMHO. My script could then react and bring the protocol down and up again. On the other hand, depending on how much time you want to spend, you could make that an option in the config file:

    protocol bgp R1234 {
        ...
        clear soft action [restart|report|ignore]
    }

I'd be happy with a simple notification, though. Thank you so much.

Wolfgang

...


Andy Davidson

3:12 p.m.

Wolfgang Hennerbichler wrote:

...

On Mo, 23.11.2009, 18:09, Ondrej Zajicek wrote:

...
It was clear for me. I checked route refresh specification and it seems to be pretty easy to implement (at least 'the requesting side' of it), therefore i hope we can implement it soon. wooohoooo, great news, thank you so much!

Agreed. :-)

My requirement for this, both at the ISP level and the IXP route server level, is:

- Using my own script, I can rebuild filters every day against an RPSL database.
- I want, without resetting sessions, to replay the prefixes available over a session and test them against the new filters.

On Cisco/Juniper, I do this exactly as Wolfgang describes, i.e. change the filter config and then clear soft the session. The router then makes a route-refresh call to the peer, has the peer replay the prefixes, and my new filters become enforced.

Thanks
Andy
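The effect of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filter is modelled as an exact-match prefix list (like the 'allnet' set earlier in the thread), `apply_import_filter` is a hypothetical name, and the old and new filter contents are made-up examples built from prefixes quoted in the first message.

```python
from ipaddress import ip_network


def apply_import_filter(replayed, allowed):
    """Return the replayed prefixes accepted by an exact-match prefix filter."""
    allowed_set = {ip_network(p) for p in allowed}
    return [p for p in replayed if ip_network(p) in allowed_set]


# Prefixes the peer re-sends after a route refresh (illustrative data).
replayed = ["193.171.255.32/27", "192.92.125.0/24", "194.0.0.0/24"]

# Hypothetical filter contents before and after the daily rebuild.
old_filter = ["193.171.255.32/27", "192.92.125.0/24"]
new_filter = ["193.171.255.32/27", "194.0.0.0/24"]

before = apply_import_filter(replayed, old_filter)
after = apply_import_filter(replayed, new_filter)
print(before)
print(after)
```

Without a replay, the table keeps the result of the old filter; the refresh is what lets the same advertised prefixes be judged again by the new one, with the session staying up.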

Ondrej Zajicek

3:13 p.m.

On Tue, Nov 24, 2009 at 04:03:32PM +0100, Wolfgang Hennerbichler wrote:

...

...
- per protocol request for route refresh

hm. I'm not exactly sure how you mean this - something like:

bird> restart protocol R1234 soft

Yes.

...

like bird> configure route-refresh

=> I'd prefer the per-protocol thing (restart protocol R1234 soft). It would be user-friendly and very scalable. If the peer doesn't support route refresh, it should notify about it, IMHO.

Thank you for your notes.

Wolfgang Hennerbichler

3:25 p.m.

On Di, 24.11.2009, 16:13, Ondrej Zajicek wrote:

...

Thank you for your notes.

Thank YOU for your help.

...


Ondrej Zajicek

23 Nov

5:23 p.m.

On Mon, Nov 23, 2009 at 08:19:43AM +0100, Wolfgang Hennerbichler wrote:

...

well, this is because our concept differs from the one of nic.cz. We filter at the border, and build pipes to every neighbor who has decided to peer. See this illustration I've made for the last euro-ix: http://tiny.wogri.at/PP (red is the filters, the arcs are the pipes). This means we have no "main" rib. This is good in certain ways:

Interesting concept. BTW, you should make sure that filters on the pipes are set such that there is no loop for each route.

Wolfgang Hennerbichler

9:43 p.m.

On Nov 23, 2009, at 18:23 , Ondrej Zajicek wrote:

...

On Mon, Nov 23, 2009 at 08:19:43AM +0100, Wolfgang Hennerbichler wrote:

...
well, this is because our concept differs from the one of nic.cz. We filter at the border, and build pipes to every neighbor who has decided to peer. See this illustration I've made for the last euro-ix: http://tiny.wogri.at/PP (red is the filters, the arcs are the pipes). This means we have no "main" rib. This is good in certain ways:

Interesting concept. BTW, you should make sure that filters on the pipes are set such that there is no loop for each route.

thanks. there are filters in the pipes in place, that limit by the 'from' ip-address.

...


Wolfgang Hennerbichler

30 Nov

3:40 p.m.

On Nov 23, 2009, at 18:23 , Ondrej Zajicek wrote:

...

On Mon, Nov 23, 2009 at 08:19:43AM +0100, Wolfgang Hennerbichler wrote:

...
well, this is because our concept differs from the one of nic.cz. We filter at the border, and build pipes to every neighbor who has decided to peer. See this illustration I've made for the last euro-ix: http://tiny.wogri.at/PP (red is the filters, the arcs are the pipes). This means we have no "main" rib. This is good in certain ways:

Interesting concept. BTW, you should make sure that filters on the pipes are set such that there is no loop for each route.

hm. it seems that I do - sometimes - get loops.

    Nov 30 14:00:30 rs1 bird: Pipe loop detected when sending 84.205.69.0/24 to table T8596x130
    ...

this only happens very rarely, maybe during configure soft or a bgp update. nevertheless I don't quite get it, because I do have filters in place which should avoid that. All my pipes look like this:

    protocol pipe P30971x30x15 {
        table T30971x30;
        mode transparent;
        peer table T5403x15;
        import filter { reject; };
        export filter {
            if from = 193.203.0.30 then {
                accept;
            }
            else reject;
        };
    }

Shouldn't the 'from' be enough? Should I add an

    if ! source = RTS_BGP reject;

? Do the pipes ignore the filters at any time?

thanks,
Wolfgang

PS: We've got BIRD running at VIX in Beta now, about 12 participants, no crashes, no problems at all (except for the loop notices)

...


Ondrej Zajicek

8:18 p.m.

On Mon, Nov 30, 2009 at 04:40:23PM +0100, Wolfgang Hennerbichler wrote:

...

On Nov 23, 2009, at 18:23 , Ondrej Zajicek wrote:

...
On Mon, Nov 23, 2009 at 08:19:43AM +0100, Wolfgang Hennerbichler wrote:

...
well, this is because our concept differs from the one of nic.cz. We filter at the border, and build pipes to every neighbor who has decided to peer. See this illustration I've made for the last euro-ix: http://tiny.wogri.at/PP (red is the filters, the arcs are the pipes). This means we have no "main" rib. This is good in certain ways:

Interesting concept. BTW, you should make sure that filters on the pipes are set such that there is no loop for each route.

hm. it seems that I do - sometimes - get loops.

Nov 30 14:00:30 rs1 bird: Pipe loop detected when sending 84.205.69.0/24 to table T8596x130 ... this only happens very rarely, maybe during configure soft or a bgp update. nevertheless I don't quite get it, because I do have filters in place which should avoid that. All my pipes look like this:

I checked the source for the loop check and it seems that there are some problems in that code. First, it is not very consistent: in one direction the loop check is applied after the filter and in the other direction it is applied before the filter. Therefore it is possible that the loop check is sometimes triggered even if the filter would also reject the route. This is annoying but otherwise harmless, I hope. Second, in some situations the loop check does not work, which makes the first problem manifest less often :-).
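The ordering issue can be illustrated with a toy model in Python. This is a sketch under assumptions, not BIRD's pipe code: `pipe_export`, the `would_loop` predicate and the route dictionary are all hypothetical, and the source address 193.203.0.99 is made up. It only shows why a loop check placed before the filter can log a warning for a route that the filter would have dropped anyway.

```python
def pipe_export(route, export_filter, would_loop, check_before_filter):
    """Toy model of one pipe direction; returns (forwarded, warnings)."""
    warnings = []

    def loop_detected():
        # Log in the style of the message quoted in this thread.
        if would_loop(route):
            warnings.append("Pipe loop detected when sending " + route["net"])
            return True
        return False

    if check_before_filter and loop_detected():
        return False, warnings
    if not export_filter(route):
        return False, warnings          # silently rejected by the filter
    if not check_before_filter and loop_detected():
        return False, warnings
    return True, warnings


# A route the export filter rejects (wrong 'from'), in a state where the
# loop check would fire. Both values are illustrative.
route = {"net": "84.205.69.0/24", "from": "193.203.0.99"}
accept_from_30 = lambda r: r["from"] == "193.203.0.30"
always_loops = lambda r: True

noisy = pipe_export(route, accept_from_30, always_loops, check_before_filter=True)
quiet = pipe_export(route, accept_from_30, always_loops, check_before_filter=False)
print(noisy)
print(quiet)
```

In both orderings the route is not forwarded; the only difference in this model is whether the log message appears, which matches the description of the effect as annoying but otherwise harmless.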

...

protocol pipe P30971x30x15 { table T30971x30; mode transparent; peer table T5403x15; import filter { reject; }; export filter { if from = 193.203.0.30 then { accept; } else reject; }; }

All your pipes do reject in the import filter and relevant test in the export filter? Every route passes through at most one pipe according to your expected filter behavior?

...

Do the pipes ignore the filters at any time?

I think filters are not ignored.

...

PS: We've got BIRD running at VIX in Beta now, about 12 participants, no crashes, no problems at all (except for the loop notices)

That is nice.

Wolfgang Hennerbichler

1 Dec

9:48 a.m.

On Nov 30, 2009, at 21:18 , Ondrej Zajicek wrote:

...

...
...

I checked the source for the loop check and it seems that there are some problems in that code.

ok.

...

Second, in some situations the loop check does not work and that causes that the first problem manifests less often :-).

all right :)

...

...
protocol pipe P30971x30x15 { table T30971x30; mode transparent; peer table T5403x15; import filter { reject; }; export filter { if from = 193.203.0.30 then { accept; } else reject; }; }

All your pipes do reject in the import filter and relevant test in the export filter?

all my pipes look exactly like this, without any exceptions.

...

Every route passes through at most one pipe according to your expected filter behavior?

well, a route can be distributed from one table to let's say 10 other tables by 10 pipes. you remember we do things a little different. but the pipes are set like above. so there can't be a triangle or something that sends routes from table A, to table B, to table C and back to A. That shouldn't be possible.

...

...
Do the pipes ignore the filters at any time?

I think filters are not ignored.

ok. I think it's the loop prevention thing that's in place before the filter which causes this, this is very certain. I also checked quite some routes where bird thinks they're looping, and all of them are where they should be. Are there chances you'll fix the loop checking code so that it at least checks always after filtering?

...

...
PS: We've got BIRD running at VIX in Beta now, about 12 participants, no crashes, no problems at all (except for the loop notices)

That is nice.

I think so, too :)

...


Ondrej Zajicek

12:09 p.m.

On Tue, Dec 01, 2009 at 10:48:56AM +0100, Wolfgang Hennerbichler wrote:

...

...
Every route passes through at most one pipe according to your expected filter behavior?

well, a route can be distributed from one table to let's say 10 other tables by 10 pipes. you remember we do things a little different. but the pipes are set like above. so there can't be a triangle or something that sends routes from table A, to table B, to table C and back to A. That shouldn't be possible.

My question is whether it should be possible that a route was sent from table A through table B to table C (where A != C). According to what you wrote about pipe filters it shouldn't be possible, unless you have something like two tables associated with one IP address.

...

...
...
Do the pipes ignore the filters at any time?

I think filters are not ignored.

ok. I think it's the loop prevention thing that's in place before the filter which causes this, this is very certain. I also checked quite some routes where bird thinks they're looping, and all of them are where they should be. Are there chances you'll fix the loop checking code so that it at least checks always after filtering?

Your problem is probably related to the problem I mentioned (loop check before filter), but according to my understanding of the code, it shouldn't happen in your case. It might happen only when a route was successfully sent from A through B to C (or through a longer chain). Could you enable logging of routes ('debug protocols { routes }') and send me at least the relevant part of the log (the messages related to the route that triggered the loop check)?

...

...
...
PS: We've got BIRD running at VIX in Beta now, about 12 participants, no crashes, no problems at all (except for the loop notices)

BTW, we have implemented route refresh; it will be in the next release. But route refresh solves just changes in import filters. Do you also need a similar kind of 'soft reconfiguration' for export filters?

Wolfgang Hennerbichler

12:21 p.m.

On Dec 1, 2009, at 13:09 , Ondrej Zajicek wrote:

...

On Tue, Dec 01, 2009 at 10:48:56AM +0100, Wolfgang Hennerbichler wrote:

...
...
Every route passes through at most one pipe according to your expected filter behavior?

well, a route can be distributed from one table to let's say 10 other tables by 10 pipes. you remember we do things a little different. but the pipes are set like above. so there can't be a triangle or something that sends routes from table A, to table B, to table C and back to A. That shouldn't be possible.

My question is whether it should be possible that a route was sent from table A through table B to table C (where A != C). According to what you wrote about pipe filters it shouldn't be possible, unless you have something like two tables associated with one IP address.

no, it should actually not be possible. I'll look into that deeper though.

...

...
...
...
Do the pipes ignore the filters at any time?

I think filters are not ignored.

ok. I think it's the loop prevention thing that's in place before the filter which causes this, this is very certain. I also checked quite some routes where bird thinks they're looping, and all of them are where they should be. Are there chances you'll fix the loop checking code so that it at least checks always after filtering?

Your problem is probably related to the problem mentioned by me (loop check before filter) but according to my understanding of the code, it shouldn't happen in your case. It might happen only when route was successfully sent from A through B to C (or through a longer chain).

as stated above, shouldn't happen.

...

Could you enable logging of routes ('debug protocols { routes }') and send me at least appropriate part (messages related to route triggered loop check) of a log?

ok. I'm in the process of setting up a test-server right now to figure things out.

...

...
...
...
PS: We've got BIRD running at VIX in Beta now, about 12 participants, no crashes, no problems at all (except for the loop notices)

BTW, we have implemented route refresh, it will be in the next release.

great news, thanks!

...

But route refresh solves just changes in import filters.

good.

...

Do you also need similar kind of 'soft reconfiguration' for export filters?

I don't need it currently, but if it's easy to implement it would make bird a little more feature rich, which will never be a bad idea, and of course it's nice to have that feature - I see me asking for it in ~6 months, so if you can easily do it now, go for it :) Wolfgang

...


Vonlanthen, Elmar

3 Dec

7:22 a.m.

New subject: Neighbor state remains in exchange or loading

Hello all,

Could someone explain to me why a neighbor state remains in exchange or loading? On the neighbor itself, the state is full. If I dump the traffic, I see these LS packets again and again:

    08:11:42.636071 IP 10.254.1.10 > 10.254.10.1: OSPFv2, LS-Request, length: 36
    08:11:42.637017 IP 10.254.10.1 > 10.254.1.10: OSPFv2, LS-Update, length: 244
    08:11:42.640274 IP 10.254.1.10 > 10.254.10.1: OSPFv2, LS-Ack, length: 44

Where can I search for the reason? Is the request or the update packet wrong? Or could it be a bird internal problem? I am using bird 1.1.6. Normally everything is working fine. The problem sometimes happens after a bird restart. If necessary, I could provide more information about the whole configuration.

Thanks for any explanation.

Best regards
Elmar

Ondrej Zajicek

4 Dec

3:38 p.m.

New subject: Neighbor state remains in exchange or loading

On Thu, Dec 03, 2009 at 08:22:34AM +0100, Vonlanthen, Elmar wrote:

...

Hello all

Could someone explain me, what is the reason a neighbor state remains in exchange or loading? On the neighbor himself, the state is in full.

If I dump the traffic, I see these LS packets again and again: 08:11:42.636071 IP 10.254.1.10 > 10.254.10.1: OSPFv2, LS-Request, length: 36 08:11:42.637017 IP 10.254.10.1 > 10.254.1.10: OSPFv2, LS-Update, length: 244 08:11:42.640274 IP 10.254.1.10 > 10.254.10.1: OSPFv2, LS-Ack, length: 44

10.254.1.10 is the side that reports a neighbor state in exchange or loading?

Switching from loading to full should be done when a side empties its LS-request list (the lsrql neighbor field). So there are three basic possibilities for problems:

1) The LS-request list contains some garbage.
2) The other side sent just a subset of the requested LSAs in the LS-Update.
3) The other side sent all requested LSAs in the LS-Update, but some are thrown away in LSA update processing.

Perhaps the best thing would be to use debugging dumps in ospf_lsreq_send() (lsreq.c:66, DBG("Requesting ...) and in ospf_lsupd_receive() (lsupd.c:413, DBG("Update Type: ...) to see what was requested and what was received.

One way not to get lost in tons of debug messages is to leave debugging disabled and just replace DBG("xxx", yyy) with log(L_WARN "xxx", yyy) for the interesting debug messages.

Vonlanthen, Elmar

9 Dec

9:49 a.m.

New subject: Neighbor state remains in exchange or loading

Hello Ondrej,

Thanks for your help.

...

...
If I dump the traffic, I see these LS packets again and again: 08:11:42.636071 IP 10.254.1.10 > 10.254.10.1: OSPFv2, LS-Request, length: 36 08:11:42.637017 IP 10.254.10.1 > 10.254.1.10: OSPFv2, LS-Update, length: 244 08:11:42.640274 IP 10.254.1.10 > 10.254.10.1: OSPFv2, LS-Ack, length: 44

10.254.1.10 is the side that reports a neighbor state in exchange or loading?

Yes, that is right.

...

Perhaps the best thing would be to use debugging dumps in ospf_lsreq_send() (lsreq.c:66, DBG("Requesting ...) and in ospf_lsupd_receive() (lsupd.c:413, DBG("Update Type: ...) to see what was requested and what was received.

One way not to get lost in tons of debug messages is to leave debugging disabled and just replace DBG("xxx", yyy) with log(L_WARN "xxx", yyy) for the interesting debug messages.

Here I have the problem again. One neighbor state remains in state "loading". I activated some debug messages.

    2009-12-09 10:23:54 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:23:54 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 289, Sum: 19615
    2009-12-09 10:23:59 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:23:59 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 293, Sum: 19615
    2009-12-09 10:24:04 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:04 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 299, Sum: 19615
    2009-12-09 10:24:09 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:09 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 303, Sum: 19615
    2009-12-09 10:24:14 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:14 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 309, Sum: 19615
    2009-12-09 10:24:19 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:19 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 313, Sum: 19615
    2009-12-09 10:24:24 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:24 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 319, Sum: 19615
    2009-12-09 10:24:29 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:29 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 323, Sum: 19615
    2009-12-09 10:24:34 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:34 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 327, Sum: 19615
    2009-12-09 10:24:39 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
    2009-12-09 10:24:39 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 333, Sum: 19615
    2009-12-09 10:24:44 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600

As you can see, the sequence numbers don't match. Is that the problem?

The router ID 172.16.101.1 is the LAN IP address of host "chgut1fw01". The neighbor state which is not ok is the one of the interface "gABchgut5". See below for an explanation. The setup is a little bit complicated, and it doesn't always happen with the same neighbor.

This is our setup: We have some gateways with ipsec transport connections to each other. If a gateway has two WAN links, it has connections over both links (called linkA and linkB). So between two doublelink-gateways I have four connections (A-A, A-B, B-A, B-B). Over these transport connections we have GRE tunnels, and through the GRE tunnels we use OSPF routing. A gateway can have multiple OSPF interfaces to another gateway. In the attachment you will find the bird.conf on gateway 1. The interfaces "g[AB][AB]<name>" are the GRE interfaces.

Normally everything is working great! But sometimes, after a bird restart, a neighbor state remains in exchange or loading. It can be resolved with a bird restart or by waiting a long time (it seems that every time, after some hours, the state is full again).

I know it is not easy to understand the setup and locate the error. Do I have a problem with the sequence numbers?

Best regards
Elmar

Ondrej Zajicek

10 Dec

10:25 a.m.

New subject: Neighbor state remains in exchange or loading

On Wed, Dec 09, 2009 at 10:49:43AM +0100, Vonlanthen, Elmar wrote:

...

Here I have the problem again. One neighbor state remains in state "loading". I activated some debug messages.

2009-12-09 10:23:54 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
2009-12-09 10:23:54 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 289, Sum: 19615
...
2009-12-09 10:24:44 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600

As you can see, the sequence numbers don't match. Is that the problem?

These messages are useful to locate the problem. The problem is not really a sequence number mismatch (this might happen and it is OK), but some subtle bugs in premature aging of LSAs (which happens if a router receives an LSA created by its previous instance before the restart). I found these bugs several weeks ago, but I didn't know that they could cause such problems.

I hope I will fix these bugs soon.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

Vonlanthen, Elmar

3:10 p.m.

Hi Ondrej

...

...
2009-12-09 10:23:54 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600
2009-12-09 10:23:54 chgut1fw01 bird: Update Type: 1 ID: 172.16.101.1 RT: 172.16.101.1, Sn: 0x80000003 Age: 289, Sum: 19615
...
2009-12-09 10:24:44 chgut1fw01 bird: Requesting 109th LSA: Type: 1, ID: 172.16.101.1, RT: 172.16.101.1, SN: 0x7fffffff, Age 3600

As you can see, the sequence numbers don't match. Is that the problem?

These messages are useful to locate the problem. The problem is not really a sequence number mismatch (this might happen and it is OK), but some subtle bugs in premature aging of LSAs (which happens if a router receives an LSA created by its previous instance before the restart). I found these bugs several weeks ago, but I didn't know that they could cause such problems.

That seems to fit to my problem. It happens only after restart. Is there a workaround to prevent this behaviour?

...

I hope I will fix these bugs soon.

Is there a way I can help you? Unfortunately, my C skills are not so good, but I could test your modifications.

Thanks for your help.

Best regards
Elmar

Ondrej Zajicek

12 Dec

11:47 a.m.

On Thu, Dec 10, 2009 at 04:10:39PM +0100, Vonlanthen, Elmar wrote:

...

That seems to fit to my problem. It happens only after restart. Is there a workaround to prevent this behaviour?

You can try the patch (in the attachment). It should fix the main source of problems.

...

...
I hope I will fix these bugs soon.

Is there a way I can help you? Unfortunately, my C-skills are not so good. But I could test with your modifications?

You can try the patch and enable log messages (packets and events) for OSPF.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
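
For readers following along, enabling the OSPF log messages mentioned above might look roughly like the fragment below. This is only a sketch in BIRD 1.x-style configuration: the log file path, the protocol name "gre_ospf", the area ID and the interface pattern are assumptions for illustration and are not taken from the bird.conf attached earlier in the thread.

```
# Assumed log destination; a syslog target could be used instead.
log "/var/log/bird.log" all;

protocol ospf gre_ospf {            # protocol name is hypothetical
    debug { events, packets };      # log OSPF events and sent/received packets
    area 0 {                        # area ID is an assumption
        interface "g*";             # assumed pattern matching the GRE interfaces
    };
}
```

With packet and event debugging enabled, entries such as the "Requesting ... LSA" and "DBDES packet received from" messages quoted in this thread should appear in the log.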

Vonlanthen, Elmar

14 Dec

9:17 a.m.

Hello Ondrej

...

On Thu, Dec 10, 2009 at 04:10:39PM +0100, Vonlanthen, Elmar wrote:

...
That seems to fit to my problem. It happens only after restart. Is there a workaround to prevent this behaviour?

You can try the patch (in the attachment). It should fix the main source of problems.

...
...
I hope I will fix these bugs soon.

Is there a way I can help you? Unfortunately, my C-skills are not so good. But I could test with your modifications?

You can try the patch and enable log messages (packets and events) for OSPF.

Thank you very much for the patch. Good news: the problem with "loading" seems to be solved.

But unfortunately, the problem with the neighbor state remaining in "exchange" seems to be another problem. In this case I don't see any LSA packets, only hello packets. Do you have an idea what this problem might be?

Best regards
Elmar

Ondrej Zajicek

10:47 a.m.

On Mon, Dec 14, 2009 at 10:17:35AM +0100, Vonlanthen, Elmar wrote:

...

But unfortunately the problem with neighbor state remaining in "exchange" seems to be another problem. In this case I don't see any LSA packets, only hello packets.

Yes, that is probably a completely different problem. But you should see at least some DBDES packets when the neighbor state becomes "exchange". If you have enabled logging of events and packets, these should be in the BIRD log (messages like "DBDES packet received from"). Could you send me that part of the log?

If one side is stuck in the "exchange" state, what is the state of the other side?

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

Vonlanthen, Elmar

16 Dec

6:15 a.m.

Hi Ondrej

...

...
But unfortunately the problem with neighbor state remaining in "exchange" seems to be another problem. In this case I don't see any LSA packets, only hello packets.

Yes, that is probably a completely different problem. But you should see at least some DBDES packets when the neighbor state becomes "exchange". If you have enabled logging of events and packets, these should be in the BIRD log (messages like "DBDES packet received from"). Could you send me that part of the log?

Hm, one time I had the problem with "exchange" again, but since then not anymore. Perhaps there was another problem in the environment.

...

If one side is stuck in "exchange" state, what is the state of the other side?

In this case, the other side has the state "full".

With your latest patch everything seems to be fine now. I will let you know if the problem appears again.

Thanks for your good work.

Best regards
Elmar

participants (5)

Andy Davidson
Anton
Ondrej Zajicek
Vonlanthen, Elmar
Wolfgang Hennerbichler