Bird 1.5.0: protocol 'direct' loses routes
Hi all. I'm testing BIRD 1.5.0 on a PPPoE BRAS, and today I noticed that it sometimes loses routes. For example, today it lost about 5 of 103 networks (before the restart it had 98 routes, after the restart 103). This may be related to tunnel session hangup: when a user reconnects with the same IP and the old session is then dropped, BIRD may lose the route even though the route is still present in the system. But I'm not sure about this.
13.06.2015 10:43, Andrew wrote:
Hi all.
I'm testing BIRD 1.5.0 on a PPPoE BRAS, and today I noticed that it sometimes loses routes. For example, today it lost about 5 of 103 networks (before the restart it had 98 routes, after the restart 103).
This may be related to tunnel session hangup: when a user reconnects with the same IP and the old session is then dropped, BIRD may lose the route even though the route is still present in the system. But I'm not sure about this.
The problem is confirmed on a test machine. Test results:

Initial routes:

bird> sh route protocol direct1
192.168.250.240/28 dev eth0 [direct1 13:29:38] * (240)
192.168.255.64/27 dev vlan1 [direct1 13:29:38] * (240)
10.255.0.21/32 dev eth0 [direct1 13:29:38] * (240)

#ip a a 10.255.0.21/32 dev ifb0

bird> sh route protocol direct1
192.168.250.240/28 dev eth0 [direct1 13:29:38] * (240)
192.168.255.64/27 dev vlan1 [direct1 13:29:38] * (240)
10.255.0.21/32 dev ifb0 [direct1 13:30:19] * (240)

#ip a d 10.255.0.21/32 dev ifb0

bird> sh route protocol direct1
192.168.250.240/28 dev eth0 [direct1 13:29:38] * (240)
192.168.255.64/27 dev vlan1 [direct1 13:29:38] * (240)

bird> restart direct1
direct1: restarted

bird> sh route protocol direct1
192.168.250.240/28 dev eth0 [direct1 13:34:09] * (240)
192.168.255.64/27 dev vlan1 [direct1 13:34:09] * (240)
10.255.0.21/32 dev eth0 [direct1 13:34:09] * (240)
bird>

Is there a simple way to fix this or to add a temporary workaround? Or where in the code is the route handling of the 'direct' protocol located?
On Tue, Jun 16, 2015 at 01:39:24PM +0300, Andrew wrote:
Trouble is confirmed on test machine.
Testing result:
Hi,

Thanks for the example, it is very descriptive. The problem here is that the 'direct' protocol is a stateless translation of address announcements into route announcements, but routing tables do not consider the interface to be part of the route key. Therefore only one route per prefix is remembered, and it is later removed.

We could fix that in a way similar to how BGP ADD-PATH is implemented: by using a separate route source per iface id. In that case, if you have the same network on several interfaces, the direct protocol will generate multiple routes which will compete for being the best route. I will write a patch for this.

BTW, the protocol 'loses' not 'looses' routes.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
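The effect described in this reply can be reproduced with a toy model. The following Python sketch is not BIRD code; the `replay` function and the dictionary layout are assumptions made purely for illustration. It replays the address events from the test above against a table keyed by prefix only, and against one keyed by prefix and interface (roughly what a separate route source per iface id would achieve).

```python
# Hypothetical model, not BIRD source code: all names are illustrative only.

def replay(events, key_of):
    """Apply (action, prefix, iface) address events to a table keyed by key_of."""
    table = {}
    for action, prefix, iface in events:
        key = key_of(prefix, iface)
        if action == "add":
            table[key] = iface      # a later add overwrites an entry with the same key
        else:
            table.pop(key, None)    # a removal deletes whatever is stored under the key
    return table

# Sequence from the test: the address exists on eth0, is added on ifb0,
# and is then removed from ifb0 again.
events = [
    ("add", "10.255.0.21/32", "eth0"),
    ("add", "10.255.0.21/32", "ifb0"),
    ("del", "10.255.0.21/32", "ifb0"),
]

by_prefix = replay(events, lambda prefix, iface: prefix)
by_prefix_and_iface = replay(events, lambda prefix, iface: (prefix, iface))

print(by_prefix)            # {}  (the eth0 route is lost)
print(by_prefix_and_iface)  # {('10.255.0.21/32', 'eth0'): 'eth0'}
```

With the prefix-only key, the removal on ifb0 also wipes the entry that originally came from eth0, which matches the missing 10.255.0.21/32 route in the `sh route` output.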
16.06.2015 14:42, Ondrej Zajicek wrote:
Thanks for the example, it is very descriptive. The problem here is that the 'direct' protocol is a stateless translation of address announcements into route announcements, but routing tables do not consider the interface to be part of the route key. Therefore only one route per prefix is remembered, and it is later removed. We could fix that in a way similar to how BGP ADD-PATH is implemented: by using a separate route source per iface id. In that case, if you have the same network on several interfaces, the direct protocol will generate multiple routes which will compete for being the best route. I will write a patch for this.
Yes, that would be great. Or, as a quick workaround, you could add the ability to fetch link-scope routes via the kernel protocol (or add kernel routing table scanning to the direct protocol).
BTW, the protocol 'loses' not 'looses' routes.
Sorry, you're right. English isn't my native language :(
On Tue, Jun 16, 2015 at 03:20:25PM +0300, Andrew wrote:
Thanks for the example, it is very descriptive. The problem here is that the 'direct' protocol is a stateless translation of address announcements into route announcements, but routing tables do not consider the interface to be part of the route key. Therefore only one route per prefix is remembered, and it is later removed. [...] I will write a patch for this.
Yes, it'll be great.
You could try the attached patch.

--
Ondrej 'Santiago' Zajicek
16.06.2015 22:40, Ondrej Zajicek wrote:
You could try the attached patch.
Thanks, it works. Also, with 1.5.0 OSPF initialization takes a long time (about 2 minutes; 1.4.5 and Quagga start much faster). I have an OSPF debug log. Should I attach it to a mail to the list (does the mailing list support attachments?), or should I send it to your personal address?
On Tue, Jun 16, 2015 at 11:11:22PM +0300, Andrew wrote:
Also, with 1.5.0 OSPF initialization takes a long time (about 2 minutes; 1.4.5 and Quagga start much faster). I have an OSPF debug log. Should I attach it to a mail to the list (does the mailing list support attachments?), or should I send it to your personal address?
The mailing list accepts attachments, so you could send the logs there. It would also be useful to attach your OSPF configuration. Or you could send it directly to my personal mail, if you want.

In 1.5.0 there are changes in OSPF initialization. It is much faster in my test setup, but it is possible that there are issues in some corner cases. Unfortunately, OSPF initialization is only loosely specified by RFC 2328 and many details are left unspecified.

I wonder if others have had negative or positive experiences with OSPF initialization (neighbor establishment) in 1.5.0 compared to previous versions.

--
Ondrej 'Santiago' Zajicek
17.06.2015 13:09, Ondrej Zajicek wrote:
The mailing list accepts attachments, you could send logs there. Also it would be useful to attach your OSPF configuration. Or you could send it directly to my personal mail, if you want.
In 1.5.0 there are changes in OSPF initialization, it is much faster in my test setup but it is possible that there might be some issues in some border cases. Unfortunately, OSPF initialization is just loosely specified by RFC 2328 and there are plenty of unspecified details.
I wonder if others have some negative or positive experience with OSPF initialization (neighbor establishment) in 1.5.0 compared to previous versions.
OK, the log is attached. The config is trivial:

protocol ospf ospf_ua {
    import all;
    area 0.0.0.0 {
        networks {
            192.168.255.64/27;
        };
        interface "vlan1" {
            cost 5;
            type broadcast;
            hello 10;
            retransmit 5;
            wait 40;
            dead count 4;
            authentication none;
            stub no;
        };
        interface "*" {
            stub yes;
        };
    };
}

On the other hand, another server with bird-1.5.0 fetched its routes today in less than a minute. Its config is almost the same (except that it has more areas/interfaces and has ECMP enabled):

protocol ospf ospf_mix {
    import all;
    export filter export_OSPF;
    ecmp yes;
    area 0 {
        networks {
            192.168.255.64/27;
        };
        interface "vlan1" {
            cost 5;
            type broadcast;
            hello 10;
            retransmit 5;
            wait 40;
            dead count 4;
            authentication none;
            stub no;
        };
    };
    area 1 {
        networks {
            10.255.193.0/24;
        };
        interface "vlan2" {
            cost 5;
            type broadcast;
            hello 10;
            retransmit 5;
            wait 40;
            dead count 4;
            authentication none;
            stub no;
        };
    };
    area 100 {
        networks {
            10.255.192.0/24;
        };
        interface "vlan3" {
            cost 5;
            type broadcast;
            hello 10;
            retransmit 5;
            wait 40;
            dead count 4;
            authentication none;
            stub no;
        };
    };
}
On Wed, Jun 17, 2015 at 01:25:28PM +0300, Andrew wrote:
Ok, log is in attach.
The relevant neighbors are 10.255.192.101 and 10.255.192.102, while the local router ID is 10.255.0.21? What OSPF implementation is used by these neighbors? How long did it take with BIRD 1.4.5 and Quagga?

--
Ondrej 'Santiago' Zajicek
18.06.2015 01:49, Ondrej Zajicek wrote:
The relevant neighbors are 10.255.192.101 and 10.255.192.102, while the local router ID is 10.255.0.21?
The local router ID was chosen automatically (10.255.0.21/32 is a dynamic address). The 10.255.192.x addresses are from another area. AFAIK the router ID may be any value, not just an IP from a connected network.
What OSPF implementation is used by these neighbors? How long did it take with BIRD 1.4.5 and Quagga?
These neighbors run Quagga 0.99.22. BIRD 1.4.5 starts much faster (I can't check it right now, because the test box isn't near me and it's down); Quagga on neighbor xxx.xxx.202.4 (IP 192.168.255.91) starts within a few seconds (it's an old 0.99.9).
On Thu, Jun 18, 2015 at 11:25:13AM +0300, Andrew wrote:
What OSPF implementation is used by these neighbors? How long did it take with BIRD 1.4.5 and Quagga?
These neighbors run Quagga 0.99.22. BIRD 1.4.5 starts much faster (I can't check it right now, because the test box isn't near me and it's down); Quagga on neighbor xxx.xxx.202.4 (IP 192.168.255.91) starts within a few seconds (it's an old 0.99.9).
Well, it is likely a minor compatibility problem between Quagga and BIRD. During the loading phase one side sends LSREQ packets and the other side answers with LSUPD packets. As an LSREQ has a larger capacity, one LSREQ packet should be answered by several (~3) LSUPD packets. BIRD waits for all these LSUPD packets before sending another LSREQ, but in this case Quagga is sending just one or two LSUPDs, so BIRD does not send the next LSREQ immediately but only after a timeout.

The old behavior was a bit more aggressive: BIRD sent the next LSREQ immediately after the first LSUPD, which could lead to a request storm where one LSA was requested and transmitted multiple times (because the remaining LSUPD packets were still pending and would also trigger an LSREQ).

I will discuss this issue with the Quagga devs.

--
Ondrej 'Santiago' Zajicek
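As a rough illustration of the stall described in this reply, the following Python sketch counts how long a requester that waits for every requested LSA would spend in timeouts. It is a toy model and not BIRD or Quagga code: the function name `timeout_stall_seconds`, the packet capacities, the LSA count and the 5-second timeout are all assumptions chosen for illustration, and real timing depends on details the model ignores.

```python
# Toy model of the loading phase; every number here is an assumption.

def timeout_stall_seconds(total_lsas, req_cap, upd_cap, updates_per_reply, timeout_s):
    """Seconds spent waiting on timeouts when the requester waits for all requested LSAs.

    req_cap: LSAs one LSREQ can ask for.
    upd_cap: LSAs one LSUPD can carry.
    updates_per_reply: LSUPD packets the neighbor sends per LSREQ (must be >= 1).
    """
    pending = total_lsas
    stalled = 0
    while pending > 0:
        asked = min(pending, req_cap)
        received = min(asked, upd_cap * updates_per_reply)
        pending -= received
        if received < asked:
            # Part of the request is unanswered, so the next LSREQ
            # goes out only after the timeout expires.
            stalled += timeout_s
    return stalled

# One LSREQ covers roughly three LSUPDs, as in the explanation above.
full = timeout_stall_seconds(300, req_cap=90, upd_cap=30, updates_per_reply=3, timeout_s=5)
two = timeout_stall_seconds(300, req_cap=90, upd_cap=30, updates_per_reply=2, timeout_s=5)
one = timeout_stall_seconds(300, req_cap=90, upd_cap=30, updates_per_reply=1, timeout_s=5)

print(full, two, one)  # 0 20 45
```

In this model a neighbor that answers each LSREQ completely causes no stall, while a neighbor that sends only one or two LSUPDs per LSREQ adds one timeout per partially answered request, which is the kind of delay that could add up to the slow start reported in this thread.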