Device protocol not recognizing interface address change
Hello,

When we restart some of the routers at the institution where I work, bird sometimes doesn't start announcing any routes until we either restart bird or restart the device protocol using birdc.

I think this happens because the init scripts responsible for setting up the network finish before the interfaces are fully configured (that is, before the IP addresses are set), which allows the init scripts that depend on the network init script, like bird, to start running. Bird then detects that the interfaces are there, but doesn't recognize that the machine has IP addresses in the networks it is announcing (because the interfaces aren't fully configured yet), so it doesn't announce the routes.

When this happens:

    bird> show ospf
    (...)
    Area networks:
        10.3.0.0/16    Advertise
        10.0.0.0/16    Advertise
    (...)

(The problem is that the area networks aren't active.) After restarting the bird process, the output becomes correct, because bird now starts with the interfaces fully configured:

    bird> show ospf
    Area networks:
        10.3.0.0/16    Advertise    Active
        10.0.0.0/16    Advertise    Active

We have the device protocol set to scan every 10 seconds in our bird.conf:

    protocol device {
        scan time 10;
    }

I think that even though bird shouldn't start announcing immediately when it first starts (because it didn't see the interfaces' IP addresses), it should do so 10 seconds later when it scans again, because by then the interfaces are fully configured. For example, when radvd starts it complains that the interfaces don't have IP addresses configured, checks the interfaces every second until they are fully configured, and then starts running correctly.

Our current workaround is to make the networking init script wait for some time before exiting, so that by the time its dependents start executing, the interfaces are already fully configured. This might work for machine startup, but it doesn't solve the problem when we add a new IP address in a new network to an existing interface and make bird start announcing that network without restarting bird (or the interface). I haven't checked that the behaviour described above also occurs in this situation, but it should. If you want me to try it, I can.

If I haven't explained some parts clearly, just ask and I'll try to explain them better. Also, if you need any configuration details, just ask.

This problem happens in both bird "1.4.5-1~bpo70+1" and "1.5.0-4" as distributed by Debian.

So my question is: shouldn't the Device protocol scan see that the IP addresses of an interface have changed and, if so, start announcing the route?

Thank you,
Bernardo Figueiredo
On Wed, Jul 22, 2015 at 05:50:09PM +0100, Bernardo Figueiredo wrote:
> Hello,
>
> When restarting some routers at the institution I work, sometimes bird doesn't start announcing any routes until we either restart bird or restart
> When this happens:
> ...
> "bird> show ospf (...) Area networks: 10.3.0.0/16 Advertise 10.0.0.0/16 Advertise (...)
>
> I think that even though when bird first starts it shouldn't immediately start announcing(because it didn't see that the interfaces ip addresses), it should after 10 seconds when it scans again, because then the interfaces are already fully configured.
> ...
> So, my question is, shouldn't the Device protocol scan update see that the ip addresses of an interface have changed and if so, start announcing the route.
Hello

The BIRD Device protocol generally has no problem noticing new addresses. Most likely the problem is that the BIRD OSPF protocol is not activating the interfaces / areas.

Could you also report the results of 'show interfaces' and 'show ospf interfaces' when it works and when it does not? Does just restarting the OSPF protocol (birdc restart ospf1) help?

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Sorry for the delay.

I have included the output of the commands you asked for as attachments.

Running birdc restart ospf1 worked.

At 09:26 PM on 07/27/2015, Ondrej Zajicek wrote:
> BIRD Device protocol generally does not have problems with noticing new addresses. Most likely it could be problem with BIRD OSPF protocol not activating the interfaces / areas.
>
> Could you also report results of 'show interfaces' and 'show ospf interfaces' when it works and when it does not? Does help just restarting the OSPF protocol (birdc restart ospf1)?
Hello,

I work with Bernardo.

One piece of additional information: the problem seems to be caused by the interfaces being link down when BIRD comes up. They are already configured by the OS scripts, but they haven't finished negotiating the link yet (NO-CARRIER, state DOWN).

After a few seconds the interfaces do come up, but it seems something inside BIRD (OSPF protocol, perhaps) is not noticing this.

At the moment, as a workaround, we're doing a few sleeps before starting BIRD, until the interfaces are actually link up. But of course this isn't a solution.

How can we help debug this further?

Regards,
--
Israel G. Lugo
Núcleo de Redes e Comunicações
Direção de Serviços de Informática
Instituto Superior Técnico

On 04-08-2015 13:03, Bernardo Figueiredo wrote:
> Sorry for the delay.
>
> I include the output of the commands you asked as attachments.
>
> Doing birdc restart ospf1 worked.
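A minimal sketch of the "wait until the interfaces are link up" workaround described above, replacing fixed sleeps with an explicit poll for carrier. It assumes a Linux system where /sys/class/net/<iface>/carrier reads "1" once the link is up. The function names, the 60-second timeout and the one-second interval are hypothetical choices for illustration and do not come from the thread or from BIRD.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def has_carrier(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports carrier (link up) on iface."""
    try:
        value = (Path(sysfs_root) / iface / "carrier").read_text().strip()
    except OSError:
        # The interface is missing, or it is administratively down
        # (in which case reading 'carrier' fails).
        return False
    return value == "1"


def wait_for_carrier(
    ifaces: Iterable[str],
    timeout: float = 60.0,   # hypothetical default
    interval: float = 1.0,   # hypothetical default
    probe: Callable[[str], bool] = has_carrier,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every interface has carrier; return False on timeout."""
    pending = set(ifaces)
    deadline = clock() + timeout
    while True:
        # Keep only the interfaces that still have no carrier.
        pending = {iface for iface in pending if not probe(iface)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

An init script could call wait_for_carrier(["eth0", "eth1"]) and start BIRD only once it returns True. As the message above says, this only hides the symptom: it does not help when the switch comes back much later than the router.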
On Thu, Aug 13, 2015 at 06:29:27PM +0100, Israel G. Lugo wrote:
> Hello,
>
> I work with Bernardo.
>
> One piece of additional information: the problem seems to be caused by the interfaces being link down when BIRD comes up. They are already configured by the OS scripts, but they haven't finished negotiating the link yet (NO-CARRIER, state DOWN).
Hi

So these interfaces are UP but not LOWER_UP in 'ip addr list' output? I guess that you have enabled 'check link' in OSPF?

--
Ondrej 'Santiago' Zajicek
On 13-08-2015 18:37, Ondrej Zajicek wrote:
> So these interfaces are UP but not LOWER_UP in 'ip addr list' output?

This is the situation when BIRD is launched, right after the network service finishes:

    eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    eth0.165@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT
    eth0.141@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT
    eth0.1411@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT
    eth0.2000@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT

eth1 is the uplink connection to area 0. eth0 is divided into access VLANs.

At this time, all interfaces are UP but NO-CARRIER (driver prepping the NIC, link being negotiated, etc.) => state DOWN. A few seconds later, we have the transition to normal state:

    eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    eth0.165@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    eth0.141@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    eth0.1411@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    eth0.2000@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
> I guess that you have enabled 'check link' in OSPF?

Yes. Stripped configuration follows:

    protocol kernel {
        learn;
        persist;
        scan time 20;
        export all;
    }

    protocol device {
        scan time 30;
    }

    protocol ospf backbone {
        import filter ...
        export filter ...
        tick 1;
        ecmp yes;

        area 0.0.0.0 {
            stub no;
            interface "eth1" {
                authentication cryptographic;
                password ...
                check link yes;
            };
            interface "lo" {
                stub;
            };
        };

        area 0.0.165.0 {
            stub yes;
            summary yes;
            interface "eth0.2000" {
                type ptp;
                check link yes;
            };
            interface "eth0.141", "eth0.165", "eth0.1411" {
                stub;
                check link yes;
            };
            networks {
                ...
            };
        };
    }

I just tried to reproduce the problem by rebooting the router with the switch ports administratively down. But now the behavior was different: bird simply died. bird6 is there and running, but the bird process is not.

The only log messages are:

    2015-08-13T20:11:09.054662+01:00 gwtsul bird: Started
    2015-08-13T20:11:13.001849+01:00 gwtsul bird: backbone: Iface eth1 in down state?

Is this behavior by design? This is a problem e.g. in case of power failure. Sometimes the switch takes a minute longer than the PC to come back up. I would expect OSPF to wait until then.

Regards,
--
Israel G. Lugo
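The distinction discussed in the message above (administratively UP versus LOWER_UP, i.e. carrier present) can be read directly from the flag list that 'ip link' prints. The following Python sketch classifies one such line. The function names and the three state labels are hypothetical, and the parser only assumes the "name: <FLAG,FLAG,...>" layout shown in the quoted output.

```python
import re

# Matches an optional leading index ("2: "), the interface name, an optional
# "@parent" suffix for VLAN sub-interfaces, and the <...> flag list.
LINK_RE = re.compile(
    r"^(?:\d+:\s*)?(?P<name>[^:@\s]+)(?:@\S+)?:\s*<(?P<flags>[^>]*)>"
)


def link_flags(line: str) -> tuple[str, set[str]]:
    """Return (interface name, set of flags) for one 'ip link' summary line."""
    match = LINK_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an 'ip link' summary line: {line!r}")
    flags = {flag for flag in match.group("flags").split(",") if flag}
    return match.group("name"), flags


def link_state(line: str) -> tuple[str, str]:
    """Classify a line as 'admin-down', 'no-carrier' or 'carrier'."""
    name, flags = link_flags(line)
    if "UP" not in flags:
        state = "admin-down"   # interface not enabled by the administrator
    elif "LOWER_UP" in flags:
        state = "carrier"      # enabled and the link layer is up
    else:
        state = "no-carrier"   # enabled, but link not (yet) established
    return name, state
```

Applied to the two listings above, every interface is classified as "no-carrier" at the moment BIRD starts and as "carrier" a few seconds later.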
On Thu, Aug 13, 2015 at 08:23:44PM +0100, Israel G. Lugo wrote:
> On 13-08-2015 18:37, Ondrej Zajicek wrote:
> > So these interfaces are UP but not LOWER_UP in 'ip addr list' output?
>
> This is the situation when BIRD is launched, right after the network service finishes:
>
> ...
>
> I just tried to reproduce the problem, by rebooting the router with the switch ports administratively down. But now the behavior was different: bird simply died. bird6 is there and running, but the bird process is not.
>
> The only log messages are:
>
> 2015-08-13T20:11:09.054662+01:00 gwtsul bird: Started
> 2015-08-13T20:11:13.001849+01:00 gwtsul bird: backbone: Iface eth1 in down state?
>
> Is this behavior by design? This is a problem e.g. in case of power failure. Sometimes the switch takes a minute longer than the PC to come back up. I would expect OSPF to wait until then.
Hi

No, it is a bug, not behavior by design. Could you send me a log from the start of BIRD (ideally two logs, one for each case: one where the interfaces end up in down state and one where BIRD crashes with the 'in down state?' message) with debug messages enabled by adding this option to the OSPF protocol:

    debug {events, states, interfaces};

?

--
Ondrej 'Santiago' Zajicek
On Tue, Aug 18, 2015 at 10:54:40AM +0200, Ondrej Zajicek wrote:
> No, it is a bug, not behavior by design. Could you send me a log from the start of BIRD (ideally two logs for both cases - one where interfaces end in down state and one when BIRD crashes with 'in down state?' message) with enabled debug messages by adding this option to the OSPF protocol:
>
> debug {events, states, interfaces};
>
> ?
Or you could just try and test the attached patch; it should fix the problem.

--
Ondrej 'Santiago' Zajicek
Hi,

Sorry for the delay in responding. I was away on vacation.

I will try the patch. Would you still like me to gather debug output first, with the current version?

Regards,
--
Israel G. Lugo

On 19-08-2015 10:20, Ondrej Zajicek wrote:
> Or you could just try and test attached patch, it should fix the problem.
On Mon, Aug 24, 2015 at 01:30:14PM +0100, Israel G. Lugo wrote:
> Hi,
>
> Sorry for the delay in responding. I was away on vacations.
>
> I will try the patch. Would you still like me to gather debug output first, with the current version?
Hi

If the patch works, then it is unnecessary.

--
Ondrej 'Santiago' Zajicek
Hello,

On 19-08-2015 10:20, Ondrej Zajicek wrote:
> Or you could just try and test attached patch, it should fix the problem.
I apologize for the delay in responding. I have tried the patch and it does fix the problem. Namely:

Before the patch, starting Bird 1.5.0 with all interfaces LINK DOWN would leave the OSPF interfaces in Down state, from which they would never recover. Sometimes (perhaps a race condition?) Bird would actually quit with the log message "Iface eth1 in down state?" from the OSPF protocol (eth1 area 0.0.0.0).

After the patch, this no longer happens. Starting Bird with all interfaces LINK DOWN leaves the OSPF interfaces in Loopback state, and they recover to Waiting when link is brought up.

Thank you for the help and sorry again for the delay. It is not always easy to find time to set up a testing lab alongside the day-to-day work.

Regards,
--
Israel G. Lugo
On Mon, Sep 28, 2015 at 12:31:09PM +0100, Israel G. Lugo wrote:
> I apologize for the delay in responding. I have tried the patch and it does fix the problem. Namely:
>
> Before the patch, starting Bird 1.5.0 with all interfaces LINK DOWN would leave the OSPF interfaces in Down state, from which they would never recover. Sometimes (perhaps a race condition?) Bird would actually quit with the log message "Iface eth1 in down state?" from the OSPF protocol (eth1 area 0.0.0.0).
>
> After the patch, this no longer happens. Starting Bird with all interfaces LINK DOWN leaves the OSPF interfaces in Loopback state, and they recover to Waiting when link is brought up.
>
> Thank you for the help and sorry again for the delay. Not always easy to find time to set up a testing lab with the day to day work.
Hi, thanks for the response.

--
Ondrej 'Santiago' Zajicek
Participants (3): Bernardo Figueiredo, Israel G. Lugo, Ondrej Zajicek