bird generating 600Mbit traffic
Hello,
bird (1.3.10, earlier versions too) sometimes starts flooding my network with OSPF packets, generating over 600Mbit of traffic.
tcpdump shows traffic between two bird machines: the problematic one sends an LS-Request, then receives an LS-Update, sends the Request again, receives an LS-Update, and so on.
http://ixion.pld-linux.org/~arekm/p2/log-bird.txt
Any ideas on what could be the reason for such behaviour?
Thanks,
--
Arkadiusz Miśkiewicz, arekm / maven.pl
On Mon, Jul 01, 2013 at 10:44:34AM +0200, Arkadiusz Miśkiewicz wrote:
> Hello,
> bird (1.3.10, earlier versions too) sometimes starts flooding my network with OSPF packets, generating over 600Mbit of traffic.
> tcpdump shows traffic between two bird machines: the problematic one sends an LS-Request, then receives an LS-Update, sends the Request again, receives an LS-Update, and so on.
> http://ixion.pld-linux.org/~arekm/p2/log-bird.txt
> Any ideas on what could be the reason for such behaviour?
Hello
I would check that, but it would be useful to know several things:
Is the problem self-limiting, or do you have to restart the OSPF protocol to stop it?
Are there any related messages in the BIRD log?
Could you check, during some of these floods, the status of the related LSAs in the LSAdb (show ospf lsadb) of the related machines: whether the LSA is there, and whether it has an older, the same, or a newer seqnum than the one sent to the network?
And also, what is the status of DR/BDR on that network during the event (show ospf interface): whether the related machines have the same idea about the DR and BDR, and whether the problematic nodes are always in the same position (like sender is 'other' and receiver is 'BDR') during such problems.
--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On Monday 01 of July 2013, Ondrej Zajicek wrote:
> On Mon, Jul 01, 2013 at 10:44:34AM +0200, Arkadiusz Miśkiewicz wrote:
>> Hello,
>> bird (1.3.10, earlier versions too) sometimes starts flooding my network with OSPF packets, generating over 600Mbit of traffic.
>> tcpdump shows traffic between two bird machines: the problematic one sends an LS-Request, then receives an LS-Update, sends the Request again, receives an LS-Update, and so on.
>> http://ixion.pld-linux.org/~arekm/p2/log-bird.txt
>> Any ideas on what could be the reason for such behaviour?
> Hello
> I would check that, but it would be useful to know several things:
> Is the problem self-limiting, or do you have to restart the OSPF protocol to stop it?
I was restarting the whole bird to get rid of the problem. Once it was flooding for over 6h, so it didn't fix itself. There had been no problem for the last 6 days, until about half an hour ago.
> Are there any related messages in the BIRD log?
With log syslog { error, fatal }; configured, there are zero logs.
> Could you check, during some of these floods, the status of the related LSAs in the LSAdb (show ospf lsadb) of the related machines: whether the LSA is there, and whether it has an older, the same, or a newer seqnum than the one sent to the network?
show ospf lsadb: http://ixion.pld-linux.org/~arekm/p2/log-bird-lsadb.txt

In tcpdump I see various seq numbers:
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000008, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000008, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000005, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000004, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
  Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
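For reference when comparing the captured instances with the LSAdb: all of the instances above carry age 3600s, which is MaxAge in OSPF. Which of two instances of the same LSA counts as "more recent" is defined in RFC 2328, section 13.1: the signed 32-bit sequence number first, then the checksum, then MaxAge, then an age difference larger than MaxAgeDiff (900 s). The Python sketch below only restates those rules for illustration; it is not BIRD's code, and the LsaHeader type and the checksum values in the example are hypothetical.

```python
from dataclasses import dataclass

MAX_AGE = 3600      # MaxAge in seconds (RFC 2328, Appendix B)
MAX_AGE_DIFF = 900  # MaxAgeDiff in seconds (RFC 2328, Appendix B)


@dataclass(frozen=True)
class LsaHeader:
    """Hypothetical container for the LSA header fields used in the comparison."""
    seq: int       # raw 32-bit LS sequence number, e.g. 0x80000009
    checksum: int  # LS checksum, 16-bit unsigned
    age: int       # LS age in seconds


def signed32(value: int) -> int:
    """Interpret a raw 32-bit sequence number as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def compare_lsa(a: LsaHeader, b: LsaHeader) -> int:
    """Return 1 if a is more recent, -1 if b is more recent, 0 if identical."""
    # 1. The newer (signed) sequence number wins.
    sa, sb = signed32(a.seq), signed32(b.seq)
    if sa != sb:
        return 1 if sa > sb else -1
    # 2. The larger checksum wins.
    if a.checksum != b.checksum:
        return 1 if a.checksum > b.checksum else -1
    # 3. If exactly one instance has age MaxAge, that one wins.
    a_max, b_max = a.age == MAX_AGE, b.age == MAX_AGE
    if a_max != b_max:
        return 1 if a_max else -1
    # 4. If the ages differ by more than MaxAgeDiff, the younger one wins.
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return 1 if a.age < b.age else -1
    # 5. Otherwise the instances are considered identical.
    return 0


# Hypothetical example: a captured instance vs. the copy in the LSAdb
# (checksums are made up for illustration).
on_wire = LsaHeader(seq=0x80000008, checksum=0x1234, age=3600)
in_lsadb = LsaHeader(seq=0x80000009, checksum=0x1234, age=3600)
print(compare_lsa(on_wire, in_lsadb))  # -1: the LSAdb copy is more recent
```

Applied to the capture, an instance with seq 0x80000004 or 0x80000005 would lose against an LSAdb copy with seq 0x80000009, regardless of age or checksum.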
> And also, what is the status of DR/BDR on that network during the event (show ospf interface): whether the related machines have the same idea about the DR and BDR, and whether the problematic nodes are always in the same position (like sender is 'other' and receiver is 'BDR') during such problems.
On the node generating the traffic:

bird> show ospf interface
MyOSPF:
Interface eth0 (1.1.14.0/22)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: backup
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 1.1.16.5
    Designed router (IP): 1.1.16.5
    Backup designed router (ID): 1.1.17.102
    Backup designed router (IP): 1.1.17.102
bird>

On the other node, which was responding (and generating a bit less traffic in reply):

bird> show ospf interface
MyOSPF:
Interface eth0 (1.1.14.80/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.81/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.214/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.0/22)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: drother
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 1.1.16.5
    Designed router (IP): 1.1.16.5
    Backup designed router (ID): 1.1.17.102
    Backup designed router (IP): 1.1.17.102
Interface eth0 (1.1.14.83/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.84/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.85/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.86/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
Interface eth0 (1.1.14.87/32)
    Type: broadcast
    Area: 0.0.0.0 (0)
    State: waiting (stub)
    Priority: 1
    Cost: 10
    Hello timer: 10
    Wait timer: 40
    Dead timer: 40
    Retransmit timer: 5
    Designed router (ID): 0.0.0.0
    Designed router (IP): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
    Backup designed router (IP): 0.0.0.0
bird>
--
Arkadiusz Miśkiewicz, arekm / maven.pl
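One rough way to check the DR/BDR agreement asked about in this thread is to parse the show ospf interface output of both routers and compare the DR and BDR IDs on the interfaces they share. The helpers below (parse_interfaces, dr_bdr_mismatches) are hypothetical, written in Python, and assume the BIRD 1.3 field labels shown in the output above, including the "Designed router" wording; other BIRD versions may label these fields differently.

```python
import re

# Hypothetical parser; the labels match the BIRD 1.3 output quoted in this thread.
_IFACE_RE = re.compile(r"Interface (\S+) \(([^)]+)\)")
_FIELDS = {
    "state": re.compile(r"State: (\w+)"),
    # Case-sensitive, so this does not match "Backup designed router".
    "dr": re.compile(r"Designed router \(ID\): (\S+)"),
    "bdr": re.compile(r"Backup designed router \(ID\): (\S+)"),
}


def parse_interfaces(text: str) -> dict:
    """Map 'iface (prefix)' to its state, DR ID and BDR ID."""
    matches = list(_IFACE_RE.finditer(text))
    result = {}
    for i, m in enumerate(matches):
        # Each interface's fields run until the next "Interface ..." header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[m.end():end]
        info = {}
        for name, rx in _FIELDS.items():
            found = rx.search(chunk)
            info[name] = found.group(1) if found else ""
        result[f"{m.group(1)} ({m.group(2)})"] = info
    return result


def dr_bdr_mismatches(text_a: str, text_b: str) -> list:
    """List shared interfaces on which the two routers disagree about DR or BDR."""
    a, b = parse_interfaces(text_a), parse_interfaces(text_b)
    return [
        key
        for key in sorted(a.keys() & b.keys())
        if (a[key]["dr"], a[key]["bdr"]) != (b[key]["dr"], b[key]["bdr"])
    ]


# Usage with trimmed excerpts of the two outputs above.
node_a = """Interface eth0 (1.1.14.0/22)
    State: backup
    Designed router (ID): 1.1.16.5
    Backup designed router (ID): 1.1.17.102"""
node_b = """Interface eth0 (1.1.14.80/32)
    State: waiting (stub)
    Designed router (ID): 0.0.0.0
    Backup designed router (ID): 0.0.0.0
Interface eth0 (1.1.14.0/22)
    State: drother
    Designed router (ID): 1.1.16.5
    Backup designed router (ID): 1.1.17.102"""
print(dr_bdr_mismatches(node_a, node_b))  # []: both agree on DR and BDR
```

Keying on the interface name plus prefix keeps the /32 stub interfaces of the second node out of the comparison, since only the shared 1.1.14.0/22 network appears on both routers.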
On Sat, Jul 06, 2013 at 08:00:32PM +0200, Arkadiusz Miśkiewicz wrote:
>> Are there any related messages in the BIRD log?
> With log syslog { error, fatal }; configured, there are zero logs.
Well, most of these messages are logged as warnings, so it would perhaps be a good idea to add the warning class to the log configuration.
>> Could you check, during some of these floods, the status of the related LSAs in the LSAdb (show ospf lsadb) of the related machines: whether the LSA is there, and whether it has an older, the same, or a newer seqnum than the one sent to the network?
> show ospf lsadb: http://ixion.pld-linux.org/~arekm/p2/log-bird-lsadb.txt
Is this from the receiving router?
> In tcpdump I see various seq numbers:
>   Advertising Router 1.1.15.159, seq 0x80000009, age 3600s, length 16
>   Advertising Router 1.1.15.159, seq 0x80000008, age 3600s, length 16
>   Advertising Router 1.1.15.159, seq 0x80000008, age 3600s, length 16
Do you have a tcpdump log? What is important for me are the seqnums related to the LSA-IDs of the LSAs (so they can be compared to specific LSAs in the LSAdb):

  Advertising Router 1.1.15.159, seq 0x80000003, age 3600s, length 16
    External LSA (5), LSA-ID: 31.168.39.0
                              ^^^^^^^^^^^
--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
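To get the per-LSA-ID view requested here, the verbose tcpdump text can be grouped by advertising router, LSA type and LSA-ID, listing every seqnum seen for each. The Python sketch below is hypothetical (group_seqnums is not an existing tool) and assumes the layout quoted above, where each "Advertising Router ..., seq ..." header is followed, on the same or the next line, by a "... LSA (n), LSA-ID: ..." line; other tcpdump versions or packet types may print differently.

```python
import re
from collections import defaultdict

# Assumed layout (as quoted in this thread):
#   Advertising Router 1.1.15.159, seq 0x80000003, age 3600s, length 16
#     External LSA (5), LSA-ID: 31.168.39.0
_HDR_RE = re.compile(r"Advertising Router (\S+), seq (0x[0-9a-fA-F]+)")
_ID_RE = re.compile(r"LSA \((\d+)\), LSA-ID: (\S+)")


def group_seqnums(lines):
    """Return {(adv_router, lsa_type, lsa_id): [seqnum, ...]} in capture order."""
    seen = defaultdict(list)
    pending = None  # (adv_router, seqnum) still waiting for its LSA-ID
    for line in lines:
        hdr = _HDR_RE.search(line)
        if hdr:
            pending = (hdr.group(1), int(hdr.group(2), 16))
            line = line[hdr.end():]  # the LSA-ID may follow on the same line
        ident = _ID_RE.search(line)
        if ident and pending:
            adv_router, seq = pending
            seen[(adv_router, int(ident.group(1)), ident.group(2))].append(seq)
            pending = None
    return dict(seen)


capture = """\
  Advertising Router 1.1.15.159, seq 0x80000003, age 3600s, length 16
    External LSA (5), LSA-ID: 31.168.39.0
"""
for key, seqs in group_seqnums(capture.splitlines()).items():
    print(key, [hex(s) for s in seqs])
```

Each resulting key can then be looked up in the show ospf lsadb output of both routers to see whether the LSAdb copy is older, the same, or newer than what is on the wire.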
participants (2): Arkadiusz Miśkiewicz, Ondrej Zajicek