[Euro-ix-rs-vwg] New release 1.2.0

Mikhail A. Grishin magr at ripn.net
Thu Jan 28 14:40:54 CET 2010


Ondrej Zajicek wrote:
> On Thu, Jan 28, 2010 at 12:54:03PM +0300, Mikhail A. Grishin wrote:
>> First of all, thank you for the patch and for the fast response!
>>
>> After applying both patches (date patch and well-known communities) on
>> the production server, we got some strange errors:
>>
>> Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error
>> Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error
>> Jan 28 12:02:28 msk-rsm2 bird: R3218x1: Error: Finite state machine error
>> Jan 28 12:02:30 msk-rsm2 bird: R41842x1: Error: Finite state machine error
>> Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error
>> Jan 28 12:03:26 msk-rsm2 bird: R41842x1: Error: Finite state machine error
>> Jan 28 12:03:29 msk-rsm2 bird: R13174x1: Error: Finite state machine error
>> Jan 28 12:03:37 msk-rsm2 bird: R3218x1: Error: Finite state machine error
>>
>> What does it mean?
> 
> These error messages mean that BIRD received BGP messages (packets) that
> were unexpected with regard to the current state of the BGP session.
> According to the debug log you sent, it seems that the neighbor sent
> an UPDATE message immediately after it sent an OPEN message, but it
> should send a KEEPALIVE message first.
> 
> I have no idea how such a problem might be caused by these patches.
> Is this problem limited to just a small number of neighbors,
> while the other neighbors work well?
> 

Hi,

We found that the "Finite state machine error" problem is not related to
your patches. It occurs randomly on our production server at daemon
startup :((

The problem occurs on a small number of peers
(2, 3 or 4 out of ~280).
Some of the affected peers are the same at the next startup, some are not.
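
To compare the affected peers between startups, the syslog lines can be
tallied per protocol name. Below is a minimal Python sketch, assuming the
log format quoted above; the names FSM_LINE, SAMPLE and count_fsm_errors
are only illustrative and are not part of BIRD.

```python
import re
from collections import Counter

# Matches syslog lines in the format quoted above (assumed format), e.g.
# "Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error"
FSM_LINE = re.compile(r"bird: (?P<proto>\S+): Error: Finite state machine error")

# Sample input: the lines from the log excerpt earlier in this message.
SAMPLE = [
    "Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error",
    "Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error",
    "Jan 28 12:02:28 msk-rsm2 bird: R3218x1: Error: Finite state machine error",
    "Jan 28 12:02:30 msk-rsm2 bird: R41842x1: Error: Finite state machine error",
    "Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error",
    "Jan 28 12:03:26 msk-rsm2 bird: R41842x1: Error: Finite state machine error",
    "Jan 28 12:03:29 msk-rsm2 bird: R13174x1: Error: Finite state machine error",
    "Jan 28 12:03:37 msk-rsm2 bird: R3218x1: Error: Finite state machine error",
]


def count_fsm_errors(lines):
    """Return a Counter mapping protocol name to number of FSM errors."""
    counts = Counter()
    for line in lines:
        match = FSM_LINE.search(line)
        if match:
            counts[match.group("proto")] += 1
    return counts


if __name__ == "__main__":
    for proto, n in sorted(count_fsm_errors(SAMPLE).items()):
        print(proto, n)
```

Running the same function over the log of each startup and comparing the
key sets would show which peers are affected repeatedly.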

On the test server, with a small number of active peers (and the same
config), we don't see this issue.

What can be done? Right now we see the problem on the plain, unpatched 1.2.0 release...

About "UPDATE message immediately after it sent OPEN": we asked one of
our customers (who hit that problem) to collect debug output on his side.
See the attachments (3 files).
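
For reference, the message ordering discussed here (OPEN, then KEEPALIVE,
then UPDATE) can be modelled as a small state table. This is a simplified
Python sketch based on the RFC 4271 state machine, not BIRD's actual code;
the names State, FsmError, receive and run are illustrative, and timers,
NOTIFICATION handling and collision detection are left out.

```python
from enum import Enum, auto


class State(Enum):
    OPEN_SENT = auto()      # we sent OPEN, waiting for the peer's OPEN
    OPEN_CONFIRM = auto()   # got the peer's OPEN, waiting for KEEPALIVE
    ESTABLISHED = auto()    # session is up, UPDATEs are allowed


class FsmError(Exception):
    """A message arrived that is not valid in the current state."""


# (current state, received message type) -> next state.
# Any pair missing from this table is treated as an FSM error.
TRANSITIONS = {
    (State.OPEN_SENT, "OPEN"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "KEEPALIVE"): State.ESTABLISHED,
    (State.ESTABLISHED, "KEEPALIVE"): State.ESTABLISHED,
    (State.ESTABLISHED, "UPDATE"): State.ESTABLISHED,
}


def receive(state, message):
    """Return the next state, or raise FsmError for an unexpected message."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise FsmError(f"{message} unexpected in state {state.name}") from None


def run(messages, state=State.OPEN_SENT):
    """Feed a sequence of received message types through the state table."""
    for message in messages:
        state = receive(state, message)
    return state
```

In this model, run(["OPEN", "KEEPALIVE", "UPDATE"]) ends in ESTABLISHED,
while run(["OPEN", "UPDATE"]) raises FsmError, which corresponds to the
ordering described in the quoted reply.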


One more thing: after applying the "well-known communities" patch, BIRD
dumped core twice (5-30 seconds after startup) :(
Do you need the last core file?
Without the "well-known communities" patch we haven't seen this (yet).

-- 
Mikhail A. Grishin        E-mail: magr at ripn.net
Phone: +7 (495) 737-0685  MSK-IX & Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: device.txt
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20100128/21e4cae2/attachment-0003.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_short.txt
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20100128/21e4cae2/attachment-0004.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_full.txt
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20100128/21e4cae2/attachment-0005.txt>

