On Thu, Jan 28, 2010 at 04:40:54PM +0300, Mikhail A. Grishin wrote:
> Hi,
> We found that the "Finite state machine error" problem is not related to your patches. It occurs randomly on our production server at daemon startup :((
> The problem occurs on a small number of peers (2, 3 or 4 out of ~280). Some of the problem peers are the same at the next startup, some are not.
> On a test server with a small number of active peers (and the same config) we don't see this issue.
> What can be done? Right now we see the problem on the pure 1.2.0 release...
This might be a bug in the neighbor's firmware, or some strange bug in BIRD.
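One way a neighbor-side bug could trigger this error: if the neighbor sends an UPDATE right after its OPEN, before its KEEPALIVE, the local side is still in OpenConfirm, and the RFC 4271 state machine handles an UPDATE there as a Finite State Machine Error. Below is a minimal Python sketch of that check. It assumes the neighbor-to-route-server TCP payload has already been extracted into a bytes object (for example from a capture written with tcpdump -w and reassembled with another tool; that step is not shown). The function names are hypothetical, and the replay is simplified: timers, connection collisions and the details of NOTIFICATION handling are ignored.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# BGP message type codes from RFC 4271, section 4.1.
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
MARKER = b"\xff" * 16  # 16-byte all-ones marker that starts every BGP header
HEADER_LEN = 19        # marker (16) + length (2) + type (1)


def split_bgp_messages(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, body) for each complete BGP message in one direction of a TCP stream."""
    offset = 0
    while offset + HEADER_LEN <= len(stream):
        if stream[offset:offset + 16] != MARKER:
            raise ValueError(f"bad BGP marker at offset {offset}")
        length, msg_type = struct.unpack_from("!HB", stream, offset + 16)
        if length < HEADER_LEN or offset + length > len(stream):
            break  # malformed or truncated message: stop here
        yield msg_type, stream[offset + HEADER_LEN:offset + length]
        offset += length


def first_fsm_error(msg_types: List[int]) -> Optional[Tuple[int, str]]:
    """Replay the peer's messages, starting in OpenSent (our OPEN already sent).

    Returns (index, state) of the first message that would be an FSM error,
    or None. Simplified sketch: timers, collisions and NOTIFICATION details
    are ignored.
    """
    state = "OpenSent"
    for i, msg_type in enumerate(msg_types):
        if msg_type == NOTIFICATION:
            return None  # simplification: the peer closed the session itself
        if state == "OpenSent" and msg_type == OPEN:
            state = "OpenConfirm"  # we answer with KEEPALIVE and wait for theirs
        elif state == "OpenConfirm" and msg_type == KEEPALIVE:
            return None  # Established: session setup completed normally
        else:
            return i, state  # e.g. an UPDATE before the peer's KEEPALIVE
    return None


def make_msg(msg_type: int, body: bytes = b"") -> bytes:
    """Build a raw BGP message (used here only to create test input)."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


if __name__ == "__main__":
    # Peer sends OPEN and then UPDATE, with no KEEPALIVE in between.
    stream = make_msg(OPEN, b"\x04" + b"\x00" * 9) + make_msg(UPDATE, b"\x00" * 4)
    types = [t for t, _ in split_bgp_messages(stream)]
    print(types)                   # [1, 2]
    print(first_fsm_error(types))  # (1, 'OpenConfirm')
```

Running it prints [1, 2] and (1, 'OpenConfirm'): the second message, the UPDATE, arrives while the session is still in OpenConfirm. The sequence OPEN, KEEPALIVE, UPDATE returns None.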
> About "UPDATE message immediately after it sent OPEN": we asked one of our customers (who hit that problem) to collect debug output from his side. See the attachments (3 files).
I can't find the KEEPALIVE message in the log, but I don't know Cisco well enough to be sure (perhaps it just does not log it). The best thing would be to run this on the route server:

  tcpdump -i eth0 -s 0 -v -n ip host 192.168.1.1 > logfile

(with the appropriate network device and the IP address of one of the problematic neighbors) and send me that logfile.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."