On 11.04.2014 13:16, Ondrej Zajicek wrote:
On Thu, Apr 10, 2014 at 09:44:57PM +0400, Alexander V. Chernikov wrote:
On 10.04.2014 12:41, Ondrej Zajicek wrote:
On Thu, Apr 10, 2014 at 10:14:03AM +0400, Peter Andreev wrote:
Hi Alexander,
I tried "debug MLPA1 all" from the birdc console, but nothing new appeared in the log file.
Currently I rolled back to version 1.3.10 from ports because 1.4.2 started to crash with the following backtrace:

Do you have the same socket problems in 1.3.10 or not?
We're currently investigating this further; it really looks like an OS-dependent problem. However, it seems that the core was triggered by an uninitialized rta *a in the IPv6 bgp_do_rx_update().

Hmm, that seems like some kind of strange coincidence, as the rta *a variable is not used before it is initialized (see attached patch).

Well, not exactly. DECODE_PREFIX() can perform 'goto done' for an invalid prefix being withdrawn. In that case we'll probably get garbage in a.
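
To make the 'goto done' problem concrete, here is a minimal sketch of the pattern. It is not the BIRD source: struct attrs, attrs_free() and process_update() are hypothetical stand-ins, and it assumes the cleanup at 'done' releases a whenever it is non-NULL. Initializing the pointer to NULL is what makes the early exit safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for BIRD's rta; only a use count is modelled. */
struct attrs { int uc; };

static int frees;  /* number of objects actually released, for illustration */

static void attrs_free(struct attrs *a)
{
    if (--a->uc == 0) {
        free(a);
        frees++;
    }
}

/* Returns 0 on success, -1 when the prefix is rejected or allocation fails. */
int process_update(int prefix_ok, int have_reach)
{
    /* Without "= NULL" the early goto below would reach 'done' with an
     * indeterminate pointer, and the cleanup would free garbage. */
    struct attrs *a = NULL;
    int rc = 0;

    if (!prefix_ok) {       /* stands in for a decode macro bailing out */
        rc = -1;
        goto done;
    }

    if (have_reach) {       /* attributes are only built for reachable NLRI */
        a = malloc(sizeof *a);
        if (!a) {
            rc = -1;
            goto done;
        }
        a->uc = 1;
    }

done:
    if (a)
        attrs_free(a);
    return rc;
}
```

With the initializer in place, an update that carries only withdrawals (have_reach == 0) or an invalid prefix reaches 'done' with a == NULL and nothing is freed.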
Anyway, initially I was talking about the IPv6 case (which has a different bgp_do_rx_update() version). The same applies there: DO_NLRI() can jump to the 'done' label before *a is initialized, which is what actually happened.

I suddenly realized that we're speaking about different crash cases: the one I'm talking about is much clearer:

#0  0x000000000042fa37 in rta_free (r=0x40118adb8) at route.h:476
#1  0x000000000042f4c2 in bgp_do_rx_update (conn=0x801007dc8, withdrawn=0x801f3b015 "", withdrawn_len=0, nlri=0x801f3b055 "^\253", nlri_len=0, attrs=0x801f3b017 "\220\017", attr_len=62) at ../../../proto/bgp/packets.c:1255
#2  0x000000000042fc04 in bgp_rx_update (conn=0x801007dc8, pkt=0x801f3b000 '\377' <repeats 16 times>, len=85) at ../../../proto/bgp/packets.c:1304
#3  0x000000000043011e in bgp_rx_packet (conn=0x801007dc8, pkt=0x801f3b000 '\377' <repeats 16 times>, len=85) at ../../../proto/bgp/packets.c:1525
#4  0x0000000000430283 in bgp_rx (sk=0x80101b140, size=85) at ../../../proto/bgp/packets.c:1570
#5  0x000000000045eb5a in sk_read (s=0x80101b140) at io.c:1602
#6  0x000000000045f5d4 in io_loop () at io.c:1843
#7  0x00000000004669b3 in main (argc=3, argv=0x7fffffffdb20) at main.c:820

(gdb) x/96xb pkt
0x801f3b000: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0x801f3b008: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0x801f3b010: 0x00 0x55 0x02 0x00 0x00 0x00 0x3e 0x90
0x801f3b018: 0x0f 0x00 0x3a 0x00 0x02 0x01 0x20 0x20
0x801f3b020: 0x01 0x1b 0xf8 0x30 0x2a 0x02 0x16 0xd8
0x801f3b028: 0x01 0x01 0x20 0x2a 0x00 0x17 0x80 0x30
0x801f3b030: 0x20 0x01 0x06 0x7c 0x23 0x2c 0x30 0x2a
0x801f3b038: 0x02 0x16 0xd8 0x01 0x03 0x20 0x2a 0x02
0x801f3b040: 0x16 0xd8 0x30 0x2a 0x02 0x16 0xd8 0x01
0x801f3b048: 0x02 0x20 0x28 0x00 0x06 0x70 0x30 0x2a
0x801f3b050: 0x02 0x16 0xd8 0x01 0x04 0x5e 0xab 0x00
0x801f3b058: 0x05 0xda 0xb7 0xc1 0x00 0x30 0x20 0x01
(gdb) p p->mp_reach_len
$7 = 0
(gdb) p p->mp_unreach_len
$8 = 58

DO_NLRI(mp_reach) is not used, so the "a" assignment does not happen. Btw, clang yells for both cases.
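
For anyone who wants to cross-check the dump by hand, the sketch below walks the fixed part of the UPDATE and the header of its first path attribute. The field offsets follow the standard BGP UPDATE layout; struct update_summary, summarize_update() and the sample array (the first 32 bytes of the dump above) are names made up for this illustration, not BIRD code, and the AFI/SAFI fields are only meaningful when the first attribute is a multiprotocol one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields visible at the start of an UPDATE. */
struct update_summary {
    unsigned msg_len, msg_type;
    unsigned withdrawn_len, attr_len;
    int nlri_len;
    unsigned attr_flags, attr_type, attr_value_len;
    unsigned afi, safi;   /* valid only for MP_REACH/MP_UNREACH attributes */
};

/* First 32 bytes of the packet from the gdb dump. */
const uint8_t sample[32] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0x00, 0x55, 0x02, 0x00, 0x00, 0x00, 0x3e, 0x90,
    0x0f, 0x00, 0x3a, 0x00, 0x02, 0x01, 0x20, 0x20,
};

static unsigned get_u16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Returns 0 on success, -1 if the buffer is too short or inconsistent. */
int summarize_update(const uint8_t *pkt, size_t n, struct update_summary *s)
{
    size_t pos;

    if (n < 23)                        /* 19-byte header + two length fields */
        return -1;
    s->msg_len = get_u16(pkt + 16);    /* after the 16-byte marker */
    s->msg_type = pkt[18];             /* 2 = UPDATE */
    s->withdrawn_len = get_u16(pkt + 19);

    pos = 21 + s->withdrawn_len;
    if (pos + 2 > n)
        return -1;
    s->attr_len = get_u16(pkt + pos);
    pos += 2;

    /* Whatever is left after the attributes is classic IPv4 NLRI. */
    s->nlri_len = (int)s->msg_len - 23 - (int)s->withdrawn_len - (int)s->attr_len;
    if (s->nlri_len < 0)
        return -1;

    if (pos + 3 > n)
        return -1;
    s->attr_flags = pkt[pos];
    s->attr_type = pkt[pos + 1];
    if (s->attr_flags & 0x10) {        /* extended-length bit: 2-byte length */
        if (pos + 4 > n)
            return -1;
        s->attr_value_len = get_u16(pkt + pos + 2);
        pos += 4;
    } else {
        s->attr_value_len = pkt[pos + 2];
        pos += 3;
    }

    if (pos + 3 > n)
        return -1;
    s->afi = get_u16(pkt + pos);
    s->safi = pkt[pos + 2];
    return 0;
}
```

Run over the sample bytes, this gives a message length of 85, type 2 (UPDATE), withdrawn_len 0, attr_len 62 and nlri_len 0, matching the arguments in frame #1. The first attribute has flags 0x90 and type 15 (MP_UNREACH_NLRI) with a 58-byte value for AFI 2 / SAFI 1, which agrees with mp_unreach_len = 58 and mp_reach_len = 0: the packet carries only IPv6 withdrawals.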
Perhaps it is related to the bug fixed in efd6d12b975441c7e1875a59dd9e0f3db7e958cb? Are these compiler options used in the FreeBSD build?