On Fri, Jun 18, 2021 at 05:06:27PM +0100, Matthew Reeve wrote:
Hi, yes sure, here it is. Please let me know if this does not give you what you need.
Thanks!
Thanks, that looks like an issue with slists. We had a similar issue with the list code in the past and reworked it to be more conservative. I will check that.
root@OpenWrt:/tmp# gdb debug/bird bird.1623776146.6869.7.core
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-openwrt-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from debug/bird...
[New LWP 6869]
Core was generated by `./bird'.
Program terminated with signal SIGBUS, Bus error.
#0  ospf_rt_reset (p=0x1d610a0) at proto/ospf/rt.c:1646
1646    proto/ospf/rt.c: No such file or directory.
(gdb) bt
#0  ospf_rt_reset (p=0x1d610a0) at proto/ospf/rt.c:1646
#1  ospf_rt_spf (p=0x1d610a0) at proto/ospf/rt.c:1698
#2  ospf_rt_spf (p=0x1d610a0) at proto/ospf/rt.c:1688
#3  ospf_disp (timer=<optimized out>) at proto/ospf/ospf.c:468
#4  0x00061574 in timers_fire (loop=0xc4878 <main_timeloop>) at lib/timer.c:235
#5  0x00012ca8 in io_loop () at sysdep/unix/io.c:2195
#6  main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:939
(gdb)
On 18/06/2021 16:16, Ondrej Zajicek wrote:
On Mon, Jun 14, 2021 at 04:25:04PM +0100, Matthew Reeve wrote:
Hi,
when using BIRD 2.0.8 on OpenWrt 21.02 (and other versions) on a Netgear R7800 router, if the OSPF protocol is used, either v2 or v3, BIRD immediately crashes on startup with:
Fri Jun 11 14:41:11 2021 daemon.info bird: Started
Fri Jun 11 14:41:11 2021 kern.err kernel: [ 3500.853248] Alignment trap: not handling instruction f44c0a1f at [<00035848>]
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.853283] 8<--- cut here ---
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.859363] Unhandled fault: alignment exception (0x801) at 0x007e0624
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.862443] pgd = 0bbef4fd
Fri Jun 11 14:41:11 2021 kern.alert kernel: [ 3500.868821] [007e0624] *pgd=5d6ca835, *pte=5c40b75f, *ppte=5c40bc7f
This router uses an ARMv7 processor, and the problem appears to be a memory alignment fault. I've debugged it and traced it to an access to the top_hash_entry struct. I've found that adding the PACKED macro to the struct definition fixes the problem, as per this patch:
Hi
Thanks, could you try to get a backtrace from the coredump using gdb to see where the invalid access is?
-- 
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."