BIRD memory usage
Pavel Tvrdík
pavel.tvrdik at nic.cz
Wed Sep 7 10:08:06 CEST 2016
Hi, Just.
On 2016-09-06 22:50, Justin Cattle wrote:
> I found some time to package using a patch to the latest 1.6.0
> release, created from a diff of origin/krt-export-filtr-fix against
> v1.6.0-34-g768d013 [ seems to be the top three commits ].
Yes, the top three commits, exactly!
> I hope that's valid. That patch applied without issue, and I wrapped
> it into a debian patch.
>
> I've installed on a few hosts, and I'll report back tomorrow if I get
> a chance.
Great!
>
> Thanks again for the speedy code :)
>
> Here's my debian package patch for reference:
>
> cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
> filter/tree: prefer xmalloc/xfree to malloc/free
> rt-table: fix kernel protocol export filter memory bug
> Index: bird-1.6.0/filter/tree.c
> ===================================================================
> --- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
> +++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
> @@ -82,7 +82,7 @@
> if (len <= 1024)
> buf = alloca(len * sizeof(struct f_tree *));
> else
> - buf = malloc(len * sizeof(struct f_tree *));
> + buf = xmalloc(len * sizeof(struct f_tree *));
>
> /* Convert a degenerated tree into an sorted array */
> i = 0;
> @@ -94,7 +94,7 @@
> root = build_tree_rec(buf, 0, len);
>
> if (len > 1024)
> - free(buf);
> + xfree(buf);
>
> return root;
> }
> Index: bird-1.6.0/nest/rt-table.c
> ===================================================================
> --- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
> +++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
> @@ -60,6 +60,21 @@
> static inline void rt_schedule_prune(rtable *tab);
>
> +static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
> +
> +static inline void
> +rte_update_lock(void)
> +{
> + rte_update_nest_cnt++;
> +}
> +
> +static inline void
> +rte_update_unlock(void)
> +{
> + if (!--rte_update_nest_cnt)
> + lp_flush(rte_update_pool);
> +}
> +
> static inline struct ea_list *
> make_tmp_attrs(struct rte *rt, struct linpool *pool)
> {
> @@ -609,10 +624,18 @@
> if (!rte_is_valid(best0))
> return NULL;
>
> + /* This non-static function could be called from outside rt-table.c file and
> + * we need to ensure that a temporary allocated linpool memory @rte_update_pool
> + * will be freed */
> + rte_update_lock();
> +
> best = export_filter(ah, best0, rt_free, tmpa, silent);
>
> if (!best || !rte_is_reachable(best))
> + {
> + rte_update_unlock();
> return best;
> + }
>
> for (rt0 = best0->next; rt0; rt0 = rt0->next)
> {
> @@ -646,6 +669,8 @@
> if (best != best0)
> *rt_free = best;
>
> + rte_update_unlock();
> +
> return best;
> }
>
> @@ -1097,21 +1122,6 @@
> rte_free_quick(old);
> }
>
> -static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
> -
> -static inline void
> -rte_update_lock(void)
> -{
> - rte_update_nest_cnt++;
> -}
> -
> -static inline void
> -rte_update_unlock(void)
> -{
> - if (!--rte_update_nest_cnt)
> - lp_flush(rte_update_pool);
> -}
> -
> static inline void
> rte_hide_dummy_routes(net *net, rte **dummy)
> {
Looks fine :)
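
For readers following the patch, here is a minimal standalone sketch of the
pattern it relies on: a nesting counter guarding a temporary pool, so the pool
is flushed only when the outermost caller unlocks. Everything here is an
illustrative assumption (tmp_pool, tmp_alloc, tmp_flush, run_filter and
export_best are hypothetical stand-ins, not BIRD's linpool or rt-table code);
only the lock/unlock shape mirrors the diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a linear pool: a bump allocator that is
 * only ever emptied as a whole. Not BIRD's real linpool. */
struct tmp_pool {
  char buf[4096];
  size_t used;
};

struct tmp_pool update_pool;   /* plays the role of rte_update_pool */
int update_nest_cnt;           /* nesting counter, as in the patch */

/* Bump-allocate from the pool; returns NULL when the pool is full. */
void *
tmp_alloc(struct tmp_pool *p, size_t size)
{
  if (size > sizeof(p->buf) - p->used)
    return NULL;
  void *ptr = p->buf + p->used;
  p->used += size;
  return ptr;
}

/* Release everything in the pool at once. */
void
tmp_flush(struct tmp_pool *p)
{
  p->used = 0;
}

void
update_lock(void)
{
  update_nest_cnt++;
}

/* Flush only when the outermost lock is released. */
void
update_unlock(void)
{
  assert(update_nest_cnt > 0);
  if (!--update_nest_cnt)
    tmp_flush(&update_pool);
}

/* Hypothetical filter that leaves temporary data behind in the pool. */
int
run_filter(int accept)
{
  (void) tmp_alloc(&update_pool, 64);
  return accept;
}

/* Entry point callable from outside: every return path pairs the
 * lock with an unlock, so temporary data cannot pile up. */
int
export_best(int accept)
{
  update_lock();
  if (!run_filter(accept))
  {
    update_unlock();
    return 0;
  }
  update_unlock();
  return 1;
}
```

Calling export_best() on its own leaves the pool empty afterwards. Calling it
while an outer update_lock() is held leaves the 64 bytes in place until the
outer update_unlock(), which is the point of counting nesting depth rather
than flushing unconditionally.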
>
> Cheers,
> Just
> On 6 September 2016 at 18:03, Justin Cattle <j at ocado.com> wrote:
>
>> Hi Pavel,
>>
>> Thanks for the quick response! I will try that as soon as I can,
>> hopefully in the next couple of days.
>> I'll report back as soon as I know.
>>
>> Cheers,
>> Just
>>
>> On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik at nic.cz>
>> wrote:
>> Hi Justin,
>>
>> On 2016-09-05 16:21, Justin Cattle wrote:
>> Hi,
>>
>> A colleague of mine reported a memory usage issue with the bird daemon
>> last year, which resulted in a request for a core dump, but we never
>> followed it up.
>> I'd like to re-open this discussion and see if anything can be done to
>> fix it.
>>
>> I'll provide some information regarding a production environment,
>> where the problem is most obvious. But any further details and
>> diagnostics will have to come from our lab environment.
>> Please note, in production we mostly run 1.5, but in the lab we are on
>> 1.6; however, we see the same symptoms in both environments on both
>> versions.
>>
>> The symptoms are twofold, but potentially related - greater than
>> expected memory usage reported by the bird daemon itself for the
>> number of routes, but also massively more memory actually used by the
>> daemon process.
>>
>> When the process is started, we see "normal" memory usage, which then
>> seems to grow indefinitely in distinct steps, separated by a period of
>> a few hours.
>>
>> In production, this consumes most of the 32G of memory until the
>> kernel oom-killer intervenes.
>>
>> Production:
>>
>> BIRD 1.5.0 ready.
>>
>> bird> show memory
>>
>> BIRD memory usage
>>
>> Routing tables: 1405 MB
>>
>> Route attributes: 84 kB
>>
>> ROA tables: 192 B
>>
>> Protocols: 45 kB
>>
>> Total: 1405 MB
>>
>> bird> show route count
>>
>> 2273 of 2273 routes for 1142 networks
>>
>> # ps u -p 3441
>>
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>
>> bird 3441 0.1 55.4 18275124 18241540 ? Ssl Aug10 73:39 /usr/sbin/bird -f -u bird -g bird
>>
>> ..so that's ~1.4G reported by bird, and ~18G actually consumed by the
>> process.
>>
>> Lab:
>>
>> BIRD 1.6.0 ready.
>>
>> bird> show mem
>>
>> BIRD memory usage
>>
>> Routing tables: 693 MB
>>
>> Route attributes: 28 kB
>>
>> ROA tables: 192 B
>>
>> Protocols: 41 kB
>>
>> Total: 693 MB
>>
>> bird> show route count
>>
>> 175 of 175 routes for 91 networks
>>
>> # ps u -p 29085
>>
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>
>> bird 29085 0.0 14.9 4994852 4915032 ? Ssl Aug05 19:41 /usr/sbin/bird -f -u bird -g bird
>
> Thanks for this report. I was able to reproduce this behavior too.
> Configuring the kernel protocol with an export filter triggers a
> memory leak. I prepared commits with a fix in the branch
> `krt-export-filtr-fix'
>
> https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix [1]
>
> Can you please download it and confirm that the bug is fixed?
>
> Best,
> Pavel
>
>> ..so that's ~0.7G reported by bird, and ~5G actually consumed by the
>> process.
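
As a rough check on the figures quoted above, the gap between what bird
reports and what the process holds can be worked out as a ratio. This is only
a sketch: ps(1) reports RSS in kilobytes, but treating bird's "MB" as
1024 kB is an assumption, and both helper names are hypothetical.

```c
#include <assert.h>

/* Ratio of the resident set size seen by ps (in kB) to the total that
 * "show memory" reports (in MB, assumed here to mean 1024 kB). */
double
rss_to_reported_ratio(unsigned long rss_kib, unsigned long reported_mb)
{
  assert(reported_mb > 0);
  return (double) rss_kib / ((double) reported_mb * 1024.0);
}

/* Convert a ps-style kB figure to GiB for easier reading. */
double
kib_to_gib(unsigned long kib)
{
  return (double) kib / (1024.0 * 1024.0);
}
```

With the production values (RSS 18241540, reported 1405 MB) the ratio lands
between 12 and 13; with the lab values (RSS 4915032, reported 693 MB) it lands
between 6 and 7. In both cases the process holds several times more memory
than bird accounts for.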
>>
>> I also attached the bird config from the lab.
>>
>> Any help is much appreciated!
>> Thanks.
>>
>> Cheers,
>> Just
>> Notice: This email is confidential and may contain copyright
>> material
>> of members of the Ocado Group. Opinions and views expressed in this
>> message may not necessarily reflect the opinions and views of the
>> members of the Ocado Group.
>>
>> If you are not the intended recipient, please notify us immediately
>> and delete all copies of this message. Please note that it is your
>> responsibility to scan this message for viruses.
>>
>> Fetch and Sizzle are trading names of Speciality Stores Limited and
>> Fabled is a trading name of Marie Claire Beauty Limited, both
>> members
>> of the Ocado Group.
>>
>> References to the “Ocado Group” are to Ocado Group plc
>> (registered
>> in England and Wales with number 7098618) and its subsidiary
>> undertakings (as that expression is defined in the Companies Act
>> 2006)
>> from time to time. The registered office of Ocado Group plc is
>> Titan
>> Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts.
>> AL10
>> 9NE.
>
> Links:
> ------
> [1] https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix