BIRD memory usage

Pavel Tvrdík pavel.tvrdik at nic.cz
Mon Sep 12 09:16:57 CEST 2016


Hi, Justin.

On 2016-09-09 10:45, Justin Cattle wrote:
> Hi Pavel,
> 
> This is looking good for us :)
> It's been in the lab for 3 days across 25 hosts, and memory usage
> looks absolutely static after process start.
> 
> We have a couple of canary hosts in production too, and they are
> showing the same results.
> 
> Prior to installing the patched version, the process on this host
> was using about 17G of virtual memory - now at about 80M, which is a
> nice optimisation ;-)

Good!

> I am planning to roll this out over production next week if possible.
> I can report back in a few weeks if you like, but it certainly seems
> like this is resolved.

My colleague Ondra Zajicek pointed out that the solution could lead to
reading from freed memory. I made a fixup commit (d9c6d180) at the top
of the branch krt-export-filtr-fix. Please apply that commit too.

https://gitlab.labs.nic.cz/labs/bird/commit/d9c6d180e41c7246ccbde8ae4d828d87daa12cf4

It should fix the bug in a better way.
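
For readers following the thread: the patch quoted further down wraps the exported function in rte_update_lock()/rte_update_unlock(), so that the temporary linpool is flushed once the outermost caller has finished. The freed-memory concern is that anything still pointing into that pool after the final unlock is stale. The following is a minimal, self-contained C sketch of that pattern. It is not BIRD code and it does not reproduce the fixup commit: toy_pool, toy_alloc, toy_flush, inner and outer are hypothetical names, and copying the result out before the last unlock is only one generic way to stay safe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a linear pool: bump allocation, everything freed at once.
 * Hypothetical type, NOT the BIRD linpool implementation. */
struct toy_pool {
  unsigned char buf[256];
  size_t used;
  unsigned flushes;             /* how many times the pool was flushed */
};

static struct toy_pool tmp_pool;
static int nest_cnt;            /* nesting counter, same idea as in the patch */

static void *toy_alloc(struct toy_pool *p, size_t n)
{
  if (n > sizeof(p->buf) - p->used)
    return NULL;
  void *r = p->buf + p->used;
  p->used += n;
  return r;
}

static void toy_flush(struct toy_pool *p)
{
  memset(p->buf, 0xDD, p->used);  /* poison, so stale reads are easy to spot */
  p->used = 0;
  p->flushes++;
}

static void update_lock(void)
{
  nest_cnt++;
}

static void update_unlock(void)
{
  if (!--nest_cnt)              /* flush only when the outermost level leaves */
    toy_flush(&tmp_pool);
}

/* Nested helper with its own lock/unlock pair */
static int inner(void)
{
  update_lock();
  char *t = toy_alloc(&tmp_pool, 16);
  int ok = (t != NULL);
  update_unlock();              /* nest_cnt is still > 0 here, so no flush */
  return ok;
}

/* Outer function: copies its result out of the pool before the last unlock.
 * Returns 1 if the temporary data survived the nested call. */
static int outer(char *out, size_t outlen)
{
  update_lock();
  char *tmp = toy_alloc(&tmp_pool, 8);
  if (!tmp) {
    update_unlock();            /* every early return must unlock too */
    return 0;
  }
  memcpy(tmp, "route-1", 8);    /* 7 characters plus the terminating NUL */
  inner();
  int still_valid = (memcmp(tmp, "route-1", 8) == 0);
  if (outlen >= 8)
    memcpy(out, tmp, 8);        /* copy while the pool is still live */
  update_unlock();              /* outermost unlock: tmp is stale from here on */
  return still_valid;
}
```

Calling outer() with an 8-byte buffer leaves the copied string in the buffer and the pool flushed exactly once. Returning tmp itself instead of copying it would hand the caller a pointer into memory that has already been flushed.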

> 
> Thanks again for your help on this - we really appreciate it :)
> 

You're welcome :)

Cheers,
Pavel

> Cheers,
> Just
> On 7 September 2016 at 09:08, Pavel Tvrdík <pavel.tvrdik at nic.cz>
> wrote:
> 
>> Hi, Just.
>> 
>> On 2016-09-06 22:50, Justin Cattle wrote:
>> 
>>> I found some time to package using a patch to the latest 1.6.0
>>> release, created from a diff of origin/krt-export-filtr-fix against
>>> v1.6.0-34-g768d013  [ seems to be the top three commits ].
>> 
>> Yes, the top three commits, exactly!
>> 
>>> I hope that's valid.  That patch applied without issue, and I
>>> wrapped it into a debian patch.
>>> 
>>> I've installed on a few hosts, and I'll report back tomorrow if I
>>> get a chance.
>> 
>> Great!
>> 
>>> Thanks again for the speedy code :)
>>> 
>>> Here's my debian package patch for reference:
>>> 
>>> cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
>>> filter/tree: prefer xmalloc/xfree to malloc/free
>>> rt-table: fix kernel protocol export filter memory bug
>>> Index: bird-1.6.0/filter/tree.c
>>> ===================================================================
>>> --- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
>>> +++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
>>> @@ -82,7 +82,7 @@
>>> if (len <= 1024)
>>> buf = alloca(len * sizeof(struct f_tree *));
>>> else
>>> -    buf = malloc(len * sizeof(struct f_tree *));
>>> +    buf = xmalloc(len * sizeof(struct f_tree *));
>>> 
>>> /* Convert a degenerated tree into an sorted array */
>>> i = 0;
>>> @@ -94,7 +94,7 @@
>>> root = build_tree_rec(buf, 0, len);
>>> 
>>> if (len > 1024)
>>> -    free(buf);
>>> +    xfree(buf);
>>> 
>>> return root;
>>> }
>>> Index: bird-1.6.0/nest/rt-table.c
>>> ===================================================================
>>> --- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
>>> +++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
>>> @@ -60,6 +60,21 @@
>>> static inline void rt_schedule_prune(rtable *tab);
>>> 
>>> +static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
>>> +
>>> +static inline void
>>> +rte_update_lock(void)
>>> +{
>>> +  rte_update_nest_cnt++;
>>> +}
>>> +
>>> +static inline void
>>> +rte_update_unlock(void)
>>> +{
>>> +  if (!--rte_update_nest_cnt)
>>> +    lp_flush(rte_update_pool);
>>> +}
>>> +
>>> static inline struct ea_list *
>>> make_tmp_attrs(struct rte *rt, struct linpool *pool)
>>> {
>>> @@ -609,10 +624,18 @@
>>> if (!rte_is_valid(best0))
>>> return NULL;
>>> 
>>> +  /* This non-static function could be called from outside rt-table.c file and
>>> +   * we need to ensure that a temporary allocated linpool memory @rte_update_pool
>>> +   * will be freed */
>>> +  rte_update_lock();
>>> +
>>> best = export_filter(ah, best0, rt_free, tmpa, silent);
>>> 
>>> if (!best || !rte_is_reachable(best))
>>> +  {
>>> +    rte_update_unlock();
>>> return best;
>>> +  }
>>> 
>>> for (rt0 = best0->next; rt0; rt0 = rt0->next)
>>> {
>>> @@ -646,6 +669,8 @@
>>> if (best != best0)
>>> *rt_free = best;
>>> 
>>> +  rte_update_unlock();
>>> +
>>> return best;
>>> }
>>> 
>>> @@ -1097,21 +1122,6 @@
>>> rte_free_quick(old);
>>> }
>>> 
>>> -static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
>>> -
>>> -static inline void
>>> -rte_update_lock(void)
>>> -{
>>> -  rte_update_nest_cnt++;
>>> -}
>>> -
>>> -static inline void
>>> -rte_update_unlock(void)
>>> -{
>>> -  if (!--rte_update_nest_cnt)
>>> -    lp_flush(rte_update_pool);
>>> -}
>>> -
>>> static inline void
>>> rte_hide_dummy_routes(net *net, rte **dummy)
>>> {
>> 
>> Looks fine :)
>> 
>> Cheers,
>> Just
>> On 6 September 2016 at 18:03, Justin Cattle <j at ocado.com> wrote:
>> 
>> Hi Pavel,
>> 
>> Thanks for the quick response! I will try that as soon as I can,
>> hopefully in the next couple of days.
>> I'll report back as soon as I know.
>> 
>> Cheers,
>> Just
>> 
>> On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik at nic.cz>
>> wrote:
>> Hi Justin,
>> 
>> On 2016-09-05 16:21, Justin Cattle wrote:
>> Hi,
>> 
>> A colleague of mine reported a memory usage issue with the bird daemon
>> last year, which resulted in a request for a core dump, but we never
>> followed it up.
>> I'd like to re-open this discussion and see if anything can be done to
>> fix it.
>> 
>> I'll provide some information regarding a production environment,
>> where the problem is most obvious.  But any further details and
>> diagnostics will have to come from our lab environment.
>> Please note, in production we mostly run 1.5, but in the lab we are on
>> 1.6, however we see the same symptoms in both environments on both
>> versions.
>> 
>> The symptoms are twofold, but potentially related -  greater than
>> expected memory usage reported by the bird daemon itself for the
>> number of routes, but also massively more memory actually used by the
>> daemon process.
>> 
>> When the process is started, we see "normal" memory usage, which then
>> seems to grow indefinitely in distinct steps, separated by a period of
>> a few hours.
>> 
>> In production, this consumes most of the 32G of memory until the
>> kernel oom-killer intervenes.
>> 
>> Production:
>> 
>> BIRD 1.5.0 ready.
>> bird> show memory
>> BIRD memory usage
>> Routing tables:   1405 MB
>> Route attributes:   84 kB
>> ROA tables:        192  B
>> Protocols:          45 kB
>> Total:            1405 MB
>> 
>> bird> show route count
>> 2273 of 2273 routes for 1142 networks
>> 
>> # ps u  -p 3441
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> bird      3441  0.1 55.4 18275124 18241540 ?   Ssl  Aug10  73:39 /usr/sbin/bird -f -u bird -g bird
>> 
>> ..so that's ~1.4G reported by bird, and ~18G actually consumed by
>> the
>> process.
>> 
>> Lab:
>> 
>> BIRD 1.6.0 ready.
>> bird> show mem
>> BIRD memory usage
>> Routing tables:    693 MB
>> Route attributes:   28 kB
>> ROA tables:        192  B
>> Protocols:          41 kB
>> Total:             693 MB
>> 
>> bird> show route count
>> 175 of 175 routes for 91 networks
>> 
>> # ps u -p 29085
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> bird     29085  0.0 14.9 4994852 4915032 ?     Ssl  Aug05  19:41 /usr/sbin/bird -f -u bird -g bird
>> 
>> Thanks for this report. I successfully reproduced this weird behavior
>> too. Configuring the kernel protocol with an export filter triggers
>> a memory leak bug. I prepared commits that fix it in the branch
>> `krt-export-filtr-fix':
>> 
>> https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix [1]
>> 
>> Can you please download it and confirm that the bug is fixed?
>> 
>> Best,
>> Pavel
>> 
>> ..so that's ~ 0.7G reported by bird, and ~5G actually consumed by
>> the
>> process.
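
The gap between what BIRD reports and what the process holds can be made explicit with a little arithmetic. The sketch below takes the RSS values from the two ps listings and the totals from "show memory" and prints their ratio. It assumes that ps reports RSS in KiB and that the "MB" printed by BIRD can be compared with MiB; neither unit is confirmed in this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an RSS value from ps (assumed to be in KiB) to MiB */
static double rss_kib_to_mib(unsigned long rss_kib)
{
  return (double) rss_kib / 1024.0;
}

/* Ratio of the process RSS to the total printed by "show memory".
 * reported_mib is assumed to be comparable with MiB. */
static double rss_to_reported_ratio(unsigned long rss_kib, double reported_mib)
{
  return rss_kib_to_mib(rss_kib) / reported_mib;
}

/* Print one line of the comparison for a given host */
static void print_report(const char *label, unsigned long rss_kib,
                         double reported_mib)
{
  printf("%s: RSS %.0f MiB, reported %.0f MiB, ratio %.1f\n",
         label, rss_kib_to_mib(rss_kib), reported_mib,
         rss_to_reported_ratio(rss_kib, reported_mib));
}
```

With the figures quoted in this message, print_report("production", 18241540UL, 1405.0) gives a ratio of about 12.7 and print_report("lab", 4915032UL, 693.0) a ratio of about 6.9, so in both environments the process holds several times more memory than BIRD accounts for.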
>> 
>> I also attached the bird config from the lab.
>> 
>> Any help is much appreciated!
>> Thanks.
>> 
>> Cheers,
>> Just
>> Notice:  This email is confidential and may contain copyright
>> material
>> of members of the Ocado Group. Opinions and views expressed in this
>> message may not necessarily reflect the opinions and views of the
>> members of the Ocado Group.
>> 
>> If you are not the intended recipient, please notify us immediately
>> and delete all copies of this message. Please note that it is your
>> responsibility to scan this message for viruses.
>> 
>> Fetch and Sizzle are trading names of Speciality Stores Limited and
>> Fabled is a trading name of Marie Claire Beauty Limited, both
>> members
>> of the Ocado Group.
>> 
>> References to the “Ocado Group” are to Ocado Group plc
>> (registered
>> in England and Wales with number 7098618) and its subsidiary
>> undertakings (as that expression is defined in the Companies Act
>> 2006)
>> from time to time.  The registered office of Ocado Group plc is
>> Titan
>> Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts.
>> AL10
>> 9NE.
>> 
>> Links:
>> ------
>> [1] https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix

