Hi,

A colleague of mine reported a memory usage issue with the bird daemon last year, which resulted in a request for a core dump, but we never followed it up. I'd like to re-open this discussion and see if anything can be done to fix it.

I'll provide some information regarding a production environment, where the problem is most obvious. But any further details and diagnostics will have to come from our lab environment. Please note, in production we mostly run 1.5, but in the lab we are on 1.6; however, we see the same symptoms in both environments on both versions.

The symptoms are twofold, but potentially related - greater than expected memory usage reported by the bird daemon itself for the number of routes, but also massively more memory actually used by the daemon process.

When the process is started, we see "normal" memory usage, which then seems to grow indefinitely in distinct steps, separated by a period of a few hours.

In production, this consumes most of the 32G of memory until the kernel oom-killer intervenes.

Production:

BIRD 1.5.0 ready.
bird> show memory
BIRD memory usage
Routing tables: 1405 MB
Route attributes: 84 kB
ROA tables: 192 B
Protocols: 45 kB
Total: 1405 MB
bird> show route count
2273 of 2273 routes for 1142 networks

# ps u -p 3441
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 3441 0.1 55.4 18275124 18241540 ? Ssl Aug10 73:39 /usr/sbin/bird -f -u bird -g bird

..so that's ~1.4G reported by bird, and ~18G actually consumed by the process.

Lab:

BIRD 1.6.0 ready.
bird> show mem
BIRD memory usage
Routing tables: 693 MB
Route attributes: 28 kB
ROA tables: 192 B
Protocols: 41 kB
Total: 693 MB
bird> show route count
175 of 175 routes for 91 networks

# ps u -p 29085
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 29085 0.0 14.9 4994852 4915032 ? Ssl Aug05 19:41 /usr/sbin/bird -f -u bird -g bird

..so that's ~0.7G reported by bird, and ~5G actually consumed by the process.

I also attached the bird config from the lab.
Any help is much appreciated! Thanks. Cheers, Just -- Notice: This email is confidential and may contain copyright material of members of the Ocado Group. Opinions and views expressed in this message may not necessarily reflect the opinions and views of the members of the Ocado Group. If you are not the intended recipient, please notify us immediately and delete all copies of this message. Please note that it is your responsibility to scan this message for viruses. Fetch and Sizzle are trading names of Speciality Stores Limited and Fabled is a trading name of Marie Claire Beauty Limited, both members of the Ocado Group. References to the “Ocado Group” are to Ocado Group plc (registered in England and Wales with number 7098618) and its subsidiary undertakings (as that expression is defined in the Companies Act 2006) from time to time. The registered office of Ocado Group plc is Titan Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts. AL10 9NE.
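To put a number on the gap between what bird reports and what the process holds, the comparison in the report above can be sketched in C. This is only an illustration: the helper names `ps_row_rss_kib` and `rss_to_reported_ratio` are hypothetical, the column order is assumed to match the `ps u` header shown in the report, and "MB" in the `show memory` output is assumed to mean 1024 KiB.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the RSS column (in KiB) from one data row of `ps u` output.
 * Assumed column order: USER PID %CPU %MEM VSZ RSS ...
 * Returns 0 on success, -1 if the row does not match that layout. */
static int
ps_row_rss_kib(const char *row, unsigned long *rss_kib)
{
  char user[64];
  unsigned long pid, vsz;
  double cpu, mem;

  if (sscanf(row, "%63s %lu %lf %lf %lu %lu",
             user, &pid, &cpu, &mem, &vsz, rss_kib) != 6)
    return -1;
  return 0;
}

/* Ratio of process RSS to the "Total" printed by `show memory`.
 * Assumption: the reported figure is in MB of 1024 KiB each. */
static double
rss_to_reported_ratio(unsigned long rss_kib, unsigned long reported_mb)
{
  return (double) rss_kib / ((double) reported_mb * 1024.0);
}
```

Fed the production row (RSS 18241540, Total 1405 MB), this gives a ratio of roughly 12.7; the lab row (RSS 4915032, Total 693 MB) gives roughly 6.9.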
Hi Justin,

On 2016-09-05 16:21, Justin Cattle wrote:
> When the process is started, we see "normal" memory usage, which then seems to grow indefinitely in distinct steps, separated by a period of a few hours.

Thanks for this report. I was able to reproduce this weird behavior too. Configuring the kernel protocol with an export filter causes a memory leak. I have prepared commits that fix it in the branch `krt-export-filtr-fix':

https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix

Can you please download it and confirm that the bug is fixed?

Best,
Pavel
Hi Pavel,

Thanks for the quick response! I will try that as soon as I can, hopefully in the next couple of days. I'll report back as soon as I know.

Cheers,
Just

On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote:
> Can you please download it and confirm that the bug is fixed?

I found some time to package using a patch to the latest 1.6.0 release, created from a diff of origin/krt-export-filtr-fix against v1.6.0-34-g768d013 [ seems to be the top three commits ].

I hope that's valid. That patch applied without issue, and I wrapped it into a debian patch.

I've installed on a few hosts, and I'll report back tomorrow if I get a chance.

Thanks again for the speedy code :)

Here's my debian package patch for reference:

cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
filter/tree: prefer xmalloc/xfree to malloc/free
rt-table: fix kernel protocol export filter memory bug
Index: bird-1.6.0/filter/tree.c
===================================================================
--- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
+++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
@@ -82,7 +82,7 @@
   if (len <= 1024)
     buf = alloca(len * sizeof(struct f_tree *));
   else
-    buf = malloc(len * sizeof(struct f_tree *));
+    buf = xmalloc(len * sizeof(struct f_tree *));

   /* Convert a degenerated tree into an sorted array */
   i = 0;
@@ -94,7 +94,7 @@
   root = build_tree_rec(buf, 0, len);

   if (len > 1024)
-    free(buf);
+    xfree(buf);

   return root;
 }
Index: bird-1.6.0/nest/rt-table.c
===================================================================
--- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
+++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
@@ -60,6 +60,21 @@

 static inline void rt_schedule_prune(rtable *tab);

+static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
+
+static inline void
+rte_update_lock(void)
+{
+  rte_update_nest_cnt++;
+}
+
+static inline void
+rte_update_unlock(void)
+{
+  if (!--rte_update_nest_cnt)
+    lp_flush(rte_update_pool);
+}
+
 static inline struct ea_list *
 make_tmp_attrs(struct rte *rt, struct linpool *pool)
 {
@@ -609,10 +624,18 @@
   if (!rte_is_valid(best0))
     return NULL;

+  /* This non-static function could be called from outside rt-table.c file and
+   * we need to ensure that a temporary allocated linpool memory @rte_update_pool
+   * will be freed */
+  rte_update_lock();
+
   best = export_filter(ah, best0, rt_free, tmpa, silent);

   if (!best || !rte_is_reachable(best))
+  {
+    rte_update_unlock();
     return best;
+  }

   for (rt0 = best0->next; rt0; rt0 = rt0->next)
   {
@@ -646,6 +669,8 @@
   if (best != best0)
     *rt_free = best;

+  rte_update_unlock();
+
   return best;
 }

@@ -1097,21 +1122,6 @@
   rte_free_quick(old);
 }

-static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
-
-static inline void
-rte_update_lock(void)
-{
-  rte_update_nest_cnt++;
-}
-
-static inline void
-rte_update_unlock(void)
-{
-  if (!--rte_update_nest_cnt)
-    lp_flush(rte_update_pool);
-}
-
 static inline void
 rte_hide_dummy_routes(net *net, rte **dummy)
 {

Cheers,
Just

On 6 September 2016 at 18:03, Justin Cattle <j@ocado.com> wrote:
> I'll report back as soon as I know.

Hi, Just.

On 2016-09-06 22:50, Justin Cattle wrote:
> I found some time to package using a patch to the latest 1.6.0 release, created from a diff of origin/krt-export-filtr-fix against v1.6.0-34-g768d013 [ seems to be the top three commits ].

Yes, the top three commits, exactly!

> I hope that's valid. That patch applied without issue, and I wrapped it into a debian patch.
> I've installed on a few hosts, and I'll report back tomorrow if I get a chance.

Great!

> Thanks again for the speedy code :)
> Here's my debian package patch for reference:
Looks fine :)
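
The nesting-counter idea in the rt-table.c part of the patch can be illustrated with a toy model. This is a minimal sketch and not BIRD code: `toy_pool`, `toy_alloc`, `toy_lock`, `toy_unlock`, `export_unguarded` and `export_guarded` are hypothetical names, and a fixed-size bump allocator stands in for the linpool that the patch flushes with lp_flush(). It only shows why an entry point that allocates temporaries without the lock/unlock bracket never gets them flushed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a linear pool: a bump allocator that can only be
 * emptied all at once. All names in this sketch are hypothetical. */
struct toy_pool {
  unsigned char buf[4096];
  size_t used;
};

static struct toy_pool tmp_pool;  /* plays the role of rte_update_pool */
static int nest_cnt;              /* plays the role of rte_update_nest_cnt */

/* Hand out n bytes from the pool, or NULL when it is full. */
static void *
toy_alloc(struct toy_pool *p, size_t n)
{
  if (n > sizeof(p->buf) - p->used)
    return NULL;
  void *ptr = p->buf + p->used;
  p->used += n;
  return ptr;
}

static void
toy_lock(void)
{
  nest_cnt++;
}

static void
toy_unlock(void)
{
  /* Flush only when the outermost caller leaves, so nested callers
   * can keep using temporaries allocated further up the stack. */
  if (!--nest_cnt)
    tmp_pool.used = 0;
}

/* Entry point without the bracket: its temporaries are never flushed. */
static void
export_unguarded(void)
{
  (void) toy_alloc(&tmp_pool, 64);
}

/* Entry point bracketed by lock/unlock, in the spirit of the patch. */
static void
export_guarded(void)
{
  toy_lock();
  (void) toy_alloc(&tmp_pool, 64);
  toy_unlock();
}
```

Calling export_unguarded() ten times leaves 640 bytes in the pool, while any number of export_guarded() calls made from the outermost level leaves it empty. The counter, rather than an unconditional flush, is what keeps a nested call from discarding memory its caller is still using.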
Hi Pavel,

This is looking good for us :)

It's been in the lab for 3 days across 25 hosts, and memory usage looks absolutely static after process start. We have a couple of canary hosts in production too, and they are showing the same results.

Example stats:

# birdc show route count
BIRD 1.6.0 ready.
2391 of 2391 routes for 1201 networks

# birdc show mem
BIRD 1.6.0 ready.
BIRD memory usage
Routing tables: 246 kB
Route attributes: 88 kB
ROA tables: 192 B
Protocols: 45 kB
Total: 416 kB

# pmap $(pgrep "bird$") |grep total
total 81500K

Prior to installing the patched version, the process on this host was using about 17G of virtual memory - it is now at about 80M, which is a nice optimisation ;-)

I am planning to roll this out over production next week if possible. I can report back in a few weeks if you like, but it certainly seems like this is resolved.

Thanks again for your help on this - we really appreciate it :)

Cheers,
Just

On 7 September 2016 at 09:08, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote:
Hi, Just.
On 2016-09-06 22:50, Justin Cattle wrote:
I found some time to package using a patch to the latest 1.6.0 release, created from a diff of origin/krt-export-filtr-fix against v1.6.0-34-g768d013 [ seems to be the top three commits ].
Yes, the top three commits, exactly!
I hope that's valid. That patch applied without issue, and I wrapped
it into a debian patch.
I've installed on a few hosts, and I'll report back tomorrow if I get a chance.
Great!
Thanks again for the speedy code :)
Here's my debian package patch for reference:
cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch filter/tree: prefer xmalloc/xfree to malloc/free rt-table: fix kernel protocol export filter memory bug Index: bird-1.6.0/filter/tree.c =================================================================== --- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000 +++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100 @@ -82,7 +82,7 @@ if (len <= 1024) buf = alloca(len * sizeof(struct f_tree *)); else - buf = malloc(len * sizeof(struct f_tree *)); + buf = xmalloc(len * sizeof(struct f_tree *));
/* Convert a degenerated tree into an sorted array */ i = 0; @@ -94,7 +94,7 @@ root = build_tree_rec(buf, 0, len);
if (len > 1024) - free(buf); + xfree(buf);
return root; } Index: bird-1.6.0/nest/rt-table.c =================================================================== --- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100 +++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100 @@ -60,6 +60,21 @@ static inline void rt_schedule_prune(rtable *tab);
+static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */ + +static inline void +rte_update_lock(void) +{ + rte_update_nest_cnt++; +} + +static inline void +rte_update_unlock(void) +{ + if (!--rte_update_nest_cnt) + lp_flush(rte_update_pool); +} + static inline struct ea_list * make_tmp_attrs(struct rte *rt, struct linpool *pool) { @@ -609,10 +624,18 @@ if (!rte_is_valid(best0)) return NULL;
+ /* This non-static function could be called from outside rt-table.c file and + * we need to ensure that a temporary allocated linpool memory @rte_update_pool + * will be freed */ + rte_update_lock(); + best = export_filter(ah, best0, rt_free, tmpa, silent);
if (!best || !rte_is_reachable(best)) + { + rte_update_unlock(); return best; + }
for (rt0 = best0->next; rt0; rt0 = rt0->next) { @@ -646,6 +669,8 @@ if (best != best0) *rt_free = best;
+ rte_update_unlock(); + return best; }
@@ -1097,21 +1122,6 @@ rte_free_quick(old); }
-static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */ - -static inline void -rte_update_lock(void) -{ - rte_update_nest_cnt++; -} - -static inline void -rte_update_unlock(void) -{ - if (!--rte_update_nest_cnt) - lp_flush(rte_update_pool); -} - static inline void rte_hide_dummy_routes(net *net, rte **dummy) {
Looks fine :)
Cheers, Just On 6 September 2016 at 18:03, Justin Cattle <j@ocado.com> wrote:
Hi Pavel,
Thanks for quick response! I will try that as soon as I can, hopefully in the next couple of days. I'll report back as soon as I know.
Cheers, Just
On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote: Hi Justin,
On 2016-09-05 16:21, Justin Cattle wrote: Hi,
A colleague of mine reported a memory usage issue with the bird daemon last year, which resulted in a request for a core dump, but we never followed it up. I'd like to re-open this discussion and see if anything can be done to fix it.
I'll provide some information regarding a production environment, where the problem is most obvious. But any further details and diagnostics will have to come from our lab environment. Please note, in production we mostly run 1.5, but in the lab we are on 1.6, however we see the same symptoms in both environments on both versions.
The symptoms are twofold, but potentially related - greater than expected memory usage reported by the bird daemon itself for the number of routes, but also massively more memory actually used by the daemon process.
When the process is started, we see "normal" memory usage, which then seems to grow indefinitely in distinct steps, separated by a period of a few hours.
In production, this consumes most of the 32G of memory until the kernel oom-killer to intervenes.
Production:
BIRD 1.5.0 ready.
bird> show memory
BIRD memory usage
Routing tables: 1405 MB
Route attributes: 84 kB
ROA tables: 192 B
Protocols: 45 kB
Total: 1405 MB
bird> show route count
2273 of 2273 routes for 1142 networks
# ps u -p 3441
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 3441 0.1 55.4 18275124 18241540 ? Ssl Aug10 73:39 /usr/sbin/bird -f -u bird -g bird
..so that's ~1.4G reported by bird, and ~18G actually consumed by the process.
Lab:
BIRD 1.6.0 ready.
bird> show mem
BIRD memory usage
Routing tables: 693 MB
Route attributes: 28 kB
ROA tables: 192 B
Protocols: 41 kB
Total: 693 MB
bird> show route count
175 of 175 routes for 91 networks
# ps u -p 29085
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 29085 0.0 14.9 4994852 4915032 ? Ssl Aug05 19:41 /usr/sbin/bird -f -u bird -g bird
Thanks for this report. I was able to reproduce this weird behavior too. Configuring the kernel protocol with an export filter triggers a memory leak. I have prepared commits that fix it in the branch 'krt-export-filtr-fix':
https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix [1]
Can you please download it and confirm that the bug is fixed?
Best, Pavel
Links: ------ [1] https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix
Hi, Justin. On 2016-09-09 10:45, Justin Cattle wrote:
Hi Pavel,
This is looking good for us :) It's been in the lab for 3 days across 25 hosts, and memory usage looks absolutely static after process start.
We have a couple of canary hosts in production too, and they are showing the same results.
Before installing the patched version, the process on this host was using about 17 GB of virtual memory; it is now at about 80 MB, which is a nice optimisation ;-)
Good!
I am planning to roll this out over production next week if possible. I can report back in a few weeks if you like, but it certainly seems like this is resolved.
A colleague, Ondra Zajicek, pointed out to me that the solution could lead to reading from freed memory. I made a fixup commit (d9c6d180) at the top of the branch krt-export-filtr-fix. Please apply that commit too.
https://gitlab.labs.nic.cz/labs/bird/commit/d9c6d180e41c7246ccbde8ae4d828d87...
It should fix the bug in a better way.
Thanks again for your help on this - we really appreciate it :)
You're welcome :) Cheers, Pavel
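The fixup commit itself is not quoted in this thread, so the following is only a generic illustration of the concern, under the assumption that the worry is a result built in temporary storage being handed back after that storage has been flushed. One common way to avoid that is to copy the result out before flushing. Everything here (`toy_route`, `tmp_alloc`, `tmp_flush`, `pick_route`) is hypothetical and is not BIRD code, nor a description of what d9c6d180 does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical route record; not BIRD's struct rte. */
struct toy_route {
  int metric;
  char via[16];
};

/* Toy temporary pool: a few route slots that are recycled on flush. */
static struct toy_route tmp_slots[8];
static size_t tmp_used;

static struct toy_route *tmp_alloc(void)
{
  return tmp_used < 8 ? &tmp_slots[tmp_used++] : NULL;
}

/* Flush poisons the slots so that any stale read would be obvious. */
static void tmp_flush(void)
{
  memset(tmp_slots, 0xAA, sizeof tmp_slots);
  tmp_used = 0;
}

/* Builds a result in temporary storage, then copies it out before the
 * flush. The caller owns the returned heap copy and must free() it. */
static struct toy_route *pick_route(int metric, const char *via)
{
  struct toy_route *tmp = tmp_alloc();
  if (!tmp)
    return NULL;
  tmp->metric = metric;
  strncpy(tmp->via, via, sizeof tmp->via - 1);
  tmp->via[sizeof tmp->via - 1] = '\0';

  struct toy_route *out = malloc(sizeof *out);
  if (out)
    *out = *tmp;          /* copy while the temporary is still valid */

  tmp_flush();            /* temporary storage may be reused from here on */
  return out;
}
```

Returning `tmp` directly instead of `out` would hand the caller a pointer into storage that has already been poisoned by the flush, which is the kind of read-after-free the remark above warns about.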
Cheers, Just
On 7 September 2016 at 09:08, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote:
Hi, Just.
On 2016-09-06 22:50, Justin Cattle wrote:
I found some time to package using a patch to the latest 1.6.0 release, created from a diff of origin/krt-export-filtr-fix against v1.6.0-34-g768d013 [ seems to be the top three commits ].
Yes, the top three commits, exactly!
I hope that's valid. That patch applied without issue, and I wrapped it into a debian patch.
I've installed on a few hosts, and I'll report back tomorrow if I get a chance.
Great!
Thanks again for the speedy code :)
Here's my debian package patch for reference:
cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
filter/tree: prefer xmalloc/xfree to malloc/free
rt-table: fix kernel protocol export filter memory bug
Index: bird-1.6.0/filter/tree.c
===================================================================
--- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
+++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
@@ -82,7 +82,7 @@
   if (len <= 1024)
     buf = alloca(len * sizeof(struct f_tree *));
   else
-    buf = malloc(len * sizeof(struct f_tree *));
+    buf = xmalloc(len * sizeof(struct f_tree *));

   /* Convert a degenerated tree into an sorted array */
   i = 0;
@@ -94,7 +94,7 @@
   root = build_tree_rec(buf, 0, len);

   if (len > 1024)
-    free(buf);
+    xfree(buf);

   return root;
 }
Index: bird-1.6.0/nest/rt-table.c
===================================================================
--- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
+++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
@@ -60,6 +60,21 @@
 static inline void rt_schedule_prune(rtable *tab);

+static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
+
+static inline void
+rte_update_lock(void)
+{
+  rte_update_nest_cnt++;
+}
+
+static inline void
+rte_update_unlock(void)
+{
+  if (!--rte_update_nest_cnt)
+    lp_flush(rte_update_pool);
+}
+
 static inline struct ea_list *
 make_tmp_attrs(struct rte *rt, struct linpool *pool)
 {
@@ -609,10 +624,18 @@
   if (!rte_is_valid(best0))
     return NULL;

+  /* This non-static function could be called from outside rt-table.c file and
+   * we need to ensure that a temporary allocated linpool memory @rte_update_pool
+   * will be freed */
+  rte_update_lock();
+
   best = export_filter(ah, best0, rt_free, tmpa, silent);

   if (!best || !rte_is_reachable(best))
+  {
+    rte_update_unlock();
     return best;
+  }

   for (rt0 = best0->next; rt0; rt0 = rt0->next)
   {
@@ -646,6 +669,8 @@
   if (best != best0)
     *rt_free = best;

+  rte_update_unlock();
+
   return best;
 }

@@ -1097,21 +1122,6 @@
   rte_free_quick(old);
 }

-static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
-
-static inline void
-rte_update_lock(void)
-{
-  rte_update_nest_cnt++;
-}
-
-static inline void
-rte_update_unlock(void)
-{
-  if (!--rte_update_nest_cnt)
-    lp_flush(rte_update_pool);
-}
-
 static inline void
 rte_hide_dummy_routes(net *net, rte **dummy)
 {
Looks fine :)
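For readers who do not have the BIRD sources at hand, the lock/unlock pattern used in the patch above can be sketched in isolation. This is a minimal toy under stated assumptions: `toy_pool`, `toy_alloc`, `toy_flush`, `inner_step` and `outer_step` are made-up names, and the fixed-size arena is only a stand-in for a linear pool, not BIRD's linpool implementation. The only part taken from the patch is the idea of a nesting counter that flushes the temporary pool when the outermost unlock brings it back to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy linear pool: a fixed arena with a bump offset (hypothetical, and
 * without alignment handling since the memory is never dereferenced). */
struct toy_pool {
  unsigned char arena[4096];
  size_t used;        /* bytes handed out since the last flush */
  unsigned flushes;   /* number of flushes, kept for demonstration only */
};

static struct toy_pool update_pool;
static int update_nest_cnt;   /* nesting counter, as in the patch */

/* Hand out n bytes from the arena; returns NULL when the arena is full. */
static void *toy_alloc(struct toy_pool *p, size_t n)
{
  if (n > sizeof(p->arena) - p->used)
    return NULL;
  void *ptr = p->arena + p->used;
  p->used += n;
  return ptr;
}

/* Release everything allocated since the previous flush. */
static void toy_flush(struct toy_pool *p)
{
  p->used = 0;
  p->flushes++;
}

static void update_lock(void)
{
  update_nest_cnt++;
}

static void update_unlock(void)
{
  assert(update_nest_cnt > 0);
  if (!--update_nest_cnt)
    toy_flush(&update_pool);   /* only the outermost unlock flushes */
}

/* Inner step: takes the lock itself, so it is safe to call on its own. */
static void inner_step(void)
{
  update_lock();
  (void) toy_alloc(&update_pool, 64);
  update_unlock();
}

/* Outer step: the nested unlock inside inner_step() must not flush. */
static void outer_step(void)
{
  update_lock();
  (void) toy_alloc(&update_pool, 128);
  inner_step();
  /* Both allocations are still accounted for here (192 bytes in use). */
  update_unlock();
}
```

Calling `inner_step()` on its own flushes once, and `outer_step()` also flushes exactly once, at its final unlock. A code path that allocated from the pool without ever reaching an unlock would leave `used` growing from call to call, which is the kind of steady growth described in the original report.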
Cheers, Just
On 6 September 2016 at 18:03, Justin Cattle <j@ocado.com> wrote:
Hi Pavel,
Thanks for the quick response! I will try that as soon as I can, hopefully in the next couple of days. I'll report back as soon as I know.
Cheers, Just
Thanks Pavel - I have updated our package and rolled this version out where the previous new package was.
As it's a new version, I will leave it a few more days before deploying everywhere now.
Cheers, Just
On 12 September 2016 at 08:16, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote:
[...]
Hi Pavel,
After running with this latest fixup commit for a week, I see mixed results.
With the first fix you created, all the processes consistently used a very small amount of memory: around 80 MB, as per my previous email.
With the second fix, some of the bird processes are using up to about 600 MB, while others are still at around the 80 MB seen with the first fix. And there is a mixture in between those two extremes.
So my question is: is this normal and expected now, or is there a potential issue with the second fix?
Here is an example range of memory usage. The hosts all have broadly the same route counts, but some advertise and withdraw more routes than others [ although that itself doesn't seem to be related to the size ]:
0.08 GB
0.08 GB
0.08 GB
0.08 GB
0.08 GB
0.08 GB
0.09 GB
0.10 GB
0.10 GB
0.10 GB
0.11 GB
0.11 GB
0.11 GB
0.16 GB
0.61 GB
0.61 GB
0.61 GB
0.61 GB
0.61 GB
0.62 GB
0.62 GB
0.62 GB
0.62 GB
0.63 GB
0.63 GB
Thanks!
Cheers, Just
On 12 September 2016 at 11:29, Justin Cattle <j@ocado.com> wrote:
Thanks Pavel - I have updated our package and rolled this version out where the previous new package was.
As it's a new version, I will leave it a few more days before deploying everywhere now.
Cheers, Just
On 12 September 2016 at 08:16, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote:
Hi, Justin.
On 2016-09-09 10:45, Justin Cattle wrote:
Hi Pavel,
This is looking good for us :) It's been in the lab for 3 days across 25 hosts, and memory usage looks absolutely static after process start.
We have a couple of canary hosts in production too, and they are showing the same results.
Previous to installing the patched version , the process on this host was using about 17g of Virt Mem - now at about 80Mg, which is a nice optimisation ;-)
Good!
I am planning to roll this out over production next week if possible.
I can report back in a few weeks if you like, but it certainly seems like this is resolved.
A colleague Ondra Zajicek noticed me that the solution could lead to reading from freed memory. I made a fixup commit (d9c6d180) at the top of branch krt-export-filtr-fix. Please apply the commit too.
https://gitlab.labs.nic.cz/labs/bird/commit/d9c6d180e41c7246 ccbde8ae4d828d87daa12cf4
It should fix the bug in the better way.
Thanks again for your help on this - we really appreciate it :)
You're welcome :)
Cheers, Pavel
Cheers,
Just On 7 September 2016 at 09:08, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote:
Hi, Just.
On 2016-09-06 22:50, Justin Cattle wrote:
I found some time to package using a patch to the latest 1.6.0
release, created from a diff of origin/krt-export-filtr-fix against v1.6.0-34-g768d013 [ seems to be the top three commits ].
Yes, the top three commits, exactly!
I hope that's valid. That patch applied without issue, and I
wrapped it into a debian patch.
I've installed on a few hosts, and I'll report back tomorrow if I get a chance.
Great!
Thanks again for the speedy code :)
Here's my debian package patch for reference:
cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch filter/tree: prefer xmalloc/xfree to malloc/free rt-table: fix kernel protocol export filter memory bug Index: bird-1.6.0/filter/tree.c
===================================================================
--- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000 +++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100 @@ -82,7 +82,7 @@ if (len <= 1024) buf = alloca(len * sizeof(struct f_tree *)); else - buf = malloc(len * sizeof(struct f_tree *)); + buf = xmalloc(len * sizeof(struct f_tree *));
/* Convert a degenerated tree into an sorted array */ i = 0; @@ -94,7 +94,7 @@ root = build_tree_rec(buf, 0, len);
if (len > 1024) - free(buf); + xfree(buf);
return root; } Index: bird-1.6.0/nest/rt-table.c
===================================================================
--- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100 +++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100 @@ -60,6 +60,21 @@ static inline void rt_schedule_prune(rtable *tab);
+static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */ + +static inline void +rte_update_lock(void) +{ + rte_update_nest_cnt++; +} + +static inline void +rte_update_unlock(void) +{ + if (!--rte_update_nest_cnt) + lp_flush(rte_update_pool); +} + static inline struct ea_list * make_tmp_attrs(struct rte *rt, struct linpool *pool) { @@ -609,10 +624,18 @@ if (!rte_is_valid(best0)) return NULL;
+ /* This non-static function could be called from outside rt-table.c file and + * we need to ensure that a temporary allocated linpool memory @rte_update_pool + * will be freed */ + rte_update_lock(); + best = export_filter(ah, best0, rt_free, tmpa, silent);
if (!best || !rte_is_reachable(best)) + { + rte_update_unlock(); return best; + }
for (rt0 = best0->next; rt0; rt0 = rt0->next) { @@ -646,6 +669,8 @@ if (best != best0) *rt_free = best;
+ rte_update_unlock(); + return best; }
@@ -1097,21 +1122,6 @@ rte_free_quick(old); }
-static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */ - -static inline void -rte_update_lock(void) -{ - rte_update_nest_cnt++; -} - -static inline void -rte_update_unlock(void) -{ - if (!--rte_update_nest_cnt) - lp_flush(rte_update_pool); -} - static inline void rte_hide_dummy_routes(net *net, rte **dummy) {
Looks fine :)
Cheers, Just On 6 September 2016 at 18:03, Justin Cattle <j@ocado.com> wrote:
Hi Pavel,
Thanks for quick response! I will try that as soon as I can, hopefully in the next couple of days. I'll report back as soon as I know.
Cheers, Just
On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik@nic.cz> wrote: Hi Justin,
On 2016-09-05 16:21, Justin Cattle wrote: Hi,
A colleague of mine reported a memory usage issue with the bird daemon last year, which resulted in a request for a core dump, but we never followed it up. I'd like to re-open this discussion and see if anything can be done to fix it.
I'll provide some information regarding a production environment, where the problem is most obvious. But any further details and diagnostics will have to come from our lab environment. Please note, in production we mostly run 1.5, but in the lab we are on 1.6, however we see the same symptoms in both environments on both versions.
The symptoms are twofold, but potentially related - greater than expected memory usage reported by the bird daemon itself for the number of routes, but also massively more memory actually used by the daemon process.
When the process is started, we see "normal" memory usage, which then seems to grow indefinitely in distinct steps, separated by a period of a few hours.
In production, this consumes most of the 32G of memory until the kernel oom-killer to intervenes.
Production:
BIRD 1.5.0 ready.
bird> show memory
BIRD memory usage
Routing tables: 1405 MB
Route attributes: 84 kB
ROA tables: 192 B
Protocols: 45 kB
Total: 1405 MB
bird> show route count
2273 of 2273 routes for 1142 networks
# ps u -p 3441
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 3441 0.1 55.4 18275124 18241540 ? Ssl Aug10 73:39 /usr/sbin/bird -f -u bird -g bird
..so that's ~1.4G reported by bird, and ~18G actually consumed by the process.
Lab:
BIRD 1.6.0 ready.
bird> show mem
BIRD memory usage
Routing tables: 693 MB
Route attributes: 28 kB
ROA tables: 192 B
Protocols: 41 kB
Total: 693 MB
bird> show route count
175 of 175 routes for 91 networks
# ps u -p 29085
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 29085 0.0 14.9 4994852 4915032 ? Ssl Aug05 19:41 /usr/sbin/bird -f -u bird -g bird
..so that's ~0.7G reported by bird, and ~5G actually consumed by the process.
I also attached the bird config from the lab.
Any help is much appreciated! Thanks.
Cheers, Just
Thanks for this report. I successfully reproduced this weird behavior too. A kernel protocol configured with some export filter triggers a memory leak. I have prepared fixing commits in the branch 'krt-export-filtr-fix':
https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix
Can you please download it and confirm that the bug is fixed?
Best, Pavel
On Mon, Sep 19, 2016 at 09:46:03AM +0100, Justin Cattle wrote:
Hi Pavel,
After running with this latest fixup commit for a week, I see mixed results.
With the first fix you created, all the processes consistently remained at a very small amount of memory, around 80 MB as per my previous email. With the second fix, some of the bird processes are using up to about 600 MB, while others are still at around the 80 MB seen with the first fix, and the rest fall somewhere between those two extremes.
So my question is - is this normal and expected now, or is there a potential issue with the second fix?
Hi
We found that there was one minor leak that was overlooked in the second fix.
You can try attached patch v3.
-- Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Ok - great.
Should this patch apply cleanly to the released 1.6 version? I was tracking the krt-export-filtr-fix branch before, but that is now gone :)
Cheers, Just
I *think* I have answered my own question. The patch in the email doesn't include the switch to xmalloc that was originally in the krt-export-filtr-fix branch as well.
I can see from `git blame` that it's in the previous commit, bc00f058154bb4a630d24d64a55b5f181d235c63 [ Filter: Prefer xmalloc/xfree to malloc/free ].
So, it looks like I actually need a290da25a16b7c79d4a7a87f522b4068bca04979 and bc00f058154bb4a630d24d64a55b5f181d235c63.
Can you please confirm this is correct? Or else advise the best way to patch against the current release?
Cheers, Just
On Mon, Sep 19, 2016 at 11:04:29AM +0100, Justin Cattle wrote:
I *think* I have answered my own question. The patch in the email doesn't include the switch to xmalloc that was originally in the krt-export-filtr-fix branch as well.
I can see from `git blame` that it's in the previous commit, bc00f058154bb4a630d24d64a55b5f181d235c63 [ Filter: Prefer xmalloc/xfree to malloc/free ].
This patch is irrelevant to the problem (although it was in the krt-export-filtr-fix branch). You can skip it.
Or else advise the best way to patch against the current release ?
You could use just a290da25a16b7c79d4a7a87f522b4068bca04979 or you can use whole master branch from git (where it is merged).
-- Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Roger that.
I've deployed a new version with the patch from a290da25a16b7c79d4a7a87f522b4068bca04979 - I'll leave it for a few days and report back.
Please let me know if you think there are any further issues, or any subsequent extra patches on top that are relevant.
Thanks!
Cheers, Just
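Leaving a patched daemon running for a few days, as described above, is easier to evaluate if the resident set size is sampled periodically and checked for the stepwise growth reported earlier in the thread. The following is a minimal Python sketch of that idea, not something used in this thread: it assumes a Linux host where `/proc/<pid>/status` exposes a `VmRSS:` line in kB, and the function names, sample values and threshold are hypothetical.

```python
from typing import List, Tuple


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS line not found")


def find_steps(samples: List[Tuple[float, int]],
               min_jump_kib: int) -> List[Tuple[float, int]]:
    """Return (timestamp, growth) for each consecutive pair of samples
    whose RSS grew by at least min_jump_kib."""
    steps = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur - prev >= min_jump_kib:
            steps.append((ts, cur - prev))
    return steps


if __name__ == "__main__":
    # Hypothetical samples: (seconds since start, RSS in KiB).
    samples = [(0, 80_000), (3600, 81_000), (7200, 600_000)]
    print(find_steps(samples, min_jump_kib=100_000))
```

In practice the samples would come from reading `/proc/<pid>/status` of the bird process at a fixed interval (for example from cron) and passing the text to `parse_vmrss_kib`; a daemon without the leak would be expected to produce an empty list of steps once its routing tables have settled.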
On Mon, Sep 05, 2016 at 03:21:40PM +0100, Justin Cattle wrote:
Hi,
A colleague of mine reported a memory usage issue with the bird daemon last year, which resulted in a request for a core dump, but we never followed it up. I'd like to re-open this discussion and see if anything can be done to fix it.
I'll provide some information regarding a production environment, where the problem is most obvious, but any further details and diagnostics will have to come from our lab environment. Please note that in production we mostly run 1.5 and in the lab we are on 1.6; we see the same symptoms in both environments, on both versions.
Hi
One question - is your 1.5.0 patched? Because your config contains option 'merge paths on', which is new in 1.6.0.
Or does the problem appear even without this option?
-- Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Hi Ondrej,
Yes - it's a version from git with BGP multipath support: v1.5.0-19-g8d9eef1.
Cheers, Just
participants (3)
- Justin Cattle
- Ondrej Zajicek
- Pavel Tvrdík