Hi, all.
I have some experience with BIRD under heavy CPU load. I had a situation where BIRD was updating the kernel table very frequently because BGP sessions kept going down and up (due to CPU and/or network load). When I wanted to reconfigure BIRD with the configure soft command, I had to wait about 5-10 minutes for birdc to start (ssh and other interactive programs worked fine). After I changed bird.conf so that I receive only 0.0.0.0/0 from the upstream instead of a full view, birdc worked fine and BIRD's CPU load dropped (before that, BIRD used about 30-89% CPU).
What can I do in such a situation in the future to keep birdc working normally?
And two additional questions:
Does BIRD push route information into the kernel table immediately when it receives route updates from a BGP peer, or does it wait for the next kernel table scan?
How can I set a different scan time for one kernel protocol than for another?
Thanks.
On 12.03.2012 13:25, Oleg wrote:
What can I do in such a situation in the future to keep birdc working normally?
Adding or removing kernel routes is an expensive operation: the main process has to wait until each system call completes.
It is very costly, and one (developer) possibility is to add a separate thread that handles kernel interaction. We work around this by installing IGP routes only (which does not help on a border router). You can try reducing MAX_RX_STEPS to 1 in sysdep/unix/io.c; this helps a bit with new route installation (2 minutes instead of 5), but at the moment the answer seems to be "don't export all routes to the kernel". For example, you can filter out everything that is >= /20 to get reasonable load balancing with far fewer routes in the OS kernel.
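A minimal sketch of the prefix-length idea, written in Python rather than BIRD's filter language. It assumes the advice means "do not install prefixes of length 20 or longer into the kernel"; the cutoff constant, the function names and the sample routes are all hypothetical. The small lookup helper shows why keeping a default route preserves reachability for the dropped more-specific prefixes.

```python
import ipaddress

# Assumption: prefixes with length >= 20 are kept out of the kernel table.
MAX_KERNEL_PREFIX_LEN = 19


def export_to_kernel(prefix):
    """Return True if this prefix should be installed into the kernel table."""
    return ipaddress.ip_network(prefix).prefixlen <= MAX_KERNEL_PREFIX_LEN


def lookup(dst, table):
    """Longest-prefix match of dst against a list of prefixes (None if no match)."""
    addr = ipaddress.ip_address(dst)
    matches = [n for n in map(ipaddress.ip_network, table) if addr in n]
    return str(max(matches, key=lambda n: n.prefixlen)) if matches else None


# Sample routes, for illustration only.
routes = ["0.0.0.0/0", "10.0.0.0/8", "192.0.2.0/24", "198.51.100.0/22", "172.16.0.0/12"]
kernel_routes = [r for r in routes if export_to_kernel(r)]

print(kernel_routes)
print(lookup("192.0.2.55", kernel_routes))  # falls back to the default route
print(lookup("10.1.2.3", kernel_routes))    # still matched by the /8
```

The /24 and /22 are no longer in the kernel, so traffic to them follows the default route: reachable, but not necessarily over the best path.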
And additional two questions: Does bird immediately push routes info in kernel table, if it receive routes updates from bgp-peer or it wait for the next kernel table scan?
The kernel protocol (just like any other protocol) receives route updates immediately after the BGP protocol instance announces them to the routing table. The kernel update (for that single route) is prepared and issued in the same callback.
How can i set in one kernel protocol different scan time from another one?
Thanks.
On Mon, Mar 12, 2012 at 02:27:23PM +0400, Alexander V. Chernikov wrote:
It is very costly, and one (developer) possibility is to add a separate thread that handles kernel interaction.
I think that a separate thread could solve this nuisance.
We work around this by installing IGP routes only (which does not help on a border router).
Unfortunately, this is a border router.
You can try reducing MAX_RX_STEPS to 1 in sysdep/unix/io.c; this helps a bit with new route installation (2 minutes instead of 5),
Thanks! I'll try it.
but at the moment the answer seems to be "don't export all routes to the kernel". For example, you can filter out everything that is >= /20 to get reasonable load balancing with far fewer routes in the OS kernel.
In a simple configuration with one upstream this works fine, but what about more complex configurations? Can I lose some networks? Or can I get poor ping times because, for example, a /24 prefix is a better route than the /20 that contains it?
Does BIRD push route information into the kernel table immediately when it receives route updates from a BGP peer, or does it wait for the next kernel table scan?
The kernel protocol (just like any other protocol) receives route updates immediately after the BGP protocol instance announces them to the routing table. The kernel update (for that single route) is prepared and issued in the same callback.
So, as I understand it, simply increasing the scan time would not help in this situation.
Hello!
It is very costly, and one (developer) possibility is to add a separate thread that handles kernel interaction.
I think that a separate thread could solve this nuisance.
... at the cost of introducing lots of extra complexity. I think that deferring kernel route updates (and maybe even propagation of routes to all protocols) to a separate event handler should help.
Have a nice fortnight
-- 
Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
There is no place like ~
On 12.03.2012 20:06, Martin Mares wrote:
... at the cost of introducing lots of extra complexity.
I think that deferring kernel route updates (and maybe even propagation of routes to all protocols) to a separate event handler should help.
This can help CLI interaction (and it is good as a temporary solution), but it does not decrease the time needed for handling flapping sessions (particularly if there are many of them). (This is one of the things quagga suffers from.) Syncing all routes makes BIRD spend 2-3 times more time on FreeBSD; I guess the same order of magnitude applies to Linux.
Offloading some activity with little core interaction to another thread can be a good start for offloading other parts of the code (protocol message parsing, sending hello messages, etc.).
Moreover, this approach can improve the situation with the kernel scan, which requires preparing and sending 100-150 Mb of data from the kernel to userland and parsing it. During the scan we stop all network processing for tens or hundreds of milliseconds, which can be a significant issue for BFD (if, someday, it gets implemented).
Hello!
This can help CLI interaction (and it is good as temporary solution), but does not decrease time needed for dispatching flapping sessions (particularly if there are many of them).
If you introduce priorities in the event loop, you can make sure that flapping sessions cannot disrupt other sessions. I already thought about event priorities when I wrote the event loop, but I finally chose simplicity and decided that priorities can be added later if it turns out that they are needed.
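To make the idea of event priorities concrete, here is a small Python sketch of a priority-ordered event queue. It is not BIRD's event loop (which is written in C and, as the message says, has no priorities); the class name, the priority numbers and the event names are all hypothetical.

```python
import heapq
import itertools


class PriorityEventLoop:
    """Toy event loop: a lower priority number runs first, FIFO within one priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves insertion order

    def schedule(self, priority, name, handler):
        heapq.heappush(self._heap, (priority, next(self._seq), name, handler))

    def run(self):
        """Run all queued events and return their names in execution order."""
        order = []
        while self._heap:
            _, _, name, handler = heapq.heappop(self._heap)
            handler()
            order.append(name)
        return order


loop = PriorityEventLoop()
loop.schedule(2, "flap-cleanup-A", lambda: None)  # bulk work from a flapping session
loop.schedule(0, "keepalive", lambda: None)       # latency-sensitive
loop.schedule(1, "cli", lambda: None)             # interactive
loop.schedule(2, "flap-cleanup-B", lambda: None)
order = loop.run()
print(order)
```

Keepalives and CLI requests are served before the bulk cleanup events, regardless of the order in which they were queued. This only helps if each bulk event is itself short; one long synchronous handler still blocks everything behind it.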
Offloading some activity with low core interaction to another thread can be a good start for offloading another parts of code (protocol messages parsing, sending hello messages, etc..).
I think that once we decide that we need threading, it's better to do it once and well. I thought about it some years ago and it seems it won't be too complicated.
Moreover, this approach can improve the situation with kernel scan
Wouldn't it be better to get rid of kernel scans completely? :-) I think they should not be needed on current Linux systems; they are there only to make sure that no Netlink packets with updates got dropped.
Have a nice fortnight
-- 
Martin `MJ' Mares
On Tue, Mar 13, 2012 at 03:39:15PM +0100, Martin Mares wrote:
Hello!
Moreover, this approach can improve the situation with kernel scan
Wouldn't it be better to get rid of kernel scans completely? :-)
I think they should not be needed on current Linux systems, they are there only to make sure that no Netlink packets with updates got dropped.
If this situation can potentially occur, then the choice of whether to scan or not should be left to the user (for example, with an "enable krt scan" option).
On Tue, Mar 13, 2012 at 03:39:15PM +0100, Martin Mares wrote:
Hello!
Moreover, this approach can improve the situation with kernel scan
Wouldn't it be better to get rid of kernel scans completely? :-)
I think they should not be needed on current Linux systems, they are there only to make sure that no Netlink packets with updates got dropped.
As I understand it, if I mistakenly delete a route from the kernel table now, BIRD restores it on the next table scan. How would this work without table scans?
Hello!
As i understand, now when i mistakenly delete a route from kernel table, bird restore it on the next table scan. How will this works without table scan?
The short answer is "Don't do that" :-) BIRD is not a tool for preventing you from shooting yourself in the foot.
Have a nice fortnight
-- 
Martin `MJ' Mares
On Tue, Mar 13, 2012 at 08:44:43PM +0100, Martin Mares wrote:
As i understand, now when i mistakenly delete a route from kernel table, bird restore it on the next table scan. How will this works without table scan?
The short answer is "Don't do that" :-)
BIRD is not a tool for preventing you from shooting yourself in the foot.
But I like to think so :-). I think I could restore it by restarting the kernel protocol in birdc. So maybe deleting a route would not be so cruel even without table scans.
On 13.03.2012 18:39, Martin Mares wrote:
Hello!
This can help CLI interaction (and it is good as temporary solution), but does not decrease time needed for dispatching flapping sessions (particularly if there are many of them).
If you introduce priorities in the event loop, you can make sure that flapping sessions cannot disrupt other sessions.
Well, I'm mostly not talking about flapping; BIRD seems to behave quite well there. I mean that sometimes it matters whether, say, a reflector with several hundred BGP sessions can bring them all up from random states within 1 (but not 2 or 3 or 4) minutes and continue to work "normally" while half of them are flapping. (And here the kernel syncer steps in and says "hey, I need half of your CPU".) From this point of view, priorities (if I understand the idea correctly) won't help.
I already thought about event priorities when I wrote the event loop, but I finally chose simplicity and decided that priorities can be added later if it turns out that they are needed.
Offloading some activity with low core interaction to another thread can be a good start for offloading another parts of code (protocol messages parsing, sending hello messages, etc..).
I think that once we decide that we need threading, it's better to do it once and well. I thought about it some years ago and it seems it won't be too complicated.
Can you briefly describe it? I thought about this too, but, unfortunately, I can't see a way that does not significantly complicate the core.
Moreover, this approach can improve the situation with kernel scan
Wouldn't it be better to get rid of kernel scans completely? :-)
Personally, I like the idea of having an additional independent self-healing/controlling process. Disabling it (which is already possible) is one possible solution, but..
I think they should not be needed on current Linux systems, they are there only to make sure that no Netlink packets with updates got dropped.
Have a nice fortnight
-- WBR, Alexander
On 14.03.2012 00:54, Alexander V. Chernikov wrote:
Well, I'm mostly not talking about flapping; BIRD seems to behave quite well there. I mean that sometimes it matters whether, say, a reflector with several hundred BGP sessions can bring them all up from random states within 1 (but not 2 or 3 or 4) minutes and continue to work "normally" while half of them are flapping. (And here the kernel syncer steps in and says "hey, I need half of your CPU".) From this point of view, priorities (if I understand the idea correctly) won't help.
Just to illustrate the order of magnitude: quagga, ~300 IPv4 peers, 1/6 of them flapping:
  PID USERNAME THR PRI NICE   SIZE   RES STATE   C    TIME   WCPU COMMAND
37813 root       1 117    0   146M  140M CPU13  13  125:11 96.48% zebra
38046 root       1  51    0 10327M 9698M select 12   41.7H 21.97% bgpd
-- 
WBR, Alexander
On 12.03.2012 19:22, Oleg wrote:
In a simple configuration with one upstream this works fine, but what about more complex configurations? Can I lose some networks? Or can I get poor ping times because, for example, a /24 prefix is a better route than the /20 that contains it?
You should keep the default route from one (or both) of your uplinks, so no connectivity should be lost. You still get some kind of load balancing for outgoing traffic, although it may not follow the optimal path.
On Tue, Mar 13, 2012 at 11:47:58AM +0000, Alexander V. Chernikov wrote:
You should keep the default route from one (or both) of your uplinks, so no connectivity should be lost. You still get some kind of load balancing for outgoing traffic, although it may not follow the optimal path.
Oh, I forgot about the default gateway :-).
On Mon, Mar 12, 2012 at 11:22:10PM +0400, Oleg wrote:
I have some experience with bird under heavy cpu load. I had a situation when bird do frequent updates of kernel table, because of bgp-session frequent down/up (because of cpu and/or net load).
Hello
Answering collectively for the whole thread:
I did some preliminary testing, and on my test machine exporting a full BGP feed (about 400k routes) to a kernel table took 1-2 sec on Linux and 5-6 sec on BSD. Flushing the kernel table took a similar time. Therefore, if we devote half a CPU to kernel sync, we have about 200 kr/s (kiloroutes per second) on Linux and 40 kr/s on BSD, which still seems more than enough for an edge router. Are there any estimates (using protocol statistics) of the number of updates to the kernel protocol in this case? How many protocols, tables and pipes do you have in your case?
The key to responsiveness (and the ability to send keepalives on time) during heavy CPU load is granularity. The main problem in BIRD is that whole route propagation is done synchronously - when a route is received, it is propagated through all pipes and all routing tables to all final receivers in one step, which is problematic if you have several hundred BGP sessions (but probably not too problematic with regard to kernel sync). If this could be split and done asynchronously in some smart way, it could solve several problems. One idea is that routes (or nets) in a table would be linked in a list in the order in which they arrived in the table; each protocol would have a position in that list marking the routes not yet exported to it, and an event in the event queue handling that export. Another advantage of this approach is that we could temporarily stop or rate-limit propagation of routes to that protocol.
But I noticed recently that some steps have even bigger latency than plain route propagation. For example, disabling the kernel protocol flushes all routes in the kernel table in one step, which (as mentioned above) will block BIRD for about 5 s on BSD with 400k routes. Fortunately, disabling the kernel protocol is not a common operation.
Another problem is protocol flushing - if a protocol goes down, all its routes are de-propagated in one step (function rt_prune()). This is probably the most important cause of latencies and could be easily fixed.
Another possible problem is the kernel scan, which is also done in one step, but at least on Linux it could probably be split into smaller steps, and it does not take too much time if the kernel table is in the expected state.
I will probably implement some latency measurement and do some more testing to get a better idea, and probably fix the protocol flushing problem.
BTW, one possible hack to spare CPU time under heavy load with regard to kernel syncing is to disable synchronous kernel updates in krt_notify() and rely completely on syncing during the periodic scan.
BTW, regardless of all of this, for BFD we would definitely need a separate process/thread, but with almost no shared data.
-- 
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
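The "position in an arrival-ordered list" idea from this message can be sketched roughly as follows. This is an illustrative Python model only, not BIRD code: the class, method and protocol names are hypothetical, and a real implementation would also need to prune list entries that every protocol has already consumed.

```python
class RouteTable:
    """Toy routing table that records updates in arrival order."""

    def __init__(self):
        self.journal = []  # every update, oldest first
        self.cursor = {}   # protocol name -> index of first update not yet exported

    def attach(self, proto):
        self.cursor[proto] = 0

    def announce(self, update):
        self.journal.append(update)

    def export_step(self, proto, batch, sink):
        """Export at most `batch` pending updates; return True if more remain."""
        start = self.cursor[proto]
        chunk = self.journal[start:start + batch]
        sink.extend(chunk)
        self.cursor[proto] = start + len(chunk)
        return self.cursor[proto] < len(self.journal)


table = RouteTable()
table.attach("kernel")
table.attach("bgp_peer")  # never stepped in this demo, i.e. temporarily paused
for prefix in ["10.0.0.0/8", "172.16.0.0/12", "192.0.2.0/24",
               "198.51.100.0/24", "203.0.113.0/24"]:
    table.announce(("add", prefix))

kernel_fib = []
steps = 0
more = True
while more:
    # In an event loop, each step would be one short event, rescheduled while
    # updates remain, so other events can run in between.
    more = table.export_step("kernel", batch=2, sink=kernel_fib)
    steps += 1

print(steps, len(kernel_fib), table.cursor)
```

Because each protocol only holds a cursor, pausing or rate-limiting one receiver amounts to not scheduling its export event; the other receivers are unaffected.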
On Mon, Mar 26, 2012 at 01:25:32AM +0200, Ondrej Zajicek wrote:
Hello
Answering collectively for the whole thread:
I did some preliminary testing and it on my test machine exporting full BGP feed (cca 400k routes) to a kernel table took 1-2 sec on Linux and 5-6 sec on BSD. Similar time for flushing the kernel table. Therefore, if we devote a half CPU for kernel sync, we have about 200 kr/s (kiloroutes per second) for Linux and 40 kr/s for BSD, this still seems more than enough for an edge router. Are there any estimates (using protocol statistics) for number of updates to kernel proto in this case? How many protocols, tables and ppie do you have in your case?
I think I don't quite understand what estimates you mean. From each of our upstreams we get a full view (~400k routes), so if one upstream session goes up or down, I think the kernel receives ~400k updates. In total I have 5 tables, 5 kernel protocols, 8 BGP protocols and 2 pipes.
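The throughput figures quoted in this exchange can be checked with simple arithmetic. The sketch below uses only numbers from the thread (400k routes, 1-2 s on Linux, 5-6 s on BSD, half a CPU, ~400k updates per session flap); the function names are made up for illustration, and it assumes sync time scales linearly with the CPU share.

```python
ROUTES = 400_000   # approximate size of a full view, from the thread
CPU_SHARE = 0.5    # "half a CPU" devoted to kernel sync


def sync_rate(routes, seconds, cpu_share):
    """Routes per second the kernel syncer sustains with the given CPU share."""
    return routes / seconds * cpu_share


def flap_cost(updates, rate):
    """Seconds needed to push `updates` route changes at `rate` routes/s."""
    return updates / rate


linux = [sync_rate(ROUTES, s, CPU_SHARE) for s in (1, 2)]  # fast and slow measurement
bsd = [sync_rate(ROUTES, s, CPU_SHARE) for s in (5, 6)]

for name, rates in (("Linux", linux), ("BSD", bsd)):
    best, worst = rates
    print(f"{name}: {worst / 1000:.0f}-{best / 1000:.0f} kr/s, "
          f"one 400k-update flap costs {flap_cost(ROUTES, best):.0f}-"
          f"{flap_cost(ROUTES, worst):.0f} s of sync time")
```

The 200 kr/s and 40 kr/s figures correspond to the faster end of each measured range. Under the same assumptions, a single full-view flap costs roughly 2-4 s of kernel sync on Linux and 10-12 s on BSD.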
On Mon, Mar 26, 2012 at 06:02:17PM +0400, Oleg wrote:
I think I don't quite understand what estimates you mean. From each of our upstreams we get a full view (~400k routes), so if one upstream session goes up or down, I think the kernel receives ~400k updates. In total I have 5 tables, 5 kernel protocols, 8 BGP protocols and 2 pipes.
BTW, you could try the current code from Git; it should fix the problem of BIRD blocking for too long when a protocol goes down (commit fb829de69052755a31d76d73e17525d050e5ff4d).
-- 
Ondrej 'SanTiago' Zajicek
On 26.03.2012 03:25, Ondrej Zajicek wrote:
The key to responsiveness (and the ability to send keepalives on time) during heavy CPU load is granularity. The main problem in BIRD is that whole route propagation is done synchronously - when a route is received, it is propagated through all pipes and all routing tables to all final receivers in one step, which is problematic if you have several hundred BGP sessions (but probably not too problematic with regard to kernel sync). If this could be split and done asynchronously in some smart way, it could solve several problems. One idea is that routes (or nets) in a table would be linked in a list in the order in which they arrived in the table; each protocol would have a position in that list marking the routes not yet exported to it, and an event in the event queue handling that export. Another advantage of this approach is that we could temporarily stop or rate-limit propagation of routes to that protocol.
I've been playing with the BGP/core code in preparation for a peer-groups implementation.
Setup: 1 peer with a full view (1) and 1 peer as a full-view receiver (2). Both are disabled by default. We start BIRD and enable peer 1. After the full view is received, we enable the second peer.
Some BGP bucket statistics:
max_feed  iterations  buckets  routes  effect
     256        1551   362184  397056      8%
     512         775   351902  396800     11%
    1024         387   335773  396288     15%
    2048         193   300434  395264     23%
    4096          96   255752  393216     34%
    8192          48   216780  393216     44%
'Effect' means (routes - buckets) * 100 / routes, i.e. what share of prefixes is stored in already existing buckets.
Maybe we can consider making the max_feed value auto-tuned? E.g. 8 or 16k for a small total number of protocols. If we assume max_feed * proto_count to be constant (which keeps granularity at the same level), and say that we use the default feed (256) for 256 protocols, we can automatically recalculate max_feed on every { configure, protocol enabled/disabled state change, whatever } (and something should be done with the kernel protocol before doing this).
(I've also changed the bucket allocation style to slabs, but there seems to be no change noticeable via getrusage(), at least with max_feed=256. However, maybe we should do this at least for consistency. I'll come back with results later, after testing in a more complex setup.)
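The bucket statistics in this message can be rechecked mechanically, and the auto-tuning proposal is easy to sketch. In the Python below, the measurement tuples are copied from the message; the `autotune_max_feed` function, its clamp limits (256 and 16384) and its name are assumptions made for illustration, based on the suggestion that max_feed * proto_count stays constant with 256 as the default for 256 protocols.

```python
# (max_feed, iterations, buckets, routes, reported effect in %), from the message.
STATS = [
    (256, 1551, 362184, 397056, 8),
    (512, 775, 351902, 396800, 11),
    (1024, 387, 335773, 396288, 15),
    (2048, 193, 300434, 395264, 23),
    (4096, 96, 255752, 393216, 34),
    (8192, 48, 216780, 393216, 44),
]


def effect(routes, buckets):
    """Share of routes (in %, truncated) that landed in an already existing bucket."""
    return (routes - buckets) * 100 // routes


def autotune_max_feed(proto_count, budget=256 * 256, low=256, high=16384):
    """Keep max_feed * proto_count roughly constant, clamped to [low, high]."""
    return max(low, min(high, budget // max(proto_count, 1)))


for max_feed, iterations, buckets, routes, reported in STATS:
    # Each feed iteration handles max_feed routes, so the counts should agree.
    print(max_feed, routes // max_feed == iterations, effect(routes, buckets) == reported)

for protos in (2, 8, 64, 256, 1000):
    print(protos, autotune_max_feed(protos))
```

All six rows are internally consistent: routes divided by max_feed gives the iteration count, and the reported effect equals the formula with integer truncation. With the assumed limits, 8 protocols would get a max_feed of 8192, while 256 or more protocols stay at the default of 256.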
But i noticed recently that there are some steps that have even bigger latency than just plain route propagation. For example disabling kernel protocol would cause flushing all routes in kernel table done in one step, which (as mentioned above) will block BIRD for cca 5 s on BSD with 400k routes. Fortunately, disabling kernel protocol is not a common operation.
Another problem is protocol flushing - if protocol fell down, all its routes are de-propagated in one step (function rt_prune()). This is probably the most important cause of latencies and could be easily fixed.
Another possible problem is the kernel scan, which is also done in one step, but at least on Linux it could probably be split into smaller steps, and it does not take too much time if the kernel table is in the expected state.
... The CLI can easily be another abuser:
bird> show route count
2723321 of 2723321 routes for 407158 networks
If I do 'show route' on such a table, this can block BIRD for 10? seconds.
On Tue, Mar 27, 2012 at 07:13:11PM +0400, Alexander V. Chernikov wrote:
Maybe we can consider making the max_feed value auto-tuned? E.g. 8 or 16k for a small total number of protocols. If we assume max_feed * proto_count to be constant (which keeps granularity at the same level), and say that we use the default feed (256) for 256 protocols, we can automatically recalculate max_feed on every { configure, protocol enabled/disabled state change, whatever }
Is there any point in trying to achieve efficient packing of routes into buckets? Most rx processing is done per route, so buckets just save some TCP data transfers. BTW, these results depend on many things, like how big the kernel's TCP buffers are and how fast the other side is able to acknowledge received data. I guess that at first BIRD flushes buckets from BGP to TCP as fast as they are generated, with minimal packing (depending mostly on granularity, i.e. max_feed); later the TCP buffers become full, sending updates is postponed, and the BGP bucket cache starts to fill (you can see that in 'show memory'). If you want efficient packing, probably the most elegant solution would be to add some delay (like 2 s) before activating/scheduling the sending of BGP update packets. Or some smarter approach: if the BGP bucket cache contains at least x buckets, schedule updates immediately; otherwise schedule them after 2 s.
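The "smart approach" at the end of this paragraph boils down to a one-line scheduling decision. A minimal Python sketch follows; the threshold of 64 buckets stands in for the unspecified "x" and is purely hypothetical, as is the function name. The 2 s delay is the value mentioned in the message.

```python
def update_delay(cached_buckets, threshold=64, delay=2.0):
    """Seconds to wait before scheduling BGP update transmission.

    With at least `threshold` buckets cached, packing is already worthwhile,
    so send immediately; otherwise wait `delay` seconds for more routes.
    """
    return 0.0 if cached_buckets >= threshold else delay


print(update_delay(3))    # few buckets cached: wait and let them fill
print(update_delay(500))  # cache is well filled: send right away
```

The delay trades a bounded amount of convergence time for better packing, which is why a threshold that bypasses it under load is attractive.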
... The CLI can easily be another abuser:
bird> show route count
2723321 of 2723321 routes for 407158 networks
If I do 'show route' on such a table, this can block BIRD for 10? seconds.
Really? show route processing is split into chunks of 64 routes, so I suppose that only the CLI session is blocked.
-- 
Ondrej 'SanTiago' Zajicek
On 27.03.2012 20:44, Ondrej Zajicek wrote:
On Tue, Mar 27, 2012 at 07:13:11PM +0400, Alexander V. Chernikov wrote:
On 26.03.2012 03:25, Ondrej Zajicek wrote:
On Mon, Mar 12, 2012 at 11:22:10PM +0400, Oleg wrote:
On Mon, Mar 12, 2012 at 02:27:23PM +0400, Alexander V. Chernikov wrote:
On 12.03.2012 13:25, Oleg wrote:
Hi, all.
I have some experience with bird under heavy cpu load. I had a situation when bird do frequent updates of kernel table, because of bgp-session frequent down/up (because of cpu and/or net load).
Hello
Answering collectively for the whole thread:
I did some preliminary testing and it on my test machine exporting full BGP feed (cca 400k routes) to a kernel table took 1-2 sec on Linux and 5-6 sec on BSD. Similar time for flushing the kernel table. Therefore, if we devote a half CPU for kernel sync, we have about 200 kr/s (kiloroutes per second) for Linux and 40 kr/s for BSD, this still seems more than enough for an edge router. Are there any estimates (using protocol statistics) for number of updates to kernel proto in this case? How many protocols, tables and ppie do you have in your case?
The key to responsiveness (and ability to send keepalives on time) during heavy CPU load is in granularity. The main problem in BIRD is that whole route propagation is done synchronously - when route is received, it is propagated through all pipes and all routing tables to all final receivers in one step, which is problematic if you have several hundreds of BGP sessions (but probably not too problematic with ...
I've been playing with BGP/core code in preparation for the peer-groups implementation.
Setup: 1 peer with full view (1), 1 peer as full-view receiver (10. Both are disabled by default. We start bird and enable peer 1. After the full view is received, we enable the second peer.
Some bgp bucket statistics:
max_feed:  256  iterations: 1551  buckets: 362184  routes: 397056  effect:  8%
max_feed:  512  iterations:  775  buckets: 351902  routes: 396800  effect: 11%
max_feed: 1024  iterations:  387  buckets: 335773  routes: 396288  effect: 15%
max_feed: 2048  iterations:  193  buckets: 300434  routes: 395264  effect: 23%
max_feed: 4096  iterations:   96  buckets: 255752  routes: 393216  effect: 34%
max_feed: 8192  iterations:   48  buckets: 216780  routes: 393216  effect: 44%
'Effect' means (routes - buckets) * 100 / routes, i.e. the percentage of prefixes that were stored in already existing buckets.
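As a quick check, the 'effect' column can be recomputed from the bucket and route counts. The sketch below assumes the percentages were truncated with integer division, which is what matches the posted numbers; the names `STATS` and `effect_percent` are made up for illustration.

```python
# (max_feed, iterations, buckets, routes) as posted in the statistics above.
STATS = [
    (256, 1551, 362184, 397056),
    (512, 775, 351902, 396800),
    (1024, 387, 335773, 396288),
    (2048, 193, 300434, 395264),
    (4096, 96, 255752, 393216),
    (8192, 48, 216780, 393216),
]


def effect_percent(routes, buckets):
    """Share of routes that landed in an already existing bucket, in whole percent."""
    return (routes - buckets) * 100 // routes


effects = [effect_percent(routes, buckets) for _, _, buckets, routes in STATS]

for (max_feed, iterations, buckets, routes), effect in zip(STATS, effects):
    print(f"max_feed={max_feed:5d} routes={routes} buckets={buckets} effect={effect}%")
```

In this data set the route count in every row equals iterations * max_feed.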
Maybe we can consider making the max_feed value auto-tuned, e.g. 8k or 16k when the total number of protocols is small? If we assume max_feed * proto_count to be constant (which keeps granularity at the same level), and say that we use the default feed (256) for 256 protocols, we can automatically recalculate max_feed on every configure, protocol enabled/disabled state change, or whatever.
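A minimal sketch of that auto-tuning rule, assuming max_feed * proto_count is held at 256 * 256 = 65536. The function name and the clamping bounds (256 and 16384) are assumptions chosen for illustration; they are not BIRD options.

```python
DEFAULT_MAX_FEED = 256        # default feed size mentioned above
REFERENCE_PROTO_COUNT = 256   # protocol count at which the default applies
FEED_BUDGET = DEFAULT_MAX_FEED * REFERENCE_PROTO_COUNT  # 65536, kept constant


def auto_max_feed(proto_count, lower=DEFAULT_MAX_FEED, upper=16384):
    """Recalculate max_feed so that max_feed * proto_count stays near FEED_BUDGET.

    The result is clamped to [lower, upper]; both bounds are hypothetical.
    """
    proto_count = max(1, proto_count)
    return max(lower, min(upper, FEED_BUDGET // proto_count))


for count in (1, 4, 8, 64, 256, 1000):
    print(count, auto_max_feed(count))
```

Such a function would be called again whenever the set of enabled protocols changes, so that a router with only a handful of sessions gets large feeds and a router with hundreds of sessions falls back to the default.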
Is there any point to trying to achieve efficient route packing to buckets? Most rx processing is done per route, so buckets just save some TCP data transfers.
It decreases the number of packets being sent to a large number of readers (actually, peer-group members). Yes, it is not the piece where the main CPU-intensive parts are done, but this can still save, say, 1-5% without any cost from our side. Why not be polite if we can?
Our typical firewall, see number of messages received:
Neighbor  V  AS     MsgRcvd   MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
XXX-BIRD  4  XXXXX  11394287  93753    0       0    0     03:53:52  403577
X-QUAGGA  4  XXXXX  11185307  92512    0       0    0     01w5d18h  403583
X-QUAGGA  4  XXXXX  26569910  93805    0       0    0     05w6d06h  403411
BTW, these results depend on many things, like how big the kernel's TCP buffers are and how fast the other side is able to acknowledge received data. I guess that at first BIRD probably flushes buckets from BGP to TCP as fast as they are generated, with minimal packing (depending mostly on granularity or max_feed); later the TCP buffers become full, sending of updates is postponed, and the BGP bucket cache starts to fill (you could see that in 'show memory').
If you want to get efficient packing, probably the most elegant solution would be to add some delay (like 2 s) before activating/scheduling the sending of BGP update packets. Or some smarter approach: if the BGP bucket cache contains at least x buckets, schedule updates immediately, otherwise schedule them after 2 s.
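The "smart approach" boils down to a single decision about when to schedule the update packets. A rough sketch follows; the function name is made up, and the threshold x is left as a parameter because no value for it is given in the thread.

```python
def update_send_delay(cached_buckets, threshold, delay_s=2.0):
    """Return how long to wait (in seconds) before scheduling BGP update packets.

    cached_buckets -- number of buckets currently in the BGP bucket cache
    threshold      -- the 'x' from the proposal (value not specified in the thread)
    delay_s        -- packing delay used when the cache is still small (2 s proposed)
    """
    if cached_buckets >= threshold:
        return 0.0      # enough buckets queued: packing is already good, send now
    return delay_s      # few buckets: wait so more prefixes can join existing buckets


# Hypothetical threshold of 100 buckets, for illustration only.
print(update_send_delay(cached_buckets=500, threshold=100))
print(update_send_delay(cached_buckets=10, threshold=100))
```

The delay only affects small bursts of updates; during a full feed the cache fills quickly and updates go out without waiting.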
Thanks for the idea. I should have thought about that approach.
Another possible problem is a kernel scan, which is also done in one step, but at least in Linux it could probably be split into smaller steps, and it does not take too much time if the kernel table is in the expected state.
...
CLI interface can easily be another abuser:
bird> show route count
2723321 of 2723321 routes for 407158 networks
If I do 'show route' for such table this can block bird for 10? seconds.
Really? show route processing is split per 64 routes, so I suppose that only the CLI session is blocked.
Not exactly. 'show route' result shows up immediately, but when I press 'q' to quit from the results I see 'bird' process eating CPU for ~5 seconds.
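To illustrate what "split per 64 routes" means for responsiveness, here is a simplified sketch of block-wise processing. It is not BIRD's actual CLI code; the function names are made up, and the generator only stands in for the point where the main loop regains control between blocks.

```python
def show_route_blocks(routes, block_size=64):
    """Yield the routing table in blocks; control returns to the caller after each block."""
    for start in range(0, len(routes), block_size):
        yield routes[start:start + block_size]


def block_count(total_routes, block_size=64):
    """Number of blocks needed to walk the whole table (ceiling division)."""
    return -(-total_routes // block_size)


# Small stand-in table; the real table in the thread has 2723321 routes.
table = [f"route-{i}" for i in range(200)]
blocks = list(show_route_blocks(table))
print(len(blocks), len(blocks[-1]))

# Walking the full table from the thread takes this many 64-route steps.
print(block_count(2723321))
```

Each step is short, so other work can run in between, but the total CPU time spent generating all blocks stays the same even when the client discards the output.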
On Tue, Mar 27, 2012 at 09:29:10PM +0400, Alexander V. Chernikov wrote:
CLI interface can easily be another abuser:
bird> show route count
2723321 of 2723321 routes for 407158 networks
If I do 'show route' for such table this can block bird for 10? seconds.
Really? show route processing is split per 64 routes, so I suppose that only the CLI session is blocked.
Not exactly. 'show route' result shows up immediately, but when I press 'q' to quit from the results I see 'bird' process eating CPU for ~5 seconds.
Yes, but even in that case it should be responsive ('q' is handled in the client; bird generates all the routes and sends them to birdc, which just throws them away, but only 64 routes are processed in one block).
--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."