Dear colleagues, I failed to find anything regarding the subject. Could somebody give some pointers on it? Particularly, is a config like CPU: Intel Atom D2550, dual core, 1.86 GHz, 4 threads, DDR3 4 GB (or 8 GB) suitable for a BGP router with 3 Gbit interfaces and 2 full-view uplinks? I am intending to use something like this: http://www.aliexpress.com/item/2013-New-fanless-network-1U-server-with-intel... Thanks to the dev team, and kind regards. Tigran Zakoyan.
BIRD will run just fine on such hardware. But that's probably not what you're really asking. You're probably asking whether such hardware can handle the routing of (multi-)gigabit traffic, which is a completely different question (BIRD doesn't route). If all 3 interfaces are filled with 64-byte packets, it's likely not going to handle it. As to what the max throughput of such a system would be, I wouldn't even dare to make a guess, as there are just too many variables involved.

Regards, Ruben
In a message of 23 December 2013 15:10:19, Tigran Zakoyan wrote:
Dear colleagues,
It seems your question is more about software routing than about BIRD, which as a routing daemon is a standalone solution that only handles routing information (for example on IXP route servers).
I failed to find anything regarding the subject. Could somebody give some prompts on it?
BIRD itself, as a routing daemon, has a relatively small memory and CPU footprint; in my experience BIRD with 2 full views (FV) consumes < 250 MB of RAM.
Particularly, is a config like
CPU: intel ATOM D2550 Dual Core 1.86G 4 thread DDR3 4Gb (or 8Gb) suitable for a 3 GBIT interface BGP router with 2 full view uplinks?
I am intending to use something like this: http://www.aliexpress.com/item/2013-New-fanless-network-1U-server-with-intel-PCI-E-1000M-6-82583v-D2550-6-Gigabit/1253878720.html
Whether software routing is feasible depends mostly on other characteristics:
* What policy will be applied to traffic (ACL, QoS, PBR, etc.)?
* How much traffic should be routed by the software router, in packets per second (PPS) and bits per second (BPS)?
* Which OS will be used to perform routing?
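To put the PPS-versus-BPS distinction in numbers, here is a minimal Python sketch of the usual worst-case Ethernet arithmetic. It assumes 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap) and three 1 Gbit/s ports as in the original question; the function name is made up for illustration.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


GIGABIT = 1_000_000_000
PORTS = 3  # three gigabit interfaces, as in the question

for size in (64, 512, 1518):
    per_port = line_rate_pps(GIGABIT, size)
    print(f"{size:>5}-byte frames: {per_port:>12,.0f} pps per port, "
          f"{per_port * PORTS:>12,.0f} pps on {PORTS} ports")
```

With 64-byte frames a single gigabit port can deliver roughly 1.49 million packets per second, about 18 times the rate seen with 1518-byte frames at the same bit rate, which is why the packet rate rather than the bit rate tends to be the limiting figure for a software router.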
Hi Sergey,
On 23-12-2013 13:53, Sergey Popovich wrote:
Also, Hyper-Threading does little good for traffic-forwarding workloads: the logical threads share the cache of their physical core, which causes more cache misses and thus lower performance.
This one I should keep in mind myself; I hadn't really thought of it. While we're on this topic, do you happen to know of other effective changes or configurations to improve max throughput? For instance, you mentioned disabling connection tracking. Which method do you use for that: blacklisting the modules, or smart use of the NOTRACK netfilter target? I'm currently about to replace some of my software routers, and have been looking around for optimizations like these.

Regards, Ruben
Wow, thanks a lot for the extensive response, Sergey! This gives me plenty of stuff to research further. I like the iptables addrtype "trick". I was focusing on how to maintain a list of local IPs to exempt from the conntrack exclusion. But this is way cleaner, great! This might even be a good start for an optimizations page on the BIRD wiki?

Regards, Ruben

On 23-12-2013 16:03, Sergey Popovich wrote:
In a message of 23 December 2013 14:08:26, Ruben Laban wrote:
For instance, you mentioned the disabling of connection tracking. Which method do you use for that? Blacklist the modules or smart use of the NOTRACK netfilter target?
We do not use NAT/PAT or a stateful firewall on forwarded packets (for input we have one, to control access to management), so using conntrack is useless and, moreover, harmful: it examines every packet arriving at the machine and stores/updates flow state in internal hash tables.
Routing information (routes) in modern kernels is stored in a trie structure (the same as used in BIRD), which is faster than hash tables.
Something like
iptables -A PREROUTING -t raw -m addrtype ! --dst-type LOCAL -j CT --notrack
to disable tracking of forwarded packets is good enough.
The NOTRACK target is obsolete by now (though it might still be supported); its functionality is superseded by the CT target.
Tuning conntrack (nf_conntrack_*) variables is useless, as we are going to track only traffic destined for the machine itself (INPUT chain in iptables).
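The effect of the addrtype rule above can be modelled in a few lines of Python. This is only a sketch of the decision the rule makes, not of how netfilter implements it; the address set and the function name are hypothetical, and the addresses come from documentation ranges.

```python
import ipaddress

# Hypothetical addresses assigned to the router itself.
LOCAL_ADDRESSES = {
    ipaddress.ip_address("127.0.0.1"),
    ipaddress.ip_address("192.0.2.1"),
    ipaddress.ip_address("198.51.100.1"),
}


def is_tracked(dst: str) -> bool:
    """Model of the rule: only packets addressed to the router stay tracked.

    '-m addrtype ! --dst-type LOCAL -j CT --notrack' exempts every packet
    whose destination is not a local address, so forwarded traffic skips
    connection tracking.
    """
    return ipaddress.ip_address(dst) in LOCAL_ADDRESSES


for dst in ("192.0.2.1", "203.0.113.7"):
    verdict = "tracked (destined for the router)" if is_tracked(dst) else "not tracked (forwarded)"
    print(f"{dst}: {verdict}")
```

The point of the rule is that the kernel already knows which addresses are local, so no separate list of router addresses has to be maintained by hand.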
For example, in my typical deployment as an access server there are only 3 rules in the FORWARD chain of the filter table (2 establish two-way filtering on customer request using ipset with hash:net,port,net or hash:ip,port,net sets, and 1 is a billing hook) and 2 in the PREROUTING chain of the raw table (1: rpfilter match for uRPF, 2: notrack). How you filter traffic destined for the router itself in the INPUT chain is up to you, as it is not performance critical.
1. You may get more benefit from tuning the Linux routing cache (which was removed starting from the 3.6 tree); there is information about the routing cache on the web. In general the routing cache is bad, as it performs poorly under DoS; avoid it by using kernels starting from 3.8 (earlier ones might have performance regressions due to the cache removal).
Here is an example of how I tuned the routing cache for the 3.2 series on a system with 2 GB of RAM:
# RT cache
net/ipv4/route/gc_thresh = 786432
net/ipv4/route/max_size = 1572864
net/ipv4/route/gc_min_interval = 0
net/ipv4/route/gc_min_interval_ms = 500
net/ipv4/route/gc_timeout = 600
net/ipv4/route/gc_interval = 300
net/ipv4/route/gc_elasticity = 4
#net/ipv4/route/mtu_expires = 600
#net/ipv4/route/min_pmtu = 552
#net/ipv4/route/min_adv_mss = 256
Please note that in IPv6 the routing cache does not exist, and some parameters (if they exist) have a different meaning!
The gc_* variables control garbage collector parameters. max_size is the maximum number of entries in the hash table.
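The routing-cache settings above use slash-separated key names. As a minimal sketch, the Python below parses them and prints the equivalent `sysctl -w` commands with dotted names instead of applying anything; the helper names are made up, and the keys only exist on kernels that still have the IPv4 routing cache.

```python
# Settings as quoted in the message, in "key = value" form.
SETTINGS = """
# RT cache
net/ipv4/route/gc_thresh = 786432
net/ipv4/route/max_size = 1572864
net/ipv4/route/gc_min_interval = 0
net/ipv4/route/gc_min_interval_ms = 500
net/ipv4/route/gc_timeout = 600
net/ipv4/route/gc_interval = 300
net/ipv4/route/gc_elasticity = 4
"""


def parse_settings(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        result[key] = int(value)
    return result


def as_sysctl_commands(settings: dict) -> list:
    """Render each setting as a 'sysctl -w' command with dotted key names."""
    return [f"sysctl -w {key.replace('/', '.')}={value}"
            for key, value in settings.items()]


settings = parse_settings(SETTINGS)
for command in as_sysctl_commands(settings):
    print(command)  # dry run: nothing is written to /proc/sys

# In this particular tuning, max_size is exactly twice gc_thresh.
ratio = settings["net/ipv4/route/max_size"] // settings["net/ipv4/route/gc_thresh"]
print("max_size / gc_thresh =", ratio)
```

Printing the commands rather than executing them keeps the sketch safe to run on any machine; the output can be reviewed before anything is applied.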
And one much more interesting parameter:
rt_cache_rebuild_count. By default it equals 4; setting it to -1 turns off the routing cache. Turning off the routing cache might have a positive effect on performance (however, this depends on the traffic patterns in your network).
Details on these and other parameters can be found in Documentation/networking/ip-sysctl.txt in the Linux kernel tree.
See lnstat(8) to obtain various network statistics (including routing cache information in the rt_cache file).
Tuning other sysctl variables (i.e. those not related to the routing cache, or those for the L3 stack such as ARP, proxy ARP, turning forwarding on/off per interface, etc.) seems useless (especially the tcp_* variables, which relate to the TCP implementation :-)).
2. Another important point: do not use too many iptables rules in FORWARD and PREROUTING. iptables in the Linux kernel is implemented as lists of rules in chains, evaluated sequentially. There is no cache in the firewall implementation and no shortcuts, just a linear walk through each entry in the chain.
Use ipset(8) to store a large number of IPs and match them with a single iptables rule; ipset stores its data in bitmaps and hashes for performance.
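The difference between a long rule list and a single set lookup can be illustrated with a toy Python comparison. This is an analogy only, not the kernel's implementation: a Python list stands in for one iptables rule per address, a Python set stands in for an ipset hash, and the addresses are random 32-bit integers generated for the example.

```python
import random
import time

random.seed(1)

# Hypothetical block list: 5,000 IPv4 addresses held as 32-bit integers.
blocked = random.sample(range(2**32), 5_000)
blocked_set = set(blocked)


def linear_match(addr, rules):
    """One rule per address: walk the list until something matches."""
    for rule_addr in rules:
        if addr == rule_addr:
            return True
    return False


def set_match(addr, members):
    """One rule plus a hashed set: a single membership test."""
    return addr in members


# 200 random probes plus 50 addresses that sit near the end of the list.
probes = random.sample(range(2**32), 200) + blocked[-50:]

for name, match, table in (("linear", linear_match, blocked),
                           ("hashed", set_match, blocked_set)):
    start = time.perf_counter()
    hits = sum(match(p, table) for p in probes)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {hits} hits in {elapsed_ms:.2f} ms")
```

Both approaches return the same verdicts; only the cost per packet differs, and with the linear walk that cost grows with the number of entries.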
Implementing QoS (shaping, policing, etc.) requires packet classification. Typically classification is done using the src/dst addresses of the packet. Use hashed mode with the cls_u32 or cls_flow classifiers; with cls_flow, if possible, classify based on realms (which are set during packet forwarding in the IP stack and thus require no additional lookup in the skb by cls_flow).
Use pfifo as the qdisc on leaf classes.
In general, shaping instead of policing at an ISP is a bad idea: it requires a lot more resources than simply policing traffic and thus increases load on your routers (every packet has to be enqueued and dequeued on the router). Shaping also increases latency (not by much for real workloads, in my experience).
However, many people like shaping, as it causes less packet loss for end users due to its queuing nature. Also, the commonly deployed HTB scheduling module is proven, mature and well maintained in the kernel.
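The trade-off between policing and shaping described above can be seen in a small discrete-time simulation. This is a simplified Python sketch with made-up numbers (packets per time slot, a token bucket for the policer, a bounded FIFO for the shaper); it is not how the kernel's qdiscs are implemented.

```python
from collections import deque


def police(arrivals, rate, burst):
    """Token-bucket policer: packets that find no token are dropped."""
    tokens, sent, dropped = burst, 0, 0
    for count in arrivals:  # packets arriving in each time slot
        tokens = min(burst, tokens + rate)
        passed = min(count, tokens)
        tokens -= passed
        sent += passed
        dropped += count - passed
    return sent, dropped


def shape(arrivals, rate, queue_limit):
    """Shaper: excess packets wait in a queue, at most `rate` leave per slot."""
    queue = deque()  # holds the arrival slot of each queued packet
    sent = dropped = max_delay = 0
    slot = 0
    while slot < len(arrivals) or queue:
        count = arrivals[slot] if slot < len(arrivals) else 0
        for _ in range(count):
            if len(queue) < queue_limit:
                queue.append(slot)
            else:
                dropped += 1
        for _ in range(min(rate, len(queue))):
            max_delay = max(max_delay, slot - queue.popleft())
            sent += 1
        slot += 1
    return sent, dropped, max_delay


# Two bursts of 10 packets against a limit of 3 packets per slot.
bursty = [10, 0, 0, 0, 10, 0, 0, 0]

print("policer (sent, dropped):", police(bursty, rate=3, burst=3))
print("shaper  (sent, dropped, max delay in slots):",
      shape(bursty, rate=3, queue_limit=100))
```

In this run the policer forwards 6 packets and drops 14, while the shaper delivers all 20 at the cost of holding packets for up to 3 slots. The shaper also has to store and later dequeue every waiting packet, which is the extra work on the router mentioned above.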
Regarding HTB: use sch_htb from recent kernels, at least with upstream commit 64153ce0a7b6 (net_sched: htb: do not setup default rate estimators). This commit disables the setup of rate estimators by default (no real-time statistics about the packet rate in a class), but greatly reduces the number of locks in the queuing code and prevents cache starvation. Really good for performance. Optionally, rate estimators can be enabled/disabled at runtime (htb_rate_est). Also, newer sch_htb implementations have more precise traffic accounting (in particular, TSO mode accounting is fixed).
So the most important part of QoS is classification.
3. Interrupt balancing
Using NICs with hardware multiqueue support and proper interrupt load balancing across all cores in the system is essential.
Be aware that when using 802.1ad (QinQ) your NIC seems to lose the ability to distribute received packets to multiple queues, along with other hardware offloading capabilities (IP packet checksum offloading, etc.). However, in my practice, on servers with 8 NICs (4 up, 4 down, aggregated with LACP by the bonding functionality in the kernel) on a 4-core CPU this has less impact, as RX-0 of each NIC is bound to a separate core.
Interrupts can be balanced manually (by a custom-written script) or by using irqbalance. Of course many places recommend balancing IRQs manually; however, I see no reason why this should be done or why it is considered best practice.
In my own experience recent versions of irqbalance do their job very well. However, kernel support for the MSI-to-IRQ assignment mapping in sysfs is a MUST; otherwise your interrupts get balanced with much less success (see the output of irqbalance -d on the console: it warns about missing support in kernel sysfs).
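For comparison with irqbalance, a manual balancing script essentially assigns each queue IRQ a CPU bitmask. The Python sketch below shows that calculation as a dry run; the IRQ numbers are hypothetical (on a real system they would be read from /proc/interrupts), the function names are invented for the example, and nothing is written to /proc.

```python
def affinity_mask(cpu: int) -> str:
    """Hex CPU bitmask in the form written to /proc/irq/<irq>/smp_affinity."""
    return format(1 << cpu, "x")


def spread_irqs(irqs, cpu_count):
    """Assign IRQs to CPUs round-robin; returns (irq, cpu, mask) tuples."""
    return [(irq, index % cpu_count, affinity_mask(index % cpu_count))
            for index, irq in enumerate(irqs)]


# Hypothetical IRQ numbers for the RX queues of two NICs.
queue_irqs = [40, 41, 42, 43, 50, 51, 52, 53]

for irq, cpu, mask in spread_irqs(queue_irqs, cpu_count=4):
    # Dry run: print the command instead of changing the affinity.
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity  # CPU {cpu}")
```

A static round-robin like this ignores the actual load per queue, which is one reason a recent irqbalance may do at least as well.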
4. Use 1G NICs in bonding instead of 10G where appropriate.
Yes, bonding, like any other part of a software router, is implemented in software. But it seems relatively lightweight in terms of CPU.
To utilize 10G with shaping and firewalling you need a much more powerful CPU with more cores. Even having one would not help, as locking in the QoS schedulers may become the bottleneck. There is work in progress in the kernel to introduce HTB on the multiq scheduler.
Using the layer2+3 transmit hash policy is sufficient (and better than layer3+4, which is not fully 802.3ad-compliant in terms of load distribution) and provides good load sharing across slaves.
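How a layer2+3 policy spreads flows over slaves can be sketched in Python. The formula used here is an assumption taken from older versions of the kernel's bonding documentation, (((source IP XOR dest IP) AND 0xffff) XOR (source MAC XOR destination MAC)) modulo slave count; newer kernels compute the hash differently, and using only the last MAC octet is a further simplification of this sketch. The MAC and IP addresses are invented.

```python
import ipaddress


def layer2_3_slave(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str,
                   slave_count: int) -> int:
    """Pick a slave index from MAC and IP addresses (assumed formula)."""
    # Assumption: only the last octet of each MAC takes part in the XOR.
    mac_xor = int(src_mac.split(":")[-1], 16) ^ int(dst_mac.split(":")[-1], 16)
    ip_xor = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return ((ip_xor & 0xFFFF) ^ mac_xor) % slave_count


# Hypothetical router-to-router link: the MAC pair never changes,
# so only the IP addresses can spread traffic over the slaves.
ROUTER_MAC = "00:11:22:33:44:01"
PEER_MAC = "00:11:22:33:44:02"

for last_octet in range(1, 5):
    dst = f"203.0.113.{last_octet}"
    slave = layer2_3_slave(ROUTER_MAC, PEER_MAC, "198.51.100.10", dst, 4)
    print(f"198.51.100.10 -> {dst}: slave {slave}")
```

Because a given pair of hosts always hashes to the same slave, packets between them are not spread across links, while different host pairs are distributed over all slaves even when the MAC addresses on the link are constant.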
5. Use NAT/PAT in a separate network namespace.
NAT/PAT requires conntrack, and enabling conntrack for forwarded traffic is bad for performance.
Linux has supported network namespaces for years. This allows you to create a new copy of the network stack, attach network interfaces to it, and configure IP addresses (even overlapping ones) and iptables/ipset (only on very recent kernels) rules separately from the core system.
See ip-netns(8) for more information on how to use this.
6. FV vs. default + local networks.
This is more of a RAM question, as modern Linux kernels use a trie to store routing table entries. I see very little performance impact of FV vs. local networks (less than 1% of CPU). FV consumes nearly 60 MB in kernel trie structures.
Using default + local networks seems more than enough to perform routing and policy on traffic.
7. Tuning igb(7) driver parameters.
The original Intel igb driver has a number of options which are not present in the mainline kernel driver.
However, changing some of them could increase packet latency, which is not acceptable for modern real-time applications, so I would not recommend touching these parameters unless you have a special reason.
They have reasonable default values.
----
There are lots of other things to consider too, but these are the essential ones.
In a message of 23 December 2013 16:26:58, Ruben Laban wrote:
Wow, thanks a lot for the extensive response, Sergey! This gives me plenty of stuff to research further.
I like the iptables addrtype "trick". I was focusing on how to maintain a list of local IPs to exempt from the conntrack exclusion. But this is way cleaner, great!
Be careful with new kernels: they contain lots of new features, functionality and performance enhancements, but as with any product they sometimes introduce regressions and bugs. For a software router I would suggest stable longterm kernel branches such as 3.2, 3.4 and the new 3.10, which has a lot of performance improvements and enhancements, although in my opinion it is not yet ready for a software router (a few patch levels will solve this? :-)). 3.2 and 3.4 are mostly the same in functionality; 3.2 is very stable.
This might even be a good start for an optimizations page on the Bird wiki?
I do not think so; this relates more to software packet forwarding than to BIRD as a routing suite itself. Better places probably exist for such a topic.
-- SP5474-RIPE Sergey Popovich
On 23-12-2013 16:30, Sergey Popovich wrote:
Be careful with new kernels: they contain lots of new features, functionality and performance enhancements, but as with any product they sometimes introduce regressions and bugs.
Unfortunately that's the case way more often than one would hope for :-(
For a software router I would suggest stable longterm kernel branches such as 3.2, 3.4 and the new 3.10, which has a lot of performance improvements and enhancements, although in my opinion it is not yet ready for a software router (a few patch levels will solve this? :-)).
What are the dealbreakers currently preventing you from using and/or recommending 3.10.x? I don't really follow the developments in kernel land, even though I should (at least more than I do now).
3.2 and 3.4 are mostly the same in functionality; 3.2 is very stable.
This might even be a good start for an optimizations page on the Bird wiki?
I do not think so; this relates more to software packet forwarding than to BIRD as a routing suite itself. Better places probably exist for such a topic.
Fair point. I fully agree that it is not directly related to BIRD, but software packet forwarding is a thing that's rather close to home for BIRD. Either way, you've given me plenty of food for thought already to get me through the holidays ;-)

Regards, Ruben
In a message of 23 December 2013 16:53:55, Ruben Laban wrote:
What are the dealbreakers currently preventing you from using and/or recommending 3.10.x? I don't really follow the developments in kernel land, even though I should (at least more than I do now).
I would recommend using it in a test environment, not in production, at least until 3.13 is released; many upstream commits have still not been picked up for stable. 3.10 should be considered the future replacement for 3.2 and 3.4 on software routers.

Also note that only fixes are backported to stable longterm branches, not functional improvements or changes. Even so, not all network stack enhancements (and yes, even fixes!) get backported, as it takes the maintainer of a longterm branch a lot of time to select and backport commits.

Also, I would not recommend using a pure upstream kernel, even from a stable branch, as many distro vendors add additional fixes, improvements, ABI compatibility layers and other things, so using a stable, longterm-supported distribution is also a MUST.

So two things are important: a longterm distribution and a longterm kernel from kernel.org (yes, some vendors decide to support additional kernel branches themselves rather than relying on a longterm kernel maintained by one of the kernel developers; the quality of such support is unknown).
-- SP5474-RIPE Sergey Popovich
Participants (3): Ruben Laban, Sergey Popovich, Tigran Zakoyan