Route aggregation in BIRD - how?
Hello.

Case study:
* importing full BGP table from various uplinks
* some routes received by BGP are being exported via OSPF to core1, using filters:
  (source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])

Question: how to aggregate routes (whenever possible) before exporting them via OSPF to core?

Example: let's say that I've received a.b.c.d/22 and a.b.c.d/24 from asnXYZ via BGP. Let's say that I want to export routes with asnXYZ in the AS path via OSPF to the core1 switch. Obviously, a.b.c.d/24 is in a.b.c.d/22, so I would like to export only a.b.c.d/22.

Is it doable in BIRD? If yes, any hints/keywords in the docs?

-- 
* Maciej Wierzbicki * At paranoia's poison door * * VOO1-RIPE *
On 23.01.2012 20:01, Maciej Wierzbicki wrote:
> Hello.
> Case study:
> * importing full BGP table from various uplinks
> * some routes received by BGP are being exported via OSPF to core1, using filters: (source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])
> Question: how to aggregate routes (whenever possible) before exporting them via OSPF to core?

It is not possible currently. I'm working on BGP route aggregation and I plan to get more or less working code at the end of this week.

> Example: lets say that I've received a.b.c.d/22 and a.b.c.d/24 from asnXYZ via BGP. Lets say that I want to export routes with asnXYZ in aspath via ospf to core1 switch. Obviously, a.b.c.d/24 is in a.b.c.d/22, so I would like to export only a.b.c.d/22. Is it doable in bird? If yes, any hint/keywords in doc?
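
For the specific case in the question (dropping a /24 that is already covered by a /22 selected by the same filter), the covering check itself is simple. The sketch below is not BIRD code or configuration; it is a minimal Python illustration using the standard `ipaddress` module, and the function name `drop_covered` is made up for this example. It assumes every prefix passed in has already matched the same export filter, so a covered more-specific adds nothing on the OSPF side.

```python
import ipaddress


def drop_covered(prefixes):
    """Keep only prefixes that are not contained in a shorter prefix of the same set."""
    nets = sorted(
        {ipaddress.ip_network(p) for p in prefixes},
        key=lambda n: (n.version, n.prefixlen, int(n.network_address)),
    )
    kept = []
    for net in nets:
        # Shorter prefixes were handled first, so any covering prefix is already in `kept`.
        covered = any(
            net.version == k.version and net.subnet_of(k) for k in kept
        )
        if not covered:
            kept.append(net)
    return kept


if __name__ == "__main__":
    for net in drop_covered(["192.0.2.0/24", "192.0.0.0/22", "198.51.100.0/24"]):
        print(net)
```

Here 192.0.2.0/24 is dropped because 192.0.0.0/22 covers it, while 198.51.100.0/24 has no covering prefix and is kept.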
On Mon, Jan 23, 2012 at 08:14:28PM +0400, Alexander V. Chernikov wrote:
> It is not possible currently.
> I'm working on BGP route aggregation and I plan to get more or less working code at the end of this week.

Do you plan to integrate it into the BGP protocol? I don't think it is a good idea. It would be easy to make generic route aggregation: a 'virtual' protocol similar to static, which generates aggregate routes based on its config and received routes.

-- 
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 23.01.2012 22:00, Ondrej Zajicek wrote:
> Do you plan to integrate it to the BGP protocol? I don't think it is

This is a separate protocol, of course.

> a good idea. It would be easy to make generic route aggregation -

My first idea was to implement a generic aggregation protocol. However, do we really need it to be generic? Currently we have a bunch of link-state protocols (ISIS / OSPF) which are pure singletons, and even if they were not, we probably don't want to make summary routes between instances. RIP[ng] is RIP(c). There are also some multicast protocols, but that is far, far away. Not sure if we should permit route aggregation from different protocol types.

> 'virtual' protocol similar to static, which generates aggregate routes based on its config and received routes
Yes. I personally see this as follows:

protocol abgp agg1 {
    aggregate address 1.2.3.0/24;
    aggregate address 1.2.4.0/24 save attributes (or other keywords);
    # Aggregate as many attributes as possible (see RFC 4271 9.2.2.2.)
    # http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094826...
    aggregate address 2.3.4.0/24 summary only;
    aggregate address 3.4.5.0/24 mandatory list { 3.4.5.1/32, 3.4.5.8/29 };
    import filter somefilter; # Change summary route attributes
}

protocol bgp bgp1 {
    ..
    aggregator agg1;
}

Rtes matching 'summary only' instances have to be modified by the aggregator (the no-export community has to be added to the community attribute) before they are passed to rte_update.

The mandatory list is a list of routes which have to exist before the summary route is announced.

[BGP] protocols using the aggregator will call rte_update_agg() instead of the usual rte_update().

The aggregator stores its summary and mandatory routes in a modified f_trie. (I think there is no need to import/implement another tree if we can modify the current implementation: e.g. use regular pools (flag passed to f_new_trie, along with node_size) and add trie_remove_prefix.)

Possibly we have to implement some kind of lazy protocol name resolving. I mean, add all "aggregator $proto_name" entries to a linked list with file/line data and do the symbol lookup after configuration parsing is finished, calling a modified cf_error() if the lookup fails.

Btw, I've got a small patch from my previous approach. It moves the default protocol preference to struct protocol and assigns it in proto_config_new instead of assigning it in every protocol manually. Maybe it is a good candidate for the next commit? :)

-- 
Alexander V. Chernikov
Yandex NOC
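
The 'mandatory list' rule described above can be illustrated outside BIRD. The following is a minimal Python sketch, not the proposed implementation: `summary_is_active` is a hypothetical helper name, the routing table is modelled as a plain list of prefix strings, and the extra requirement that at least one more-specific route exists when the mandatory list is empty is an assumption borrowed from common BGP aggregation behaviour rather than something stated in the proposal.

```python
import ipaddress


def summary_is_active(summary, mandatory, table):
    """Decide whether a summary route may be announced.

    summary   -- the aggregate prefix, e.g. "3.4.5.0/24"
    mandatory -- prefixes that must all be present in the table
    table     -- prefixes currently known (stand-in for the routing table)
    """
    summary_net = ipaddress.ip_network(summary)
    present = {ipaddress.ip_network(p) for p in table}

    # Every mandatory prefix has to exist as an exact route.
    if any(ipaddress.ip_network(m) not in present for m in mandatory):
        return False

    # Assumption: at least one more-specific route must contribute to the summary.
    return any(
        n.version == summary_net.version
        and n != summary_net
        and n.subnet_of(summary_net)
        for n in present
    )


if __name__ == "__main__":
    table = ["3.4.5.1/32", "3.4.5.8/29", "10.0.0.0/8"]
    print(summary_is_active("3.4.5.0/24", ["3.4.5.1/32", "3.4.5.8/29"], table))
    print(summary_is_active("3.4.5.0/24", ["3.4.5.1/32", "3.4.5.8/29"], table[:1]))
```

The first call reports that the summary may be announced; the second does not, because 3.4.5.8/29 from the mandatory list is missing.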
On Mon, Jan 23, 2012 at 10:30:48PM +0400, Alexander V. Chernikov wrote:
>> Do you plan to integrate it to the BGP protocol? I don't think it is
> This is separate protocol, of course.
>> a good idea. It would be easy to make generic route aggregation -
> My first idea was to implement generic aggregation protocol. However, do we really need it generic? Currently we have bunch of link-state protocols (ISIS / OSPF) which are pure singletons, and, even if not we probably don't want to make summary routes between instances. RIP[ng] is RIP(c). There are also some multicast protocols but is is far-far away. Not sure if we should permit route aggregation from different protocol types.
My idea is to make this independent of the source protocol of the aggregated routes. So the question would be: is there any advantage in making it specific? Generally, the aggregator would accept any routes and generate a new one, without any protocol-specific attributes. There may be an option for processing BGP attributes and generating proper BGP attributes, but I guess this is not essential.

I guess most users would want either to aggregate received BGP routes to generate the default (or some more specific) routes for the IGP (for this usage pattern it would be useful to have an option for a mandatory minimal number of routes needed to originate the aggregate), or to aggregate their IGP routes for origination to BGP.
> Yes. I personally see this as following:
>
> protocol abgp agg1 {
>     aggregate address 1.2.3.0/24;
>     aggregate address 1.2.4.0/24 save attributes (or other keywords);
>     # Aggregate as much attributes as possible ( see RFC 4271 9.2.2.2. )
>     # http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094826...
>     aggregate address 2.3.4.5.0/24 summary only;
>     aggregate address 3.4.5.0/24 mandatory list { 3.4.5.1/32, 3.4.5.8/29};
>     import filter somefilter; # Change summary route attributes
> }
>
> protocol bgp bgp1 {
>     ..
>     aggregator agg1;
> }
>
> Rte's matching 'summary only' instances have to be modified (no-export community have to be added to community attribute) by aggregator before passing them to rte_update
What about just having an aggregator protocol connected directly to a table? Routes would be received from a table and the aggregated ones put into the same table. This would work well for BGP->IGP and IGP->BGP aggregation, where we do not care about non-aggregated routes. Not sure about BGP->BGP transit, but I guess it could also work, or the aggregator could work like a pipe. Not sure if my explanation is clear enough; I could add some examples.
> Aggregator stores its summary and mandatory routes in modified f_trie. ( I think, there is no need to import/implement another tree if we can modify current implementation: e.g. use regular pools (flag passed to f_new_trie, along with node_size) and add trie_remove_prefix

OK

> Btw, I've got small patch from my previous approach, it moves default protocol preference to struct protocol and assigns it in proto_config_new instead of assigning it in every protocol manually. Maybe it is a good candidate for the next commit? :)

Merged.

-- 
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Ondrej Zajicek wrote on 2012-01-24 12:11:
> I guess most users would want either aggregate received BGP routes to generate the default (or some more specific) routes for IGP (for this usage pattern it would be useful to have option for mandatory minimal number of routes to originate aggregate)

FWIW, that is exactly what I'd like to have - aggregation of some routes (based on source/transit ASN) received from BGP, and pass them aggregated via OSPF somewhere else.

-- 
* Maciej Wierzbicki * At paranoia's poison door * * VOO1-RIPE *
Ondrej Zajicek wrote:
> On Mon, Jan 23, 2012 at 10:30:48PM +0400, Alexander V. Chernikov wrote:
>> On 23.01.2012 22:00, Ondrej Zajicek wrote:
>>> On Mon, Jan 23, 2012 at 08:14:28PM +0400, Alexander V. Chernikov wrote:
>>>> On 23.01.2012 20:01, Maciej Wierzbicki wrote:
>>>>> Hello.
>>>>>
>>>>> Case study:
>>>>> * importing full BGP table from various uplinks
>>>>> * some routes received by BGP are being exported via OSPF to core1,
>>>>> using filters:
>>>>> (source = RTS_BGP&& bgp_path ~ [= * ASNXYZ * =])
>>>>>
>>>>> Question: how to aggregate routes (whenever possible) before exporting
>>>>> them via OSPF to core?
>>>> It is not possible currently.
>>>>
>>>> I'm working on BGP route aggregation and I plan to get more or less
>>>> working code at the end of this week.
>>> Do you plan to integrate it to the BGP protocol? I don't think it is
>> This is separate protocol, of course.
>>> a good idea. It would be easy to make generic route aggregation -
>> My first idea was to implement generic aggregation protocol.
>> However, do we really need it generic?
>> Currently we have bunch of link-state protocols (ISIS / OSPF) which are
>> pure singletons, and, even if not we probably don't want to make summary
>> routes between instances. RIP[ng] is RIP(c). There are also some
>> multicast protocols but is is far-far away. Not sure if we should permit
>> route aggregation from different protocol types.
>
> My idea is to to make this independent of a source protocol of
> aggregated routes. So the question would be: Is there any advantage to
> make it specific? Generally, the aggregator would accept any routes and
> generates a new one, without any protocol specific attributes. There may
> be an option for processing BGP attributes and generating proper BGP
> attributes, but i guess this is not essential.
>
> I guess most users would want either aggregate received BGP routes to
> generate the default (or some more specific) routes for IGP (for this
> usage pattern it would be useful to have option for mandatory minimal
> number of routes to originate aggregate), or aggregate their IGP routes
> for origination to BGP.
>
>> Yes. I personally see this as following:
>>
>> protocol abgp agg1 {
>> aggregate address 1.2.3.0/24;
>> aggregate address 1.2.4.0/24 save attributes (or other keywords);
>> # Aggregate as much attributes as possible ( see RFC 4271 9.2.2.2. )
>> # http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094826.shtml
>> aggregate address 2.3.4.5.0/24 summary only;
>> aggregate address 3.4.5.0/24 mandatory list { 3.4.5.1/32, 3.4.5.8/29};
>> import filter somefilter; # Change summary route attributes
>> }
>>
>> protocol bgp bgp1 {
>> ..
>> aggregator agg1;
>> }
>
>
>> Rte's matching 'summary only' instances have to be modified (no-export
>> community have to be added to community attribute) by aggregator before
>> passing them to rte_update
>
> What about just having an aggregator protocol connected directly to a
> table? Routes would be received from a table and aggregated ones are put
> to the same table. This would work well for BGP->IGP and IGP->BGP
Yes, if we assume a protocol instance can aggregate any routes, we can use
a single instance per rtable and add a pointer to it inside struct rtable.
The first disadvantage is that the protocol loses its last bits of knowledge
about aggregation. I mean, in this approach we will do tree checks for every
best rte regardless of protocol type. We still have to add the
no-export community to matching rtes to support summary-only routes
(nearly the same task as in the discussion about the LDP.label attribute).
The second one is summarized attributes:
1) We have to announce the summary rte with some source (RTS_). If we're
aggregating BGP routes, it is BGP. And if we get 3 BGP and 1 OSPF?
We can't determine it reliably.
2) If we try to save as many attributes as possible (AS-PATH / AS-SET,
even communities), we can end up with a route having both an OSPF metric,
for example, and BGP attributes.
So in this approach we have to change syntax to
proto aggregator agg1 {
aggregate-address bgp 1.2.3.0/24;
}
or even
proto aggregator agg1 {
bgp {
aggregate-address 1.2.3.0/24;
};
ospf {
..
}
}
> aggregation, where we do not care for non-aggregated routes. Not sure
> about BGP->BGP transit, but i guess it could also work, or aggegator
> could work like a pipe. Not sure if my explanation is clear enough, i
> could add some examples.
>
>> Aggregator stores its summary and mandatory routes in modified f_trie.
>> (
>> I think, there is no need to import/implement another tree if we can
>> modify current implementation:
>> e.g. use regular pools (flag passed to f_new_trie, along with node_size)
>> and add trie_remove_prefix
>
> OK
>
>> Btw, I've got small patch from my previous approach, it moves default
>> protocol preference to struct protocol and assigns it in
>> proto_config_new instead of assigning it in every protocol manually.
>> Maybe it is a good candidate for the next commit? :)
>
> Merged.
Thanks!
Maybe another simple patch? :)
It shows "UNNAMED" as the filter name instead of <NULL> where the protocol
filter is configured as input filter { ... };
(It is patch #2 from the general protocol limiting patch tale)
Btw #2,
there is another patch that removes [nearly] all pipe hacks from
core, can you take a look at it? (I sent both patches on 29 October
with the last limiting patch version.) (And I probably know now how I can
fix the last FIXME.)
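
The idea of a prefix trie that also supports removal (the "trie_remove_prefix" mentioned above) can be shown with a toy structure. This is a minimal Python sketch and not BIRD's f_trie: the class and method names are invented for the illustration, one bit is consumed per level with no path compression, and a single trie is assumed to hold prefixes of one address family only.

```python
import ipaddress


class Node:
    __slots__ = ("children", "present")

    def __init__(self):
        self.children = [None, None]
        self.present = False  # True if a prefix ends exactly at this node


class PrefixTrie:
    def __init__(self):
        self.root = Node()

    @staticmethod
    def _bits(net):
        value = int(net.network_address)
        width = net.max_prefixlen
        return [(value >> (width - 1 - i)) & 1 for i in range(net.prefixlen)]

    def add(self, prefix):
        node = self.root
        for bit in self._bits(ipaddress.ip_network(prefix)):
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.present = True

    def remove(self, prefix):
        """Remove a prefix; return False if it was not stored."""
        bits = self._bits(ipaddress.ip_network(prefix))
        path = [self.root]
        for bit in bits:
            nxt = path[-1].children[bit]
            if nxt is None:
                return False
            path.append(nxt)
        if not path[-1].present:
            return False
        path[-1].present = False
        # Prune now-empty nodes bottom-up so the trie does not keep dead branches.
        for depth in range(len(bits), 0, -1):
            node = path[depth]
            if node.present or node.children[0] or node.children[1]:
                break
            path[depth - 1].children[bits[depth - 1]] = None
        return True

    def covering(self, prefix):
        """Length of the longest stored prefix covering `prefix`, or None."""
        node = self.root
        best = 0 if node.present else None
        bits = self._bits(ipaddress.ip_network(prefix))
        for depth, bit in enumerate(bits, start=1):
            node = node.children[bit]
            if node is None:
                break
            if node.present:
                best = depth
        return best
```

An aggregator could use `covering()` to find which configured summary, if any, a received route falls under, and `remove()` when a summary is deleted on reconfiguration.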
On 24/01/2012 12:11, Ondrej Zajicek wrote:
> I guess most users would want either aggregate received BGP routes to generate the default (or some more specific) routes for IGP (for this usage pattern it would be useful to have option for mandatory minimal number of routes to originate aggregate), or aggregate their IGP routes for origination to BGP.
I'd have great use for such an aggregator, mainly to re-use older hardware routers with limited FIB/TCAM size. Being too clueless to code it, I'd just give some spec:

- Takes a full BGP table as input
- Provides customizable aggregation to a target route count
- May aggregate on the following criteria:
  * Same AS path (with or without stripping prepending)
  * Same next-hop
  * All attributes matching (only the prefix length differs, but the longer prefix is contained in the shorter one, therefore only the less specific prefix must remain)

Aggregation could be destructive or not, meaning it would default to the lowest possible route count that does not discard any routing policy, but it could also be used to reduce the route count to less than 32k routes, and therefore only match on the next-hop attribute, not taking care of the originating AS (using the upstream's), considering that best path selection has already occurred in the source table.

BGP feeds -> single BGP RIB -> aggregator -> minimized RIB -> iBGP to HWR (best paths selected)

The 32k route limit is intended to allow using a routing switch as a faster forwarding plane than a small X86 box could have. The session between the aggregated table and the routing switch could be established using either iBGP, eBGP on a private AS, RIP or OSPF (for L3 switches with no BGP support).

Some other decent hardware (Foundry BigIron or Cisco SUP32) may accommodate up to approx. 256k routes and could really benefit from such a feature, getting it back to a usable state for a quarter of the price tag of its 1M-route TCAM counterparts. It looks like a better solution to me than filtering to /21, adding a few default routes and hoping for the best.

The major issue with such a setup would be to accommodate the upstream's requirements (the BGP session is usually end-to-end and in-band, and this new setup requires eBGP multihop or NAT on the data plane to work).

Maybe this could also be a path to using BIRD as an OpenFlow control plane?

Best regards,
-- 
Jérôme Nicolle +33 (0)6 19 31 27 14
On Wed, Aug 15, 2012 at 07:59:18PM +0200, Jérôme Nicolle wrote:
> - May agregate on the following criterias :
>   * Same AS-path (with or without stripping prepending)
>   * Same next-hop
>   * Every attributes matching (only the prefix lenght differs but the longer is contained in the shorter, therefore only the less specific prefix must remain)

I don't really understand this. For that purpose you could aggregate everything with the same next-hop; other attributes are irrelevant. We could think about it not as BGP route aggregation, but as the generation of a completely new table that is more compact but equivalent w.r.t. packet forwarding.
> Agregation could be destructive or not, meaning it will default to a lower possible route-count without discarding any routing-policy, but it could be used to reduce the route-count to less than 32k routes, and therefore only match on the next-hop attribute, not taking care of the originating AS (using upstream's), considering the best path selection already occured in the source table.
>
> BGP feeds -> single BGP RIB -> agregator -> minimized RIB -> iBGP to HWR (best paths selected)
>
> The 32k route limit is intended to use a routing switch as a faster forwarding plane that a small X86 box could have. Session between the agregated table and the routing switch could be established using either iBGP, eBGP on private AS, RIP or OSPF (for L3 switches with no BGP support).

The optimal solution would be iBGP to distribute the aggregated table, with OSPF running on these routers/L3 switches as an IGP. OSPF could probably also be used to carry the aggregated table, but that would have some problems (esp. OSPF does not work very well with too many LSAs: propagation is slow because of the simple send-acknowledge model).

BTW, do you have an idea what capabilities these 'cheaper' L3 switches usually have? (i.e. how many routes they support, and what routing protocols they support. I have no experience with them and I heard that these usually support just static routes.)
> Some other decent hardware (Fondry BigIron or Cisco SUP32) may accomodate up to 256k routes approx and could really benefit from such feature, getting them back to usable state for a quarter of their 1M routes TCAM counterparts' price tag. It looks like a better solution to me than filtering to /21, adding a few default routes and hoping for the best.
>
> The major issue with such setup would be to accomodate upstream's requirements (BGP session is usualy end-to-end, on-band, and this new setup requires eBHP multihop or NAT on the data-plane to work).

This could probably be handled by having the BGP X86 router 'outside' of the network (i.e. on the L2 network connecting the upstream and the 'fast' router). That could be done if your 'fast' router / L3 switch allows you to bridge ports to the upstream and to the BGP router.

-- 
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
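
The "more compact but forwarding-equivalent table" idea from this reply can be sketched as follows. This is a minimal Python illustration, not BIRD code: the function names are hypothetical, the table is a plain dict from prefix to next hop, and only one simple reduction is applied (a more-specific route is dropped when the nearest covering route that is kept points to the same next hop). It does not attempt to produce the smallest possible table.

```python
import ipaddress


def nearest_cover(net, table):
    """Return the longest prefix in `table` that strictly covers `net`, or None."""
    for length in range(net.prefixlen - 1, -1, -1):
        candidate = net.supernet(new_prefix=length)
        if candidate in table:
            return candidate
    return None


def compact_by_next_hop(fib):
    """Drop more-specific routes that forward exactly like their covering route.

    fib -- dict mapping prefix strings to next-hop identifiers
    """
    routes = {ipaddress.ip_network(p): nh for p, nh in fib.items()}
    kept = {}
    # Shortest prefixes first, so every covering route is decided before its more-specifics.
    order = sorted(routes, key=lambda n: (n.version, n.prefixlen, int(n.network_address)))
    for net in order:
        parent = nearest_cover(net, kept)
        if parent is not None and kept[parent] == routes[net]:
            continue  # packets would take the same next hop via the covering route
        kept[net] = routes[net]
    return {str(n): nh for n, nh in kept.items()}


if __name__ == "__main__":
    fib = {
        "10.0.0.0/8": "A",
        "10.1.0.0/16": "A",
        "10.1.2.0/24": "B",
        "10.1.2.128/25": "B",
        "10.2.0.0/16": "B",
    }
    for prefix, nh in compact_by_next_hop(fib).items():
        print(prefix, nh)
```

In the sample table, 10.1.0.0/16 and 10.1.2.128/25 are removed because their nearest remaining covering routes already use next hops A and B respectively; the other three routes are needed to keep forwarding unchanged.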
On 15/08/2012 22:04, Ondrej Zajicek wrote:
> Don't really understand this. For that purpose you could aggregate everything with the same next-hop, other attributes are irrelevant. We could think about it not as an BGP route aggregation, but as a generation of completely new table that is more compact but equivalent w.r.t. packet forwarding.

You're right, the simpler approach would be to aggregate only on the next-hop and strip every other attribute before sending to the downstream router.

Still, it'd be useful to try not to strip the originating AS, in order to use NetFlow aggregation for per-AS traffic statistics. Obviously this isn't possible with a 32k maximum route limit, but it should still be possible with a 200k route limit.

Even so, if the route limit is reached, then stub ASes could be stripped, leaving only the source's transits in the path. A list of transit ASes could be fed to the algorithm, in order to strip ASes reached via these transits.
> BTW, do you have an idea what capabilities these 'cheaper' L3 switches usually have? (i.e. how many routes they support, and what routing protocols they support - i have no experience with them and i heard that these usually support just static routes)

A typical switch would be a Cisco 3560G or 3750G: approx. 32k routes and BGP support. Lower-end switches (HP, BDCOM) would lack proper BGP support (if any), but RIP could be used instead.

Just to clarify: a Cisco SUP32-8GE (intended for a 6500 series chassis) is approx. 650€ from a decent broker. A SUP720-3B is 100€ more. It's basically a castrated SUP720-3BXL (approx. 2500€): same forwarding capacity, only a smaller TCAM. With 8 integrated GE ports and 225€ for a 16 GBIC linecard, it gives us a 40 port, 32Gbps capable router for approx. 1500€.

Foundry's BigIron 4000 with JetCore route processors is still capable of forwarding 20Gbps but can't stand more than approx. 200k routes. I saw most of them used as door-stops recently; one with a 16x1G SFP card can be found for less than 1000€.

So the goal could be to let small ISPs use these boxes to scale bandwidth-wise from a software-routed network, without paying a few extra k€ for up-to-date Cisco gear. Dissociating the data and control planes is also the only way to protect against DDoS.
> This could be probably handled by having BGP X86 router 'outside' of network (i.e. on L2 network connecting upstream and 'fast' router), that could be done if your 'fast' router / L3 switch allows you to bridge ports to upstream and to BGP router.

I see two possible setups, the proper one requiring a specific configuration from the peer.

The "proper" way would be to establish two interconnections (at least two subnets on the same L2): one for the session to the BIRD router, the other for actual forwarding to the L3 switch. Most L3 switches are either L2 or L3 on a given port, so it'll require a basic L2 switch between the peer and the router and L3 switch.

The "dirty" way would be to use a NAPT-capable switch/router to redirect TCP/179 originating from the peer to the software router. This should be supported on the Cisco 6500 series. I need to get my hands on a SUP32 to check this.

-- 
Jérôme Nicolle 06 19 31 27 14
On 16.08.2012 01:17, Jérôme Nicolle wrote:
> On 15/08/2012 22:04, Ondrej Zajicek wrote:
>> Don't really understand this. For that purpose you could aggregate everything with the same next-hop, other attributes are irrelevant. We could think about it not as an BGP route aggregation, but as a generation of completely new table that is more compact but equivalent w.r.t. packet forwarding.
> You're right, the simplier approach would be to agregate only on the next-hop and strip every other atributes before sending to the downstream router.
> Still, it'd be usefull to try not to strip the originating AS in order to use NetFlow agregation for trafic statistic per-AS. Obviously this isn't possible on a 32k maximum route limit, it should still be possible on a 200k route limit.

For 32k or less it seems that several static defaults for outgoing traffic balancing are sufficient, since it is not possible to do fine-grained route selection for all routes. 200k is much more promising:

11:19 [0] bmw# birdc 'show route primary where net.len < 24' | wc -l
198369
It is possible to fetch the RIPE/APNIC/etc. (or RADB) database of route objects, filter out more-specific route objects, and build a long Aggregator {} configuration with the "save attributes" part. It will probably fit in less than, say, 150k, and the rest can be used for IGP and local IX tables.

Some (how many?) of the aggregated routes will have different paths, resulting in the origin AS being hidden in the AS_SET attribute. If this number is significant, I can improve the AS_PATH merging algorithm to save the originating AS if possible.

However,
1) it needs testing to determine the exact number
2) in the future, sales of IPv4 sub-blocks between ISPs (and non-ISPs) will decrease the effectiveness of such an approach.
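
How the origin AS ends up "hidden" in an AS_SET can be seen with a simplified merge. The sketch below is an assumption-laden Python illustration and not the merging code discussed in this thread: AS paths are plain lists of AS numbers with the nearest AS first, the common leading part is kept as a sequence, and everything else from all contributing paths is collapsed into one unordered set. The function name `merge_as_paths` is hypothetical.

```python
def merge_as_paths(paths):
    """Merge several AS paths into (leading_sequence, as_set).

    paths -- list of AS paths, each a list of AS numbers, nearest AS first
    """
    if not paths:
        return [], set()

    # Longest leading run of AS numbers shared by every path.
    common = []
    for hops in zip(*paths):
        if len(set(hops)) != 1:
            break
        common.append(hops[0])

    # Everything after the shared prefix loses its ordering and becomes a set.
    rest = set()
    for path in paths:
        rest.update(path[len(common):])
    rest -= set(common)  # avoid listing an AS in both parts
    return common, rest


if __name__ == "__main__":
    sequence, as_set = merge_as_paths([
        [64500, 64510, 64520],
        [64500, 64511, 64520],
    ])
    print(sequence, sorted(as_set))
```

Both sample paths originate in AS 64520, yet after this simple merge it is only a member of the set. An improved merge of the kind mentioned above could also look for a common trailing part and keep it ordered, so that the shared origin stays visible.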
On 16/08/2012 09:40, Alexander V. Chernikov wrote:
> For 32k or less it seems that several static defaults for outgoing traffic balancing is sufficient, since it is not possible to do fine-grained route selection for all routes. 200k is much more promising: 11:19 [0] bmw# birdc 'show route primary where net.len < 24' | wc -l 198369

bird> show route count where net.len < 24
197699 of 416648 routes for 416648 networks

Confirmed. More on that: http://dedibox.nicolbolas.org/tmp/prefix-distribution.png

But I wouldn't count that much on prefix-length based aggregation: many /24s are stubs or PI space used to multi-home smaller networks.
> It is possible to fetch RIPE/APNIC/etc (or RADB) database on route objects, filter more-specific route object, and build long Aggregator {} configuration with "save attributes" part. It will probably fit it less than, say, 150k and the rest can be used for IGP and local IX tables. Some (how much?) of the aggregated routes will have different paths resulting in origin AS being hidden in AS_SET attribute. If this number is significant I can improve AS_PATH merging algorithm to save originating AS if possible.

I wouldn't count on route objects; many are outdated, if they exist at all. But cross-checking them could be a nice security feature, apart from aggregation purposes. It could be linked to development targeted at full RPKI+ROA validator support in BIRD.
> However, 1) it needs testing to determine exact number 2) In future, IPv4 sub-blocks selling between ISPs (and non-ISPs) will decrease effectiveness of such approach.

You're right. But I consider it to be of minimal importance compared to poorly managed ASes (AS6389 and AS28573, anyone?)

From a purely algorithmic point of view, I'd propose the following strategies. Keep in mind this happens _after_ the best path selection process in BIRD.

1) Strip prepending from AS_PATHs while copying routes into a binary tree (the "B-tree" below).

2) When creating a child in the B-tree, if the more specific route has the same AS_PATH (including Origin) as its parent, don't create it. (Note: when creating a child with no direct parent, you'd have to create a virtual parent, marked as a non-existent route.)

3) If both children of a virtual parent have the same AS path, strip the children (don't get me wrong on that) and move their attributes to the virtual parent, making it a "real" route.

There you get a naturally aggregated B-tree representing all routes with a non-destructive FIB approach. If the route count is still too high, you could do the following:

4) Build a secondary tree (not binary) representing AS relationships, each node having a pointer to a list of pointers to routes known in the route B-tree.

5) For "blacklisted ASes", meaning "known transit ASes with no special interest in their customer cone", strip AS paths to make routes originate from the listed AS and aggregate the routes (hence the pointer to a list of pointers: way faster than recursing the B-tree)

OR (if you don't want to handle a blacklist, or in addition to it)

5bis) Shorten the AS_PATH to a maximum hop count (let's say 5 would be a reasonable default limit for a well-connected network) and do the same as in 5). This strategy could be applied locally to large non-leaf nodes in the AS-tree (ASes with more than, let's say, a few hundred customer ASes) for maximum profit. On the contrary, it could also be used to force the aggregation of ASes with a smaller customer cone, considering them as marginal.

5ter) The same strategy as in 5) and 5bis) could be applied based on AS-sets.

(At this point it's still not really destructive with respect to the routing policy.)

6) Recursing the AS-tree will let you find the ASes with the largest route-pointer lists, hence the ASes with the largest route count. Counting the topmost entries in the B-tree will give you an approximate "potential aggregation factor" disregarding the next-hop. There you may take the step of overriding the next-hop to maximise aggregation for that AS. This I consider destructive.

7) Discrepancy hunting: on a given branch of the route B-tree, when two "colors" (I first thought of it as colorizing the tree based on the next-hop attribute) are highly dominant, and both AS groups (given the initial best path selection policy) are reachable through both "colors" (here again, read "next-hop"), then take the topmost node in the route B-tree and split it in half, each half with one hardcoded color. Then mark them with private ASNs and write the matching AS-sets to the log output for debugging and traffic statistics purposes.

Many other policies could be written; I think it'd be all about trial and error regarding their aggregation efficiency.

About the data structures, I thought of a B-tree, but it could be quaternary too. The most important thing would be to implement a copy-on-write approach to routes: the initial tree has to be built with pointers to the routes as stored in the initial table, which would be faster and more conservative (memory-wise). When a modification happens to a route (attribute modification in the aggregation process), you will have to copy the modified route to a new memory space and rewrite the attribute.

AS_PATH attributes might be calculated from the AS-tree rather than stored in literal form. I guess this could be faster than stripping prepending from the attribute string.

At this point, we don't seem to need any attribute other than prefix, next-hop and AS_PATH. Community, MED and extended attributes might as well be stripped in the modified instances of the routes. Those might be ignored as well, and maybe the size of the resulting tree could disqualify the copy-on-write approach, as its higher complexity would outweigh the reduced interest in saving memory.

-- 
Jérôme Nicolle 06 19 31 27 14
By the way, if anyone can help me implement this (meaning writing code while I do the testing and debugging), my usual broker (Alturna) has offered me a free test chassis to actually run the aggregation process and see how a SUP32 or SUP720 reacts in the real world. I have 2 to 4 possible transit upstreams to test with in my area, so basically we have all the costly resources at our disposal; I just really need a talented developer to finally create a functional route aggregator. Anyone?
On 16/08/2012 12:14, Jérôme Nicolle wrote:
On 16/08/2012 09:40, Alexander V. Chernikov wrote:
For 32k or less it seems that several static defaults for outgoing traffic balancing are sufficient, since it is not possible to do fine-grained route selection for all routes. 200k is much more promising:
11:19 [0] bmw# birdc 'show route primary where net.len < 24' | wc -l
198369
bird> show route count where net.len < 24
197699 of 416648 routes for 416648 networks
Confirmed. More on that: http://dedibox.nicolbolas.org/tmp/prefix-distribution.png
But I wouldn't count that much on prefix-length-based aggregation: many /24s are stubs or PI space used to multi-home smaller networks.
It is possible to fetch the RIPE/APNIC/etc. (or RADB) database of route objects, filter out more-specific route objects, and build a long Aggregator {} configuration with a "save attributes" part. It will probably fit in less than, say, 150k, and the rest can be used for IGP and local IX tables. Some (how many?) of the aggregated routes will have different paths, resulting in the origin AS being hidden in an AS_SET attribute. If this number is significant I can improve the AS_PATH merging algorithm to preserve the originating AS if possible.
I wouldn't count on route objects; many are outdated, if they exist at all. But cross-checking them could be a nice security feature, apart from aggregation purposes. It could be linked to a development effort targeting full RPKI+ROA validator support in BIRD.
However, 1) it needs testing to determine the exact number, and 2) in future, the selling of IPv4 sub-blocks between ISPs (and non-ISPs) will decrease the effectiveness of such an approach.
You're right. But I consider it to be of minimal importance compared to poorly managed ASes (AS6389 and AS28573, anyone?).
On a purely algorithmic approach, I'd propose the following strategies. Keep in mind this happens _after_ the best path selection process in BIRD.
1) Strip prepending from AS_PATHs while copying routes into a B-tree (meaning a binary tree here).
2) When creating a child in the B-tree, if the more specific route has the same AS_PATH (including Origin) as its parent, don't create it.
(Note: when creating a child with no direct parent, you'd have to create a virtual parent, marked as a non-existent route.)
3) If a virtual parent has both children with the same AS path, strip the children (don't get me wrong on that) and move their attributes to the virtual parent, making it a "real" route.
There you get a naturally aggregated B-tree representing all routes with a non-destructive FIB approach.
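As a rough illustration of steps 1) to 3), here is a minimal Python sketch. It is not BIRD code: the function names (`strip_prepending`, `nearest_cover`, `aggregate`) are made up for the example, the tree is replaced by a plain dictionary keyed on prefix, and only the AS path is compared (the Origin attribute and the next-hop are ignored, which is an assumption of the sketch).

```python
import ipaddress


def strip_prepending(as_path):
    """Step 1: collapse consecutive repeats of the same ASN."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return tuple(out)


def nearest_cover(net, table):
    """Return the closest less-specific prefix present in table, or None."""
    for plen in range(net.prefixlen - 1, -1, -1):
        candidate = net.supernet(new_prefix=plen)
        if candidate in table:
            return candidate
    return None


def aggregate(routes):
    """routes maps prefix strings to AS paths (neighbour first, origin last)."""
    table = {ipaddress.ip_network(p): strip_prepending(path)
             for p, path in routes.items()}
    changed = True
    while changed:
        changed = False
        # Step 3: two sibling halves with equal paths and no real parent
        # are replaced by that (previously virtual) parent.
        for net in sorted(table, key=lambda n: -n.prefixlen):
            if net not in table or net.prefixlen == 0:
                continue
            parent = net.supernet()
            if parent in table:
                continue
            left, right = parent.subnets()
            if left in table and right in table and table[left] == table[right]:
                table[parent] = table[left]
                del table[left], table[right]
                changed = True
        # Step 2: a more specific route whose path equals that of its
        # nearest covering route adds nothing and is dropped.
        for net in sorted(table, key=lambda n: -n.prefixlen):
            cover = nearest_cover(net, table)
            if cover is not None and table[cover] == table[net]:
                del table[net]
                changed = True
    return table


routes = {
    "10.0.0.0/22": [64500, 64501, 64501],
    "10.0.0.0/24": [64500, 64500, 64501],
    "10.0.1.0/24": [64500, 64503],
    "10.0.8.0/24": [64500, 64502],
    "10.0.9.0/24": [64500, 64502],
}
for net, path in sorted(aggregate(routes).items()):
    print(net, path)
```

With these made-up routes, 10.0.0.0/24 disappears under the /22, the two halves 10.0.8.0/24 and 10.0.9.0/24 merge into 10.0.8.0/23, and 10.0.1.0/24 stays because its path differs from the covering route.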
If the route count is still too high, you could do the following:
4) Build a secondary tree (not binary) representing AS relationships, each node having a pointer to a list of pointers to routes known in the route B-tree.
5) For "blacklisted ASes", meaning "known transit ASes with no special interest in their customer cone", strip AS paths so that the routes appear to originate from the listed AS, and aggregate the routes (hence the pointer to a list of pointers: way faster than recursing the B-tree).
OR (if you don't want to handle a blacklist, or in addition to it)
5bis) Shorten the AS path to a maximum hop count (5 would be a reasonable default limit for a well-connected network) and do the same as in 5). This strategy could be applied locally to large non-leaf nodes in the AS-tree (ASes with more than, let's say, a few hundred customer ASes) for maximum profit. Conversely, it could also be used to force the aggregation of ASes with a smaller customer cone, considering them as marginal.
5ter) The same strategy as 5 and 5bis could be applied based on AS-sets.
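Steps 4), 5) and 5bis) could be sketched roughly as follows, again in Python and under simplifying assumptions: the AS "tree" is flattened into a per-ASN index of prefixes, paths are tuples with the neighbour AS first and the origin last, and all names (`build_as_index`, `truncate_at`, `shorten`, `collapse_transit`) and ASNs are hypothetical.

```python
from collections import defaultdict


def build_as_index(table):
    """Step 4, flattened: map each ASN to the prefixes whose path crosses it."""
    index = defaultdict(list)
    for prefix, path in table.items():
        for asn in set(path):
            index[asn].append(prefix)
    return index


def truncate_at(path, transit_asns):
    """Step 5: cut the path right after the first listed transit AS,
    so that AS becomes the apparent origin."""
    for i, asn in enumerate(path):
        if asn in transit_asns:
            return tuple(path[:i + 1])
    return tuple(path)


def shorten(path, max_hops=5):
    """Step 5bis: keep at most max_hops ASNs, counted from the neighbour side."""
    return tuple(path[:max_hops])


def collapse_transit(table, transit_asns):
    """Rewrite only the routes that the index says cross a listed AS."""
    index = build_as_index(table)
    out = dict(table)
    for asn in transit_asns:
        for prefix in index.get(asn, []):
            out[prefix] = truncate_at(out[prefix], transit_asns)
    return out


table = {
    "10.0.0.0/24": (64500, 64510, 64520),
    "10.0.1.0/24": (64500, 64510, 64521),
    "10.0.2.0/24": (64501, 64530),
}
collapsed = collapse_transit(table, {64510})
```

After the rewrite, the two routes behind AS 64510 carry the same shortened path, so a sibling-merge pass such as step 3) could then combine them; the third route is left alone because the index never lists it under a blacklisted AS.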
(At this point it's still not really destructive with respect to the routing policy.)
6) Recursing the AS-tree will let you find the ASes with the largest route-pointer lists, hence the ASes with the largest route count. Counting the topmost entries in the B-tree will give you an approximate "potential aggregation factor" disregarding the next-hop. There you may take the step of overriding the next-hop to maximise aggregation for that AS. This I consider destructive.
7) Discrepancy hunting: on a given route-B-tree branch, when two "colors" (I first thought of it as colorizing the tree based on the next-hop attribute) are highly dominant, and both AS groups (given the initial best path selection policy) are reachable through both "colors" (here again, read "next-hop"), then take the topmost node in the route-B-tree and split it in half, each half with one hardcoded color. Then mark them with private ASNs and write matching AS-sets to the log output for debugging and traffic statistics purposes.
Many other policies could be written; I think it'd be all about trial and error regarding their aggregation efficiency.
About the data structures, I thought of a B-tree but it could be quaternary too. The most important thing would be to implement a copy-on-write approach to routes: the initial tree has to be built with pointers to the routes as stored in the initial table, which would be faster and more conservative (memory-wise). When a modification happens to a route (attribute modification in the aggregation process), you will have to copy the modified route to a new memory space and rewrite the attribute there.
AS_PATH attributes might be calculated from the AS-tree rather than stored in literal form. I guess this could be faster than stripping prepending from the attribute string.
At this point, we don't seem to need any attribute other than prefix, next-hop and AS_PATH. Community, MED and extended attributes might as well be stripped in the modified instances of the routes. Those might be ignored as well, and maybe the resulting tree would then be small enough to disqualify the copy-on-write approach, whose higher complexity would no longer be justified by the memory saved.
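The copy-on-write idea can be illustrated with a small Python sketch. It assumes a route reduced to the three attributes named above; the `Route` and `CowView` names are hypothetical and nothing here reflects BIRD's internal structures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple


class CowView:
    """Aggregation-side view that shares Route objects with the source table."""

    def __init__(self, source):
        # Copies the mapping of references only; no Route is duplicated here.
        self._routes = dict(source)

    def get(self, prefix):
        return self._routes[prefix]

    def modify(self, prefix, **changes):
        # dataclasses.replace builds a new Route object, so the instance
        # held by the source table is left untouched.
        self._routes[prefix] = replace(self._routes[prefix], **changes)


source = {
    "10.0.0.0/24": Route("10.0.0.0/24", "192.0.2.1", (64500, 64500, 64501)),
}
view = CowView(source)
view.modify("10.0.0.0/24", as_path=(64500, 64501))
```

Unmodified routes cost one reference each in the view, and a route is duplicated only when an aggregation step actually rewrites one of its attributes.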
-- Jérôme Nicolle 06 19 31 27 14
Isn't there a case when you DON'T want the links aggregated? If you have servers in 2 different data centers, both announcing a 1.1.1.1/20, but one also announces 1.1.1.1/22, and the other also announces 1.1.5.1/22, so the one data center tends to serve 1.1.1.1-1.1.4.255, and the other data center tends to serve 1.1.5.1-1.1.8.255, but if either data center goes down, the whole address range will be served by the other data center?
I don't have this setup, but was considering it at some point. More of a curiosity question.
-----Original Message-----
From: "Alexander V. Chernikov" <melifaro@yandex-team.ru>
Sent: Monday, January 23, 2012 11:14am
To: "Maciej Wierzbicki" <voovoos-bird@killfile.pl>
Cc: bird-users@network.cz
Subject: Re: Route aggregation in BIRD - how?
On 23.01.2012 20:01, Maciej Wierzbicki wrote:
Hello.
Case study: * importing full BGP table from various uplinks * some routes received by BGP are being exported via OSPF to core1, using filters: (source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])
Question: how to aggregate routes (whenever possible) before exporting them via OSPF to core?
It is not possible currently. I'm working on BGP route aggregation and I plan to get more or less working code at the end of this week.
Example: let's say that I've received a.b.c.d/22 and a.b.c.d/24 from asnXYZ via BGP. Let's say that I want to export routes with asnXYZ in the AS path via OSPF to the core1 switch. Obviously, a.b.c.d/24 is in a.b.c.d/22, so I would like to export only a.b.c.d/22. Is it doable in BIRD? If yes, any hint/keywords in the doc?
On 23/01/2012 19:11, dspazman@epicup.com wrote:
Isn't there a case when you DON'T want the links aggregated? If you have servers in 2 different data centers, both announcing a 1.1.1.1/20, but one also announces 1.1.1.1/22, and the other also announces 1.1.5.1/22, so the one data center tends to serve 1.1.1.1-1.1.4.255, and the other data center tends to serve 1.1.5.1-1.1.8.255, but if either data center goes down, the whole address range will be served by the other data center?
From a user's perspective, I guess it makes no difference whether they hit one datacenter or the other. Seeing only the shortest prefix will make your upstream choose which one to hit.
-- Jérôme Nicolle +33 (0)6 19 31 27 14
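To make the two-datacenter scenario concrete, here is a small longest-prefix-match sketch in Python. It assumes the prefixes are aligned to network boundaries (1.1.0.0/20, 1.1.0.0/22 and 1.1.4.0/22 rather than the host-style notation used in the question); the labels "dc1", "dc2" and "either" and the `lookup` helper are hypothetical.

```python
import ipaddress


def lookup(table, address):
    """Longest-prefix match: the most specific covering prefix wins."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def net(prefix):
    return ipaddress.ip_network(prefix)


# Both sites announce the /20; each also announces its own /22.
normal = {
    net("1.1.0.0/20"): "either",  # BGP best-path selection picks a site
    net("1.1.0.0/22"): "dc1",
    net("1.1.4.0/22"): "dc2",
}

# dc1 is down: its /22 and its copy of the /20 are withdrawn.
dc1_down = {
    net("1.1.0.0/20"): "dc2",
    net("1.1.4.0/22"): "dc2",
}

# Only the aggregate is visible: the more specifics no longer steer traffic.
aggregated = {net("1.1.0.0/20"): "either"}

print(lookup(normal, "1.1.1.1"), lookup(dc1_down, "1.1.1.1"),
      lookup(aggregated, "1.1.1.1"))
```

As long as the /22s are visible they pin each range to its own site; once only the /20 is seen, the choice of site falls back to whatever the upstream's best-path selection prefers.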
participants (6)
- Alexander V. Chernikov
- Alexander V. Chernikov
- dspazman@epicup.com
- Jérôme Nicolle
- Maciej Wierzbicki
- Ondrej Zajicek