BGP + routing w/ multiple providers/uplinks
Despite a few days of troubleshooting, I'm at a loss for answers, and Google is currently mocking further research attempts. Any help is greatly appreciated.

My goals & confusion:
- The intent is to announce two /24 netblocks across both of two separate uplinks, w/ my ASN via BGP.
- It's not clear if I need to maintain multiple routing tables, with a single "internal" autonomous system, and if so, how to facilitate that with bird's pipe function.
- I believe I should be focusing on ensuring replies return via the originating interface - however at this point, another set of eyes would really help in hashing this out.

From the perspective of my router:
- BGP sessions are active w/ uplinks, and providers are not filtering announcements.
- ingress TCP connections to router's "source address" IP are established with no problem, if my BGP/routing tables instruct replies via the same interface packets happen to originate via.
- ingress TCP connections cannot be established, if my BGP/routing tables instruct replies via a different interface than what packets happen to originate via.
- tcpdump shows packets ingress from 1st provider, and egress towards 2nd provider, only if routing table instructs to 2nd provider...although replies never arrive at destination.
- outbound TCP connections originating from the router itself, can be established with no issue, to either provider. Source IP matches local address of outbound route.
- All addresses are public; this router is not doing any NAT.

Below IPs were changed to protect the innocent, guilty, and everyone in-between - all replacements were via sed -i s///g...

Since I've had no luck finding a template detailing how to accomplish this, once this hashes out, I'll followup with a relevant & documented configuration template.
My bird.conf:

    log syslog { debug, trace, info, remote, warning, error, auth, fatal, bug };

    # Avoid martians from planet RFC1918
    function avoid_martians()
    prefix set martians;
    {
        martians = [ 169.254.0.0/16+, 172.16.0.0/12+, 192.168.0.0/16+, 10.0.0.0/8+,
                     224.0.0.0/4+, 240.0.0.0/4+, 0.0.0.0/32-, 0.0.0.0/0{31,32}, 0.0.0.0/0{0,7} ];
        if net ~ martians then return false;
        return true;
    }

    protocol direct {
    }

    protocol kernel {
        learn;          # Learn all alien routes from the kernel
        persist;        # Don't remove routes on bird shutdown
        scan time 60;   # Scan kernel routing table every 60 seconds
        import none;    # Default is import all
        export all;     # Default is export none
    }

    protocol device {
        scan time 60;   # Scan interfaces every 60 seconds
    }

    protocol static {
        route 1.0.0.0/24 via "eth0";
        route 2.0.0.0/24 via "eth0";
    }

    filter bgp_out {
        if net ~ [2.0.0.0/24] then accept;
        if net ~ [1.0.0.0/24] then accept;
        reject;
    }

    protocol bgp INET1 {
        local as 11111;
        source address 3.0.0.118;
        neighbor 3.0.0.117 as 22222;
        import all;
        path metric 1;              # Prefer routes with shorter paths (like Cisco does)
        default bgp_med 1;          # MED value we use for comparison when none is defined
        default bgp_local_pref 100; # The same for local preference
        export filter bgp_out;
    }

    protocol bgp INET2 {
        local as 11111;
        source address 4.0.0.14;
        neighbor 4.0.0.13 as 33333;
        import all;
        path metric 1;              # Prefer routes with shorter paths (like Cisco does)
        default bgp_med 1;          # MED value we use for comparison when none is defined
        default bgp_local_pref 100; # The same for local preference
        export filter bgp_out;
    }

As for any possible "alien" routes, the only thing I'm doing is adding the spamhaus DROP list, via this script:

    ip route flush table 10
    ip rule del from all table 10 priority 10
    curl http://www.spamhaus.org/drop/drop.lasso |
        sed 's/;/#/g' |
        sed 's/^[0-9]/ip\ route\ add\ blackhole\ &/g' |
        sed 's/ \#/ table 10 \#/' |
        sed '/^ip/!D' > spamhausDrop
    ip rule add from all table 10 priority 10
    sh spamhausDrop

In summary, it sources http://www.spamhaus.org/drop/drop.lasso to generate a bunch of lines akin to:

    ip route add blackhole 95.64.98.0/23 table 10 # SBL90817

- Gregg Berkholtz
Dependable IT consulting, hosting & support since 1995
www.tocici.com | 503-488-5461 | AS14613
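For readers who want to see what the DROP-list transformation does without running it against a live routing table, here is a minimal Python sketch of the same idea. It assumes each useful input line looks like "PREFIX ; SBLnnnn" (inferred from the sample output line above, not from the list's documentation). The function name and the sample text are made up for illustration. Unlike the sed pipeline, it also rejects anything that does not parse as a clean CIDR prefix.

```python
import ipaddress


def drop_to_blackhole_cmds(drop_text, table=10):
    """Turn DROP-list text into 'ip route add blackhole' command lines."""
    cmds = []
    for line in drop_text.splitlines():
        # Everything before ';' is the prefix, everything after is a comment.
        entry, _, comment = line.partition(";")
        entry = entry.strip()
        if not entry:
            continue  # header/comment line: nothing before the ';'
        try:
            net = ipaddress.ip_network(entry, strict=True)
        except ValueError:
            continue  # not a clean CIDR prefix, so do not emit a command
        cmd = f"ip route add blackhole {net} table {table}"
        if comment.strip():
            cmd += f" # {comment.strip()}"
        cmds.append(cmd)
    return cmds


# Hypothetical sample input; the real list is fetched with curl in the script.
sample = """; Spamhaus DROP List (sample lines, format assumed)
95.64.98.0/23 ; SBL90817
not-a-prefix ; ignored
"""

for cmd in drop_to_blackhole_cmds(sample):
    print(cmd)
```

The commands are only printed here; feeding them to a shell, as the original script does with "sh spamhausDrop", is left out on purpose.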
* Gregg Berkholtz
My goals & confusion:
- The intent is to announce two /24 netblocks across both of two separate uplinks, w/ my ASN via BGP.
- It's not clear if I need to maintain multiple routing tables, with a single "internal" autonomous system, and if so, how to facilitate that with bird's pipe function.
- I believe I should be focusing on ensuring replies return via the originating interface - however at this point, another set of eyes would really help in hashing this out.
Hi Gregg,

BGP can't really help you here. A BGP router only considers the destination field of the IP packet when forwarding it. It doesn't know if it is a "reply packet" - from the router's point of view, it's just a packet like any other.

Asymmetric routing like this is pretty normal, and there's seldom any reason to try to avoid it.

If you have to anyway, you will have to look into policy-based routing to replace or supplement BGP. But I'd recommend against it if you have a choice, because of the added complexity.
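To make the "destination only" point concrete, here is a minimal Python sketch of a longest-prefix-match lookup. The table contents and the documentation prefixes (198.51.100.0/24, 203.0.113.0/24) are hypothetical; only the names INET1 and INET2 are borrowed from the bird.conf above. It is a toy model of a forwarding decision, not how the kernel or BIRD implements it.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match on the destination address only."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, uplink) for net, uplink in table if addr in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical table: default route via uplink 2, one more-specific via uplink 1.
table = [
    (ipaddress.ip_network("0.0.0.0/0"), "INET2"),
    (ipaddress.ip_network("198.51.100.0/24"), "INET1"),
]

# A request from 203.0.113.7 may well have arrived over INET1, but the
# reply to 203.0.113.7 is routed by its destination alone.
print(lookup(table, "203.0.113.7"))   # INET2
print(lookup(table, "198.51.100.9"))  # INET1
```

Nothing in lookup() takes the ingress interface or the source address as input, which is why the reply path can differ from the request path.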
From the perspective of my router:
- BGP sessions are active w/ uplinks, and providers are not filtering announcements.
- ingress TCP connections to router's "source address" IP are established with no problem, if my BGP/routing tables instruct replies via the same interface packets happen to originate via.
- ingress TCP connections cannot be established, if my BGP/routing tables instruct replies via a different interface than what packets happen to originate via.
- tcpdump shows packets ingress from 1st provider, and egress towards 2nd provider, only if routing table instructs to 2nd provider...although replies never arrive at destination.
- outbound TCP connections originating from the router itself, can be established with no issue, to either provider. Source IP matches local address of outbound route.
- All addresses are public; this router is not doing any NAT.
Below IPs were changed to protect the innocent, guilty, and everyone in-between - all replacements were via sed -i s///g...
In my opinion, that's a bit silly. Leaving out information just makes it harder to help you figure out what's going on, and in your case I think specific information is essential. BGP announcements and such are inherently public information anyway. So I just looked it up...

Your two /24s are 65.49.94.0/24 and 199.223.127.0/24, and one of the providers is AS6939 (Hurricane Electric), correct?

You might have problems using 65.49.94.0/24 with another provider, as these addresses belong to HE and provider #2 might drop them as an anti-spoof measure. You might have to provide them with an LOA from HE before they'll allow the traffic.

Similarly, 199.223.127.0/24 appears to belong to Stephouse Networks if I'm reading ARIN's whois database correctly. Is this your provider #2?

You said your providers aren't filtering announcements, but I cannot see any routes originating from AS14613 that are not via AS6939. Are you 100% certain that provider #2 doesn't filter your announcements? If it does, and is furthermore running uRPF in strict mode on your transit port, any packets you send to that interface will be dropped.

In any case, if you want to multihome, I'd recommend against borrowing address space from your upstreams. You won't be truly provider-independent if you do, and you'll be running into these filtering issues from time to time. If I were you I'd instead go and get your very own PA allocation directly from ARIN, or if you're not an ARIN member, get one of your upstreams to request a PI assignment on your behalf.

Best regards,
--
Tore Anderson
Redpill Linpro AS - http://www.redpill-linpro.com/
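The uRPF scenario can be sketched in a few lines of Python. This is a toy model under stated assumptions: the interface names ("customer-port", "to-upstream") and both forwarding tables are hypothetical views from provider #2, and only the prefix 65.49.94.0/24 comes from the thread. Strict mode is modelled as "accept a packet only if the best route back to its source points out the interface it arrived on".

```python
import ipaddress


def best_route(fib, address):
    """Return the interface of the longest-prefix match for address."""
    addr = ipaddress.ip_address(address)
    candidates = [(net, iface) for net, iface in fib if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def strict_urpf_accepts(fib, src, arrived_on):
    """Strict uRPF: the best route back to src must use the ingress interface."""
    return best_route(fib, src) == arrived_on


default = (ipaddress.ip_network("0.0.0.0/0"), "to-upstream")
prefix = ipaddress.ip_network("65.49.94.0/24")

# Case 1: the customer's announcement is filtered, so the provider only
# knows the prefix via its own upstreams.
fib_filtered = [default, (prefix, "to-upstream")]

# Case 2: the announcement is accepted on the customer port.
fib_accepted = [default, (prefix, "customer-port")]

print(strict_urpf_accepts(fib_filtered, "65.49.94.10", "customer-port"))  # False
print(strict_urpf_accepts(fib_accepted, "65.49.94.10", "customer-port"))  # True
```

In the first case the packets leave the customer's router normally, which would match what tcpdump shows, and are then discarded on the provider's side.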
On Aug 9, 2010, at 1:01 AM, Tore Anderson wrote:
* Gregg Berkholtz
My goals & confusion:
- The intent is to announce two /24 netblocks across both of two separate uplinks, w/ my ASN via BGP.
- It's not clear if I need to maintain multiple routing tables, with a single "internal" autonomous system, and if so, how to facilitate that with bird's pipe function.
- I believe I should be focusing on ensuring replies return via the originating interface - however at this point, another set of eyes would really help in hashing this out.
Hi Gregg,
Thanks for the reply Tore,
BGP can't really help you here. A BGP router only considers the destination field of the IP packet when forwarding it. It doesn't know if it is a "reply packet" - from the router's point of view, it's just a packet like any other.
BGP's behavior I understood. What I was getting at was confirming that two kernel routing tables are not necessary and, if they are, finding a way to configure BIRD to help build a split-access style routing configuration (similar to what I've done with static routes in the past: http://lartc.org/howto/lartc.rpdb.multiple-links.html ).

What I'm hearing from you is that my configuration is sane enough and that it /should/ be working.
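For reference, the split-access approach on the LARTC page linked above comes down to one extra routing table per uplink plus a source-based rule selecting it. The Python sketch below only generates the corresponding ip(8) command strings and does not run them. The router and gateway addresses are the placeholder values from the bird.conf; the /30 link networks, the device names eth1/eth2 and the table numbers 101/102 are assumptions made up for illustration.

```python
def split_access_cmds(uplinks):
    """Build per-uplink 'ip route' / 'ip rule' commands (LARTC split access)."""
    cmds = []
    for u in uplinks:
        # Link network and default route go into the uplink's own table.
        cmds.append(
            f"ip route add {u['net']} dev {u['dev']} src {u['ip']} table {u['table']}"
        )
        cmds.append(f"ip route add default via {u['gw']} table {u['table']}")
        # Traffic sourced from this uplink's address uses that table.
        cmds.append(f"ip rule add from {u['ip']} table {u['table']}")
    return cmds


# Hypothetical uplink descriptions; only 'ip' and 'gw' come from the config.
uplinks = [
    {"net": "3.0.0.116/30", "dev": "eth1", "ip": "3.0.0.118", "gw": "3.0.0.117", "table": 101},
    {"net": "4.0.0.12/30", "dev": "eth2", "ip": "4.0.0.14", "gw": "4.0.0.13", "table": 102},
]

for cmd in split_access_cmds(uplinks):
    print(cmd)
```

This only steers traffic sourced from the router's own link addresses; the announced /24s would still follow the main table populated by BIRD.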
Asymmetric routing like this is pretty normal, and there's seldom any reason to try to avoid it.
I thought this was the case. Thank you - I'm not going insane. :-)
If you have to anyway, you will have to look into policy-based routing to replace or supplement BGP. But I'd recommend against it if you have a choice, because of the added complexity.
KISS - that level of complexity should not be necessary on this particular network.
From the perspective of my router:
- BGP sessions are active w/ uplinks, and providers are not filtering announcements.
- ingress TCP connections to router's "source address" IP are established with no problem, if my BGP/routing tables instruct replies via the same interface packets happen to originate via.
- ingress TCP connections cannot be established, if my BGP/routing tables instruct replies via a different interface than what packets happen to originate via.
- tcpdump shows packets ingress from 1st provider, and egress towards 2nd provider, only if routing table instructs to 2nd provider...although replies never arrive at destination.
- outbound TCP connections originating from the router itself, can be established with no issue, to either provider. Source IP matches local address of outbound route.
- All addresses are public; this router is not doing any NAT.
Below IPs were changed to protect the innocent, guilty, and everyone in-between - all replacements were via sed -i s///g...
In my opinion, that's a bit silly. Leaving out information just makes it harder to help you figure out what's going on, and in your case I think specific information is essential. BGP announcements and such are inherently public information anyway. So I just looked it up...
It was silly, although the intent was mostly to spare grief for anyone copy/pasting the configuration while building their own. Although I suppose at this level, nobody /should/ be doing that. Thank you for taking the time to dig and follow up.
Your two /24s are 65.49.94.0/24 and 199.223.127.0/24, and one of the providers is AS6939 (Hurricane Electric), correct? You might have problems using 65.49.94.0/24 with another provider, as these addresses belong to HE and provider #2 might drop them as an anti-spoof measure. You might have to provide them with an LOA from HE before they'll allow the traffic. Similarly, 199.223.127.0/24 appears to belong to Stephouse Networks if I'm reading ARIN's whois database correctly. Is this your provider #2?
Yep, you've got it. LOAs have all been handled; both providers claim they're seeing my announcements and that they're not filtering. I could have sworn I confirmed this a while ago...silly me for assuming that would hold for longer than a few weeks.
You said your providers aren't filtering announcements, but I cannot see any routes originating from AS14613 that are not via AS6939. Are you 100% certain that provider #2 doesn't filter your announcements? If it does, and is furthermore running uRPF in strict mode on your transit port, any packets you send to that interface will be dropped.
Well, they're not /supposed/ to be filtering the announcements. Although this particular provider has done all kinds of fun things lately, like broadly limiting the number of outbound TCP connections per minute, for all of their colo cross-connect customers...despite no basis/history to justify restrictions, and only until one "discovers" it & begs for the restrictions to be lifted. I can't even begin to convey how much fun my systems monitoring customers had with that stunt. Anyway - tcpdump very clearly shows the packets going out that interface & with no firewalling on my end...having uRPF set incorrectly for this circuit would be right in line with a growing trend of past behaviors.
In any case, if you want to multihome, I'd recommend against borrowing address space from your upstreams. You won't be truly provider-independent if you do, and you'll be running into these filtering issues from time to time. If I were you I'd instead go and get your very own PA allocation directly from ARIN, or if you're not an ARIN member, get one of your upstreams to request a PI assignment on your behalf.
Excellent point - alongside the sourcing of another uplink, we should be shifting the priority of this project up.

Thanks again,
Gregg Berkholtz
Dependable IT consulting, hosting & support since 1995
www.tocici.com | 503-488-5461 | AS14613