Two router ha setup questions
Hello guys,

I have two identically configured routers (bird2, only the local IP is different), connected to one upstream and a few internal VLANs. Each router has a BGP session with the upstream router. I have a couple of small local subnets (/27, /28, ...) on different VLAN interfaces, which I need to announce to upstream. They all belong to one big network (/23). As the upstream router doesn't accept routes more specific than /24, I only announce a single route for the /23 network. I uploaded my bird configuration: https://controlc.com/aa226135

For completeness: for first-hop redundancy (default gateway of the clients using the small subnets) I use keepalived on my routers, which works fine so far.

What's the problem: As both routers announce the same network to upstream, upstream sends some of its traffic to router1 and some to router2 (but it doesn't seem balanced in any way). This seems to cause (at least) TCP ordering issues (I suspect this, but couldn't confirm it in any way), because sometimes connections from local clients to clients behind the upstream "hang" for a couple of seconds. If I stop bird on the backup router (so all traffic goes only to the active one), no hangs occur.

The question: Is my setup OK or is it (completely) broken? Is it OK to have both routers announce the same subnets at the same time?

Possible solution I'm thinking about but don't know how to do: Both routers should have an active BGP session (for fast failover), but only the router which is active (and thus has the gateway IPs) should announce the "aggregated" route (the /23). This way no traffic would go over the backup. But I wonder how this could be configured in bird? I could configure keepalived to create not only the gateway IPs but also the small subnets (in my current config the subnets are created by bird). Would this make things easier for bird?

My last resort would be to have keepalived execute scripts which reconfigure bird on failover. But somehow this feels wrong? Or is this the way to do it?

Thank you very much for reading and for any help. I've been struggling with this for days now... :-(

Alessandro
Hi,

Is AS path prepending on the backup router not an option?

    # example
    # if bgp_path ~ [= myas =] then
    #     bgp_path.prepend(myas);
    #     accept;
    if proto = "static_bgp_v4" then {
        bgp_path.prepend(myas);
        accept;
    } else reject;

Regards,

Gregor Fajdiga
Sistemski administrator, Informatika
System administrator, IT
Delo, d.o.o.
Likozarjeva 1, SI-1000 Ljubljana
+386 1 4737 993
fajdiga@delo.si
www.delo.si

On 21/01/2022 15:49, Alessandro Brega wrote:
... Each router has a BGP session with the upstream router.
Would it help if you announce different MED values? That way upstream should prefer one while it's up, but fail over to the other when it's not.

--
George D M Ross MSc PhD CEng MBCS CITP
University of Edinburgh, School of Informatics
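For readers who want to try the MED suggestion, here is a minimal BIRD 2 sketch. It is not taken from the poster's configuration: the constant `UPSTREAM_MED` and the filter name `export_upstream_med` are hypothetical, and the protocol name `static_bgp_v4` is borrowed from the prepending example earlier in the thread on the assumption that it originates the /23. It also assumes the upstream actually honours MED, which depends on the upstream's policy.

```
# Hypothetical per-router constant: the only line that differs between routers.
# Lower MED is preferred, so the active router gets the lower value.
define UPSTREAM_MED = 100;      # e.g. 100 on router1, 200 on router2

# Hypothetical export filter; attach it in the BGP channel towards upstream
# with: export filter export_upstream_med;
filter export_upstream_med {
    # Assumption: the aggregate /23 is originated by a protocol named static_bgp_v4.
    if proto = "static_bgp_v4" then {
        bgp_med = UPSTREAM_MED;
        accept;
    }
    reject;
}
```

Keeping the MED in a single `define` means both routers can keep sharing the same filter text, with only one constant differing alongside the local IP.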
On 1/21/22 8:22 AM, George Ross wrote:
Would it help if you announce different MED values? That way upstream should prefer one while it's up, but fail over to the other when it's not.
I'd be inclined to investigate different MED values too. Though I'd probably announce two /24s, which would allow you to have crossed MED values. E.g. router A has one MED for the first /24 and a second MED for the second /24, while router B has the second MED for the first /24 and the first MED for the second /24.

The hope is that each /24 will have affinity to one router, giving pseudo load balancing and utilization of both links, while each router still acts as an active backup for the other /24.

If your problem is an out-of-order / reordering issue, I'd expect that sniffing traffic on any system in the /23 would make this very apparent in Wireshark. TCP /should/ handle this without any problems.

Are you by chance doing any SPI (stateful packet inspection) on either of the routers? Is there any chance that the timeout is actually related to traffic flows switching which router they come in on, with the new router's SPI not having state and thus evicting the flow?

--
Grant. . . .
unix || die
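The crossed-MED idea above could look roughly like this in BIRD 2. This is only a sketch under stated assumptions: the prefixes 10.10.0.0/24 and 10.10.1.0/24 are placeholders for the two halves of the /23, and the names `static_split_v4`, `MED_FIRST_HALF`, `MED_SECOND_HALF` and `export_upstream_split` are hypothetical. How the /24s are originated in the real setup is not known from the thread; unreachable static routes are just one common way to do it.

```
# Hypothetical constants for router A; router B would swap the two values.
define MED_FIRST_HALF  = 100;   # router B: 200
define MED_SECOND_HALF = 200;   # router B: 100

# Assumption: originate the two /24s from a static protocol (placeholder prefixes).
protocol static static_split_v4 {
    ipv4;
    route 10.10.0.0/24 unreachable;
    route 10.10.1.0/24 unreachable;
}

# Hypothetical export filter for the BGP channel towards upstream.
filter export_upstream_split {
    if net ~ [ 10.10.0.0/24 ] then {
        bgp_med = MED_FIRST_HALF;
        accept;
    }
    if net ~ [ 10.10.1.0/24 ] then {
        bgp_med = MED_SECOND_HALF;
        accept;
    }
    reject;
}
```

With the values swapped on router B, each /24 should be preferred via a different router as long as both sessions are up, and either router still carries both /24s if the other one fails. Note that this only steers inbound traffic; the clients' default gateway is still decided by keepalived.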
Participants (4):
- Alessandro Brega
- George Ross
- Grant Taylor
- Gregor Fajdiga