Understanding IPv6 next hops
Good morning all!

I'm debugging a situation where I'm seeing different IPv6 next hop behaviour in two setups with different versions of my team's software.

In both setups:
- There are 3 routers A, B and C, all peered with another router X.
- They are all on the same L2 bridge, and have global IPv6 addresses in the 2001:20::/64 subnet.
- A, B and C all export a route for fd00:10:96::/112

In setup #1 on X I see ECMP routes for fd00:10:96::/112 via link-local IPv6 addresses:

    # ip -6 r
    ...
    fd00:10:96::/112 proto bird metric 1024
            nexthop via fe80::42:acff:fe11:2 dev eth0 weight 1
            nexthop via fe80::42:acff:fe11:3 dev eth0 weight 1
            nexthop via fe80::42:acff:fe11:4 dev eth0 weight 1
            nexthop via fe80::42:acff:fe11:5 dev eth0 weight 1 pref medium

But in setup #2 on X I see ECMP routes for fd00:10:96::/112 via the global IPv6 addresses:

    # ip -6 r
    ...
    fd00:10:96::/112 proto bird metric 1024
            nexthop via 2001:20::1 dev eth0 weight 1
            nexthop via 2001:20::2 dev eth0 weight 1
            nexthop via 2001:20::3 dev eth0 weight 1
            nexthop via 2001:20::8 dev eth0 weight 1 pref medium

That is the difference that I am trying to understand.

Digging further, I used tcpdump to capture the BGP protocol on X, and in setup #1 I see

    nexthop: 2001:20::3, fe80::42:acff:fe11:4, nh-length: 32, no SNPA

in the BGP Update Reach NLRI, whereas in setup #2 I see just one next hop address:

    nexthop: 2001:20::3, nh-length: 16, no SNPA

All this is with BIRD code, the same for both setups, that is 1.6.8 plus some patches that I would not expect to be relevant. (Specifically, this code: https://github.com/projectcalico/bird/commits/feature-ipinip)

There are slight config differences between the two setups, but nothing that is obviously relevant to this. (I can provide those if need be.)

Any ideas? Can you advise where I should look or check next, to try to understand why the UPDATE message has two next hop addresses in one setup, but only one in the other?

Also, does the passing of two next hop addresses in setup #1 fully explain why the ECMP routes programmed into the kernel use link-local gateway addresses?

Also, are the routes with global next hops more correct in some sense than those with link-local next hops; or vice versa? Would you expect them both to forward data correctly?

Many thanks,
Neil
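To make the two tcpdump observations concrete, here is a minimal Python sketch of how the next hop field of an IPv6 MP_REACH_NLRI attribute can be read: 16 bytes carry a single (global) address, 32 bytes carry a global address followed by a link-local one. The function name `split_next_hop` and the byte strings are illustrative assumptions built from the addresses quoted above; this is not BIRD's or tcpdump's code.

```python
import ipaddress

def split_next_hop(raw: bytes):
    """Split a raw IPv6 next hop field into (global, link_local).

    Hypothetical helper: link_local is None when only one address is present.
    """
    if len(raw) == 16:
        return ipaddress.IPv6Address(raw), None
    if len(raw) == 32:
        return ipaddress.IPv6Address(raw[:16]), ipaddress.IPv6Address(raw[16:])
    raise ValueError(f"unexpected next hop length: {len(raw)}")

# Byte strings rebuilt from the addresses seen in the captures above.
setup1_raw = (ipaddress.IPv6Address("2001:20::3").packed
              + ipaddress.IPv6Address("fe80::42:acff:fe11:4").packed)
setup2_raw = ipaddress.IPv6Address("2001:20::3").packed

for name, raw in (("setup #1", setup1_raw), ("setup #2", setup2_raw)):
    global_nh, link_local_nh = split_next_hop(raw)
    print(name, "nh-length:", len(raw), "global:", global_nh, "link-local:", link_local_nh)
```

The only difference between the two setups at this level is whether the second 16-byte address is present.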
On Thu, Feb 06, 2020 at 12:34:00PM +0000, Neil Jerram wrote:
Good morning all!
I'm debugging a situation where I'm seeing different IPv6 next hop behaviour in two setups with different versions of my team's software.
In both setups: There are 3 routers A, B and C, all peered with another router X. They are all on the same L2 bridge, and have global IPv6 addresses in the 2001:20::/64 subnet. A, B and C all export a route for fd00:10:96::/112 ... Any ideas? Can you advise where I should look or check next, to try to understand why the UPDATE message has two next hop addresses in one setup, but only one in the other?
Hi

Check the code in the IPv6 version of bgp_create_update(). It depends on how the sender got the routes (locally originated or received; if received, whether they already had a link-local next hop; whether the next hop was modified), on whether the session is IBGP or EBGP, and on whether the next hop is the same as the sender.
Also, does the passing of two next hop addresses in setup #1 fully explain why the ECMP routes programmed into the kernel use link-local gateway addresses?
Yes, the link-local next hop is preferred as the direct gateway.
Also, are the routes with global next hops more correct in some sense than those with link-local next hops; or vice versa? Would you expect them both to forward data correctly?
Well, it is a somewhat strange quirk of IPv6 BGP. In general, both global and link-local next hops should be sent when the sender, the receiver and the global next hop are on the same subnet. The global next hop is used for recursive next hop evaluation, while the link-local one is used for forwarding.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
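The receiver-side rule described in this reply can be sketched in a few lines of Python. This is a simplified model of the stated preference (use the link-local address as the direct gateway when one was received, otherwise fall back to the global one); `pick_gateway` is a hypothetical name and the logic is an assumption drawn from the explanation above, not BIRD's actual implementation.

```python
import ipaddress
from typing import Optional

def pick_gateway(global_nh: ipaddress.IPv6Address,
                 link_local_nh: Optional[ipaddress.IPv6Address]) -> ipaddress.IPv6Address:
    """Return the address a receiver would install as the direct gateway.

    Simplified model: a received link-local next hop wins; the global
    next hop remains available for recursive next hop evaluation.
    """
    if link_local_nh is not None and link_local_nh.is_link_local:
        return link_local_nh
    return global_nh

global_nh = ipaddress.IPv6Address("2001:20::3")
link_local_nh = ipaddress.IPv6Address("fe80::42:acff:fe11:4")

# Both next hops received (as in setup #1): the link-local one is installed.
print(pick_gateway(global_nh, link_local_nh))
# Only the global next hop received (as in setup #2): it is installed as is.
print(pick_gateway(global_nh, None))
```

Under this model both kernel routes quoted in the thread are consistent with what was received in the UPDATE messages.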
On Thu, Feb 6, 2020 at 3:22 PM Ondrej Zajicek <santiago@crfreenet.org> wrote:
Thank you very much Ondrej for all this. I will work through understanding and checking the details that you have provided. Best wishes, Neil
On Fri, 7 Feb 2020, 11:28 Neil Jerram, <neil@tigera.io> wrote:
Thanks again Ondrej, I found the root cause here, with your help. In both of my setups the peers were in fact directly connected, but one of the setups was configured with "direct;" and the other setup with "multihop;".

Best wishes,
Neil
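The root cause can be summarised with a small Python model of the sender side. It assumes, which the thread does not state explicitly, that the "direct;" setup is the one that sent both next hops (setup #1) and the "multihop;" setup is the one that sent only the global next hop (setup #2). The function names are hypothetical and the model ignores every other condition checked in bgp_create_update().

```python
import ipaddress

def advertised_next_hops(mode, global_nh, link_local_nh):
    """Simplified model of which next hop addresses go into the UPDATE.

    Assumption: a "direct" session adds the link-local address,
    a "multihop" session sends the global address only.
    """
    if mode == "direct":
        return [global_nh, link_local_nh]
    if mode == "multihop":
        return [global_nh]
    raise ValueError(f"unknown session mode: {mode}")

def nh_length(next_hops):
    """Length in bytes of the next hop field (16 bytes per IPv6 address)."""
    return 16 * len(next_hops)

global_nh = ipaddress.IPv6Address("2001:20::3")
link_local_nh = ipaddress.IPv6Address("fe80::42:acff:fe11:4")

for mode in ("direct", "multihop"):
    hops = advertised_next_hops(mode, global_nh, link_local_nh)
    print(mode, [str(h) for h in hops], "nh-length:", nh_length(hops))
```

The resulting lengths match the nh-length values of 32 and 16 seen in the two captures.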