Hi all,
I think I looked in all the regular places and inferred that reporting issues might best be done here. The GitLab issue tracker shows no issues.
The problem I have seen is where BIRD BGP is running a passive process. The initiator peer is running the product at https://metallb.universe.tf. This product can only initiate sessions. Due to what I assume is a bug in the product, it was able to initiate a session from a TCP socket source address that did not match its router ID. More specifically, both BGP processes were running on the same host, with MetalLB configured to take a router ID that was a secondary (internal / private) interface address and BIRD using the primary interface address (so as to be able to connect to other external peers).
I believe the BIRD team would be interested in this because when BIRD BGP accepted a connection with the same source address as its own router ID, even though the session provided a different router ID, it caused the BIRD RIB and kernel exports to show the source of the route to be a completely different BGP peer. In this case, the different peer was the default route of a stub area, so incoming traffic to these addresses formed a routing loop. It’s unclear to me whether this is a bug or a feature.
If it is indeed a feature or an unavoidable side effect that needs to be supported all the same, it would have been desirable to have at least some log message of the form “this is probably not what you want” when a peer connected in this manner. More information can be found at https://github.com/danderson/metallb/issues/422 and the referenced PR.
I’m happy to help reproduce the problem if needed.
Thanks for a great product!!
best, Brian
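To make the requested warning concrete, here is a minimal Python sketch of the condition described in the message above. It is not BIRD code and says nothing about how BIRD handles sessions internally: the function name `session_warnings` and its parameters are hypothetical, the router IDs are assumed to be written in dotted-quad form, and RFC 5737 documentation addresses stand in for the redacted ones.

```python
import ipaddress


def session_warnings(local_router_id, peer_source_addr, peer_router_id):
    """Return warning strings for a suspicious incoming BGP session.

    All three arguments are dotted-quad strings. Hypothetical helper for
    illustration only.
    """
    local_id = ipaddress.IPv4Address(local_router_id)
    source = ipaddress.IPv4Address(peer_source_addr)
    peer_id = ipaddress.IPv4Address(peer_router_id)

    warnings = []
    # The case from the report: the peer connects from the address that
    # equals our own router ID while announcing a different router ID.
    if source == local_id and peer_id != local_id:
        warnings.append(
            f"peer with router ID {peer_id} connected from {source}, "
            "which matches the local router ID; "
            "this is probably not what you want"
        )
    return warnings


if __name__ == "__main__":
    # Placeholder addresses; the real ones are redacted in the thread.
    for line in session_warnings("192.0.2.113", "192.0.2.113", "10.10.0.41"):
        print("WARNING:", line)
```

The check only reports; it never rejects the session, which matches the request for a log message rather than a change in behavior.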
Hi Brian,
Can you give specific examples of what is happening? Configuration samples, show running route information from cli, etc.
On Apr 6, 2019, at 1:38 PM, Alexander Zubkov <green@qrator.net> wrote:
Hi Brian,
Can you give specific examples of what is happening? Configuration samples, show running route information from cli, etc.
Hi Alexander,
I can give it a shot here. Apologies for the previous direct reply.
These are the configs after application of the patch that I referenced earlier.
BIRD BGP:
router id UP.STREAM.143.113;

protocol bgp bgp_metal_gw01 {
    local as ASLOCAL;
    neighbor 10.10.0.41 as ASN; # See description below
    passive yes;
    ipv4 {
        next hop self;
        import filter {
            bgp_origin = ORIGIN_IGP;
            dest = RTD_BLACKHOLE;
            accept;
        };
        export none;
    };
}
MetalLB:
apiVersion: v1
kind: ConfigMap
data:
  config: |
    peers:
    - peer-address: UP.STREAM.143.113
      router-id: 10.10.0.41
      peer-asn: ASLOCAL
      my-asn: ASLOCAL
Without the patch, the BIRD neighbor address is *also* UP.STREAM.143.113. This is broken, but because MetalLB was using the primary interface that was returned by the Go standard library, I realized via tcpdump that BIRD wasn’t allowing the connection unless the neighbor address matched the socket source address being used by MetalLB. Once the neighbor address matched, the session was established with no warnings or errors, but the behavior I described previously was the result. Relevant trace from that previous configuration:
UP.STREAM.143.125.bgp > UP.STREAM.143.113.40871: Flags [P.], cksum 0x5bf5 (correct), seq 87:150, ack 71, win 16384, options [nop,nop,TS val 1618696452 ecr 1215190272], length 63: BGP
    Keepalive Message (4), length: 19
    Update Message (2), length: 44
      Origin (1), length: 1, Flags [T]: IGP
        0x0000:  00
      AS Path (2), length: 6, Flags [T]: 30475
        0x0000:  0201 0000 770b
      Next Hop (3), length: 4, Flags [T]: UP.STREAM.143.125
        0x0000:  adf8 8f7d
      Updated routes:
        0.0.0.0/0
00:02:53.744295 IP (tos 0xc0, ttl 1, id 38102, offset 0, flags [DF], proto TCP (6), length 210)
    UP.STREAM.143.113.40871 > UP.STREAM.143.125.bgp: Flags [P.], cksum 0x7ba4 (incorrect -> 0x443f), seq 71:229, ack 150, win 502, options [nop,nop,TS val 1215190314 ecr 1618696452], length 158: BGP
    Update Message (2), length: 84
      Origin (1), length: 1, Flags [T]: IGP
        0x0000:  00
      AS Path (2), length: 10, Flags [T]: ASLOCAL ASLOCAL
        0x0000:  0202 0000 2963 0000 2963
      Next Hop (3), length: 4, Flags [T]: UP.STREAM.143.113
        0x0000:  adf8 8f71
      Community (8), length: 4, Flags [OTP]: NO_EXPORT
        0x0000:  ffff ff01
      Updated routes:
        ANN.CIDR.96.10/32
        ANN.CIDR.96.11/32
        ANN.CIDR.96.8/32
        ANN.CIDR.96.9/32
        ANN.CIDR.96.0/32
        ANN.CIDR.97.1/32
    Update Message (2), length: 51
      Origin (1), length: 1, Flags [T]: IGP
        0x0000:  00
      AS Path (2), length: 6, Flags [T]: ASLOCAL
        0x0000:  0201 0000 2963
      Next Hop (3), length: 4, Flags [T]: UP.STREAM.143.113
        0x0000:  adf8 8f71
      Updated routes:
        ANN.CIDR.96.0/24
        ANN.CIDR.97.0/24
    Update Message (2), length: 23
      End-of-Rib Marker (empty NLRI)
With the patch, I use the configuration as shown and it peers as before, but the RIB is properly formed:
[root@gw01 ~]# birdc show route
BIRD 2.0.2 ready.
Table master4:
0.0.0.0/0            unicast [bgp_handy_125 21:22:05.177] * (100) [ASUPSTREAMi]
        via UP.STREAM.143.125 on eno1
                     unicast [bgp_handy_126 21:22:04.771] (100) [ASUPSTREAMi]
        via UP.STREAM.143.126 on eno1
ANN.CIDR.96.10/32    blackhole [bgp_metal_gw01 16:01:32.784 from 10.10.0.41] * (100) [i]
10.9.255.0/24        unicast [backbone 21:22:00.909] * I (150/10) [UP.STREAM.143.113]
        dev vti19
ANN.CIDR.96.11/32    blackhole [bgp_metal_gw01 16:01:32.784 from 10.10.0.41] * (100) [i]
ANN.CIDR.96.8/32     blackhole [bgp_metal_gw01 16:01:32.784 from 10.10.0.41] * (100) [i]
ANN.CIDR.96.9/32     blackhole [bgp_metal_gw01 16:01:32.784 from 10.10.0.41] * (100) [i]
ANN.CIDR.96.0/24     blackhole [public_nets_proto 21:22:00.809] * (500)
ANN.CIDR.97.0/24     blackhole [public_nets_proto 21:22:00.809] * (500)
ANN.CIDR.96.0/32     blackhole [bgp_metal_gw01 16:01:32.784 from 10.10.0.41] * (100) [i]
ANN.CIDR.97.1/32     blackhole [bgp_metal_gw01 16:01:32.784 from 10.10.0.41] * (100) [i]
10.10.0.0/22         unicast [backbone 21:22:00.909] * I (150/10) [UP.STREAM.143.113]
        dev eno2
Does that help?
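One way to check which protocol and peer each route is attributed to is to parse the `show route` output. The following is a rough Python sketch, assuming the line layout seen in the BIRD 2.0.2 output in this message (prefix, route type, then a bracketed protocol name, timestamp and optional `from` address); other BIRD versions may print differently. The names `ROUTE_RE` and `route_sources` are hypothetical.

```python
import re

# Matches the first line of a route entry, for example:
#   PREFIX  blackhole [proto 16:01:32.784 from 10.10.0.41] * (100) [i]
# Continuation lines (indented "via ..." / "dev ...") do not match because
# they start with whitespace.
ROUTE_RE = re.compile(
    r"^(?P<prefix>\S+)\s+(?P<kind>\w+)\s+"
    r"\[(?P<proto>\S+)\s+\S+(?:\s+from\s+(?P<peer>\S+))?\]"
)


def route_sources(show_route_text):
    """Map each prefix to (protocol, peer) for its first listed route.

    peer is None when the entry has no "from ADDRESS" part.
    """
    sources = {}
    for line in show_route_text.splitlines():
        match = ROUTE_RE.match(line)
        if match and match["prefix"] not in sources:
            sources[match["prefix"]] = (match["proto"], match["peer"])
    return sources
```

Feeding it the captured output and comparing the result before and after a configuration change shows at a glance whether a prefix moved to a different protocol or peer.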
Hi Brian, It is a bit more information, but it does not help fully still. You are saying:
With the patch, I use the configuration as shown and it peers as before, but the RIB is properly formed:
But I cannot understand what error you had before. If you showed the original BGP configuration, a tcpdump, and the output of commands like "show protocols all" and "show route all", and pointed out which pieces you find wrong, it would be helpful. It also seems to me that in your first message you may have mixed up router IDs and neighbours' IPs.
Hi Alexander, The “error” I had before was the lack of a warning. I think the usability of BIRD would be improved if there was a warning for the state that I described. Thanks, Brian
On Apr 10, 2019, at 6:17 AM, Alexander Zubkov <green@qrator.net> wrote:
Hi Brian,
It is a bit more information, but it does not help fully still. You are saying:
With the patch, I use the configuration as shown and it peers as before, but the RIB is properly formed:
But I cannot understand what error you had before. If you showed the original BGP configuration, a tcpdump, and the output of commands like "show protocols all" and "show route all", and pointed out which pieces you find wrong, it would be helpful. It also seems to me that in your first message you may have mixed up router IDs and neighbours' IPs.
Hi Brian,
But I still have not understood what state you consider an error and where you expect a warning.
On Wed, Apr 10, 2019 at 7:56 PM Brian Topping <brian.topping@gmail.com> wrote:
Hi Alexander,
The “error” I had before was the lack of a warning. I think the usability of BIRD would be improved if there was a warning for the state that I described.
Thanks, Brian
On Sat, Apr 06, 2019 at 01:11:32PM -0600, Brian Topping wrote:
Hi all, I think I looked in all the regular places and inferred that reporting issues might best be done here. The gitlab issue tracker shows no issues.
Hi,
Yes, this is the proper place for reporting issues.
The problem I have seen is where BIRD BGP is running a passive process. The initiator peer is running the product at https://metallb.universe.tf. This product can only initiate sessions. Due to what I assume is a bug in the product, it was able to initiate a session from a TCP socket source address that did not match its router ID. More specifically, both BGP processes were running on the same host, with MetalLB configured to take a router ID that was a secondary (internal / private) interface address and BIRD using the primary interface address (so as to be able to connect to other external peers).
I believe the BIRD team would be interested in this because when BIRD BGP accepted a connection with the same source address as its own router ID, even though the session provided a different router ID, it caused the BIRD RIB and kernel exports to show the source of the route to be a completely different BGP peer. In this case, the different peer was the default route of a stub area, so incoming traffic to these addresses formed a routing loop. It’s unclear to me whether this is a bug or a feature.
If it is indeed a feature or an unavoidable side effect that needs to be supported all the same, it would have been desirable to have at least some log message of the form “this is probably not what you want” when a peer connected in this manner. More information can be found at https://github.com/danderson/metallb/issues/422 and the referenced PR.
As Alexander Zubkov already noted, I also do not see any issue. Mainly, the router ID and IP addresses are in general unrelated (although BIRD defaults to using a router ID based on one of its IP addresses when the router ID is not explicitly configured), so it is perfectly valid to have a session IP address different from the router ID.
Thanks for a great product!!
Thanks for the feedback.
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
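The point that a router ID is not an address can be illustrated with a short Python sketch: a BGP router ID is a 32-bit identifier that is conventionally written in dotted-quad form, so it only looks like an IPv4 address. The helper names below are hypothetical and 10.10.0.41 is simply the router ID used in the MetalLB configuration earlier in the thread.

```python
import ipaddress


def router_id_to_int(router_id):
    """Convert a dotted-quad router ID to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(router_id))


def int_to_router_id(value):
    """Convert a 32-bit integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    value = router_id_to_int("10.10.0.41")
    print(value)                    # 168427561
    print(int_to_router_id(value))  # 10.10.0.41
```

Nothing in this conversion requires the value to be assigned to any interface, which is why a session can legitimately come from an address that differs from the router ID.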
On Apr 12, 2019, at 4:45 AM, Ondrej Zajicek <santiago@crfreenet.org> wrote:
As Alexander Zubkov already noted, I also do not see any issue.
Thanks Ondrej, welcome back :-) Hope you had a great break.
I accidentally hit “reply” instead of “reply all” and part of our conversation went private. Having been on the receiving end of bug reports that are hard to articulate, it seems that the best way to capture the issue is with PRs that are easily identified as not changing any logic, i.e. they just print warnings. Team members can work back from there to why a message would be worthwhile in certain situations, or possibly have a much better grasp of how “the problem is between keyboard and chair”. :)
In any event, I don’t even know if I have sufficient logs any more to explain the problem to myself, much less anyone else.
Mainly, the router ID and IP addresses are in general unrelated (although BIRD defaults to using a router ID based on one of its IP addresses when the router ID is not explicitly configured), so it is perfectly valid to have a session IP address different from the router ID.
That’s good to know, thanks. I would have assumed that, but assumptions always lead to trouble. I’d like to become more productive with the source; I’ll address that in a separate email.
Thanks! Brian
Participants (3): Alexander Zubkov, Brian Topping, Ondrej Zajicek