bird6: "Netlink: No such process" error from kernel proto on OSPF multipath prefixes
Hello,

I am seeing periodic "Netlink: No such process" messages from bird6, apparently related to prefixes learned from OSPF where there are two equal cost paths. I've got ECMP on, and am installing routes to the kernel via kernel protocol.

Operating system is GNU/Linux (Debian), kernel 3.16.

The problem happens when kernel protocol starts pruning table master. Apparently, it's always deciding to update these multiple-path routes, and has some failure while trying. Log excerpt:

    bird6: kernel1: fec0:0:0:ffff::2/128: seen
    bird6: kernel1: fec0:0:0:ffff::3/128: seen
    bird6: kernel1: ::/0: seen
    bird6: kernel1: Pruning table master
    bird6: kernel1: 2001:xxx:yyy:z1::/64: updating
    bird6: Netlink: No such process
    bird6: kernel1: 2001:xxx:yyy:z2::/64: updating
    bird6: Netlink: No such process
    ...

This is what "show route" has to say about one of these routes:

    bird> show route 2001:xxx:yyy:z1::/64 all
    2001:xxx:yyy:z1::/64 multipath [backbone 2015-07-21] * IA (150/20) [193.aa.bb.137]
        via fe80::21e:bff:fec1:8c4a on eth1 weight 1
        via fe80::21e:bff:fec1:8c50 on eth1 weight 1
        Type: OSPF-IA unicast univ
        OSPF.metric1: 20
        OSPF.metric2: 16777215
        OSPF.tag: 0x00000000
        OSPF.router_id: 193.aa.bb.137

"show route" periodically displays these routes with a '!' instead of '*', indicating a synchronization error.

These "No such process" messages seem to occur every 40 seconds or so.

I am seeing this error both on BIRD 1.4.5 and BIRD 1.5.0.

Relevant topology: Four BIRD routers, A, B, C and D. All in area 0. A+B are ABR for area 194, and C+D are ABR for area 165.

A and B announce IA prefixes z1, z2, z3, z4, z5, z6, z7 and z8 with equal cost. C and D announce IA prefixes z9 and z10 with equal cost. A and B have the error on prefixes from C and D, and vice-versa.

Routes seem to disappear from kernel periodically, and are reinstalled again (monitoring with "ip -6 r").

Other routes, non-multipath from other routers, do not seem affected.

Relevant config, from routers C and D:

    # common to all routers
    protocol kernel {
        learn;
        persist;
        scan time 20;
        export all;
    }

    protocol ospf backbone {
        tick 1;
        ecmp yes;
        area 0.0.0.0 {
            stub no;
            interface "eth1" {
                check link yes;
            };
        };
        area 0.0.165.0 {
            stub yes;
            summary yes;
            interface "eth0.2000" {
                type ptp;
                check link yes;
            };
            interface "eth0.141", "eth0.165", "eth0.1411" {
                stub;
                check link yes;
            };
            networks {
                2001:xxx:yyy:z9::/64;
                2001:xxx:yyy:z10::/64;
            };
        };
    }

Routers A and B are similar but different prefixes of course.

I do not see this problem with IPv4 bird (also OSPF, similar configuration). Could this be some bug with kernel protocol and multipath routes?

I am available for further explanations or more details (logs, configs).

Best regards,

--
Israel G. Lugo
Núcleo de Redes e Comunicações
Direção de Serviços de Informática
Instituto Superior Técnico
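The pattern in the log excerpt above (an "updating" line immediately followed by "Netlink: No such process") can be tallied per prefix to see which routes fail and how often. The following Python sketch is only an illustration: the function name `failed_updates` is made up here, the protocol name `kernel1` is taken from the excerpt, and the parser assumes each log line ends exactly as shown in the excerpt (a syslog timestamp in front is tolerated).

```python
import re
from collections import Counter

# Matches e.g. "bird6: kernel1: 2001:xxx:yyy:z1::/64: updating"
UPDATING = re.compile(r"kernel1: (\S+): updating$")
ERROR_SUFFIX = "Netlink: No such process"


def failed_updates(lines):
    """Count 'updating' lines that are directly followed by the netlink error."""
    failures = Counter()
    pending = None  # prefix of the most recent 'updating' line, if any
    for raw in lines:
        line = raw.strip()
        match = UPDATING.search(line)
        if match:
            pending = match.group(1)
        elif line.endswith(ERROR_SUFFIX) and pending is not None:
            failures[pending] += 1
            pending = None
        else:
            pending = None  # any other line breaks the pairing
    return failures


# The excerpt from the report, without the trailing "..." line.
LOG = """\
bird6: kernel1: fec0:0:0:ffff::2/128: seen
bird6: kernel1: fec0:0:0:ffff::3/128: seen
bird6: kernel1: ::/0: seen
bird6: kernel1: Pruning table master
bird6: kernel1: 2001:xxx:yyy:z1::/64: updating
bird6: Netlink: No such process
bird6: kernel1: 2001:xxx:yyy:z2::/64: updating
bird6: Netlink: No such process
"""

if __name__ == "__main__":
    for prefix, count in sorted(failed_updates(LOG.splitlines()).items()):
        print(prefix, count)
```

Running the same function over a longer log, and comparing the counts with the 20-second scan time from the config, would show whether every multipath prefix fails on each scan or only some of them.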
On 23 July 2015 at 22:57, "Israel G. Lugo" <israel.lugo@tecnico.ulisboa.pt> wrote:
> I do not see this problem with IPv4 bird (also OSPF, similar configuration). Could this be some bug with kernel protocol and multipath routes?
Bird6 does not support ECMP for IPv6. It installs and deletes the routes again on every sync. We hope the developers will fix this issue at some point.
On 07/24/2015 06:30 AM, Vasiliy Tolstov wrote:
> Bird6 does not support ECMP for IPv6. It installs and deletes the routes again on every sync. We hope the developers will fix this issue at some point.
Oh, that is very unfortunate. I have a high IPv6 usage, as all our internal VLANs are dual stack, and all public services are fully IPv6-ready.

Question for the BIRD developers: would you say this functionality is hard to implement? Do you have plans for it in the near future? I am a developer myself; I could try to help with a Linux implementation, if you don't mind giving me a few general pointers into the code.

Regards,

--
Israel G. Lugo
Núcleo de Redes e Comunicações
Direcção de Serviços de Informática
Instituto Superior Técnico
On Mon, Jul 27, 2015 at 12:03:13AM +0100, Israel G. Lugo wrote:
> On 07/24/2015 06:30 AM, Vasiliy Tolstov wrote:
> > Bird6 does not support ECMP for IPv6. It installs and deletes the routes again on every sync. We hope the developers will fix this issue at some point.
>
> Oh, that is very unfortunate. I have a high IPv6 usage, as all our internal VLANs are dual stack, and all public services are fully IPv6-ready.
>
> Question for the BIRD developers: would you say this functionality is hard to implement? Do you have plans for it in the near future? I am a developer myself; I could try to help with a Linux implementation, if you don't mind giving me a few general pointers into the code.
Hi,

The problem here is that the Linux IPv6 multipath API is different from the IPv4 multipath API and very inconvenient for our purposes.

AFAIK people from Cumulus Networks are trying to fix this issue directly in the Linux kernel to get a more reasonable API, which would probably make BIRD work with some minimal fixes.

For more details, see slides 15+ in:
https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_sl...

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Hello,

On 27-07-2015 20:57, Ondrej Zajicek wrote:
> The problem here is that the Linux IPv6 multipath API is different from the IPv4 multipath API and very inconvenient for our purposes.
>
> AFAIK people from Cumulus Networks are trying to fix this issue directly in the Linux kernel to get a more reasonable API, which would probably make BIRD work with some minimal fixes.
>
> For more details, see slides 15+ in: https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_sl...
I see. Indeed there can be some nuisances, especially in cases with multiple concurrent insertions/deletions, where there might apparently be duplicate netlink messages.

I found a paper by the same author on this subject, which goes into a bit more detail on the suggested solutions:
http://people.netfilter.org/pablo/netdev0.1/papers/The-case-for-eliminating-...

Do you happen to know what the status of this proposal is in the Linux kernel? I would imagine that it will take a while for these changes to be available in production, if they are accepted at all.

May I suggest that it might be worth the effort to attempt tackling this problem in user space, at least for now? The multipath feature does help a lot, and as it is now we can't really use it (routes are constantly being added and deleted).

As I understand the problem, this would require some work to keep the internal route table in sync with the kernel notifications: detecting when a notification relates to an already existing prefix and adding it to the internal table as a multipath route, etc. I am not familiar with the BIRD codebase other than through a cursory glance, but I could try to think about it a bit. Of course, it would be much easier with your guidance.

I haven't seen how others implemented this, for example the Quagga people, but they must have had to deal with it somehow.

Regards,

--
Israel G. Lugo
Núcleo de Redes e Comunicações
Direção de Serviços de Informática
Instituto Superior Técnico
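The user-space idea outlined above (treat each kernel notification as a single next hop and merge notifications for the same prefix into one multipath entry) might look roughly like the following Python sketch. It is not BIRD code. `NextHop`, `KernelMirror` and all method names are hypothetical, and the sketch assumes the kernel reports each next hop of an IPv6 multipath route as a separate add or delete event, possibly more than once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NextHop:
    """One next hop as it would arrive in a single notification (hypothetical)."""
    gateway: str
    ifname: str


@dataclass
class KernelMirror:
    """Hypothetical user-space mirror of the kernel table: prefix -> set of next hops."""
    routes: dict = field(default_factory=dict)

    def on_add(self, prefix, nexthop):
        # A set makes duplicate notifications for the same next hop harmless.
        self.routes.setdefault(prefix, set()).add(nexthop)

    def on_delete(self, prefix, nexthop):
        hops = self.routes.get(prefix)
        if hops is None:
            return  # delete for an unknown prefix: nothing to do
        hops.discard(nexthop)
        if not hops:
            del self.routes[prefix]  # last next hop gone, so the route is gone

    def in_sync(self, prefix, wanted):
        """True if the merged kernel view equals the wanted multipath next-hop set."""
        return self.routes.get(prefix, set()) == set(wanted)


if __name__ == "__main__":
    prefix = "2001:xxx:yyy:z1::/64"
    a = NextHop("fe80::21e:bff:fec1:8c4a", "eth1")
    b = NextHop("fe80::21e:bff:fec1:8c50", "eth1")

    mirror = KernelMirror()
    mirror.on_add(prefix, a)
    mirror.on_add(prefix, b)
    mirror.on_add(prefix, b)  # duplicate notification
    print(mirror.in_sync(prefix, [a, b]))
```

Comparing next-hop sets per prefix, rather than comparing individual kernel entries against the multipath route, is the design choice here: a route would only be reinstalled when the merged set actually differs from what the routing daemon wants.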
Hi list and Ondrej,

As stated earlier, we would also very much appreciate an implementation of ECMP for bird6. Unfortunately, we cannot offer to take part in efforts to enhance BIRD's capabilities due to a lack of resources. But we are also waiting impatiently for ECMP to become available for IPv6.

Best regards
Gerold

On 28.07.2015 at 15:38, Israel G. Lugo wrote:
> May I suggest that it might be worth the effort to attempt tackling this problem in user space, at least for now? The multipath feature does help a lot, and as it is now we can't really use it (routes are constantly being added and deleted).
-- Abteilung Systembetrieb Telefon: +49 2371 787 117 Fax: +49 2371 787 61117 E-Mail: gruber@citkomm.de Internet: http://www.citkomm.de Citkomm services GmbH* KDVZ Citkomm (Kommunaler Zweckverband) Griesenbraucker Str. 4 58640 Iserlohn Telefon: +49 2371 787 0 Fax: +49 2371 787 279 E-Mail: post@citkomm.de * Sitz der Gesellschaft: Iserlohn Handelsregister: AG Iserlohn, HRB 26 86 Geschäftsführer: Dr. Michael Neubauer, Hans Jürgen Friebe, Kerstin Pliquett
participants (4)
- Gruber Gerold
- Israel G. Lugo
- Ondrej Zajicek
- Vasiliy Tolstov