<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Hoi,<br>
<br>
<div class="moz-cite-prefix">On 20.02.2026 01:31, Ondrej Zajicek
wrote:<span style="white-space: pre-wrap">
</span></div>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<pre wrap="" class="moz-quote-pre">That is the fake interface from if_get_by_name(). Using them in route
nexthops is 'fine' on the level that it does not crash due to NULL
dereference, but they were never supposed be used this way, they are
just placeholders for configuration.
Note that these fake interfaces are horrible hack in BIRD code, as
properly there should be two distinct structures: iface_config and
iface, the former representing interface referenced in config file, and
the latter representing real kernel interfaces found by 'device' protocol.
But we use the same structure for both cases.</pre>
</blockquote>
Understood - once iface_config and iface are split, I can make use
of either construct (the iface_config one makes more sense). Neither
the interface name nor the kernel device is necessary in my
implementation.<span style="white-space: pre-wrap">
</span><span style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<pre wrap="" class="moz-quote-pre">I wonder if your setup would work, if you instead of using this fake interface
use some real placeholder interface, say loopback:
'encapsulation vxlan { tunnel device "lo"; };'</pre>
</blockquote>
It works fine. As an aside, reconfiguring causes a restart of the evpn
protocol, which trips an assertion and crashes Bird. The same crash
happens on 'birdc disable evpn1':<br>
Feb 20 12:12:29 vpp0-3 bird[1455113]: Restarting protocol evpn1
<br>
Feb 20 12:12:29 vpp0-3 bird[1455113]: Assertion 'pub->queue
&& pub->topic' failed at lib/pubsub.c:161<br>
Feb 20 12:12:29 vpp0-3 systemd[1]: bird-dataplane.service: Main
process exited, code=killed, status=11/SEGV <br>
<br>
Either way, Bird comes back up and works just fine using tunnel_dev
set to "lo". It reminds me that I already use this trick, as MAC
addresses learned from VPP's bridge-domain do not have any
corresponding Linux or Bird interface, so I inject them into etab
using "lo" as well.<br>
<br>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<pre wrap="" class="moz-quote-pre">The 'cheat' have to be modified (it should wait for the interface,
but will ignore the fact that the interface is not a tunnel (i.e.
skip/ignore evpn_validate_iface_attrs()).</pre>
</blockquote>
I like that. Perhaps a keyword in the config can signal that this is
OK, like 'tunnel device "evpn0-dummy" virtual;' or just 'tunnel
device "lo" virtual;'<br>
<br>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<pre wrap="" class="moz-quote-pre">Note that you should read IMET from etab too. EVPN protocol translate
all IMETs from evpntab to etab, otherwise even our kernel-based setup
would not work -- 'bridge' protocol that configures kernel bridge also
reads just etab.</pre>
</blockquote>
I do not have multiple IMETs in etab, only one:<br>
root@vpp0-0:/etc/bird# birdc show route table evpntab | grep imet<br>
evpn imet 8298:100 0 2001:678:d78:200::3 [vpp0_3 12:12:38.484 from
2001:678:d78:200::3] * (100) [i]<br>
evpn imet 8298:100 0 2001:678:d78:200::2 [vpp0_2 11:18:21.821 from
2001:678:d78:200::2] * (100) [i]<br>
evpn imet 8298:100 0 2001:678:d78:200::1 [vpp0_1 11:18:21.253 from
2001:678:d78:200::1] * (100) [i]<br>
evpn imet 8298:100 0 2001:678:d78:200:: unicast [evpn1 11:18:07.285]
* (120)<br>
<br>
root@vpp0-0:/etc/bird# birdc show route table etab | grep 00:00:00:<br>
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 12:12:38.484] *
(80)<br>
<br>
Perhaps I'm holding it wrong (see bird-example.conf). It would
actually be super if I could rely <i>only</i> on etab, as tracking
both etab and evpntab was a fair amount of extra code.<span
style="white-space: pre-wrap">
</span>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<pre wrap="" class="moz-quote-pre">I agree and this split of work between 'evpn' protocol and 'bridge' protocol
(with separate 'evpn table' and 'eth table') are going to stay.</pre>
</blockquote>
Thank you! That's great news for me.<br>
<br>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">I am happy to share the 'vppevpn' protocol with others also, as an example
'3P integration'. I do not expect it to be upstreamed into Bird2, unless
there are community requests for it.
Ondrej, do let me know if you'd like to take a sneak peak at my code (it's
in a private repo for now, as it's not ready for wider review yet, but it is
mostly functional).
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
Having better integration with VPP (or some other userspace dataplane)
is something we are interested in general, but i would not look at it
before i finish some other tasks (including merging EVPN) as i am rather
overwhelmed.</pre>
</blockquote>
I can volunteer my time to write a vpp protocol (for ip4, ip6, mpls
FIB and interfaces). I'll contact you separately about that; it sounds
like a worthwhile project and I've kind of always wanted to do it.<br>
<br>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">(3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
When the BGP Next Hop is changed by an export filter, we lose the MPLS
labelstack. There is no way to add MPLS labelstack in filters (at least,
that I could find), so we cannot use 'next hop address X' to determine the
Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
but rather a PSMI attribute with the 'router address' already.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">Resetting MPLS label when changing next hop is intentional, as MPLS labels are
(in general) specific to receiving routers.
There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
attribute that could be accessed in filters.
I am not sure what is your use case here to change it with filters, can
you describe it more? What about setting 'router address' in EVPN proto?
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
1) copy that to the PSMI attribute: good
2) not do anything for MAC announcements; they will have BGP.next_hop set to
the session address.
if the previous patch in (2) is accepted, then 'router address' will be used
as BGP.next_hop, which will avoid the need to change it with filters with
(3).
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">Oh, i see. You are right, this should work automatically for both IMET / PMSI
and MAC.
I do not like using regular/immediate next hops here in EVPN table, as
it does not fit well semantically and requires formal device. But seems
to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
by EVPN protocol, similarly how BGP_PMSI_TUNNEL is attached. Wil do that.
Any comments?
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
to the etab table entry, I would simply read that and use it instead of the
bgp nexthop. That's what happens already today for IMET, as it has the
BGP.pmsi_tunnel attribute with the needed ingress-replication
2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
is they use BGP next hop for that (in other words, the same as how Bird does
it today).
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
I think there is some confusion here. I am talking about evpntab
entries, not about etab entries. And about your patch that sets router
IP into their immediate next hop (nh.gw).</pre>
</blockquote>
I see - then maybe I can try a different approach. I thought the
patch made Bird behave the same as Nokia SRLinux [1], which also
sets the router IP (the local VTEP) as nexthop. But what you're
saying is that I should not set the <i>immediate</i> nexthop, but
rather leave that alone and set the <i>BGP Next Hop</i>? As a
reminder, though, I need to be able to set an IPv4 BGP Next Hop on
an IPv6 session for some RTs only, not all. See one more thought on
that below.<br>
<br>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">Not sure I understand what you mean - don't we have this problem also for
kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
fdb entry onto it, we also need to know which vxlan nexthop to use. The way
I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
However, if what you're saying is you'd want to remove the BGP Next Hop and
instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
field that would work just as well for me. I kind of wonder why you'd go to
the trouble obfuscating the BGP Next Hop. Don't other vendors use the same
thing (send vxlan packet to the address learned via the BGP Next Hop in
Type-2 announcements) ?
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
I just mean that immediate next hop fields for evpntab routes received
through BGP are irrelevant, while the BGP next hop attribute is the
important one. When 'evpn' protocol takes a route from evpntab and convert
it to etab entry, it examines BGP next hop, not immediate next hop.</pre>
</blockquote>
OK, I think I understand now.<br>
<br>
<blockquote type="cite" cite="mid:aZerTXzGjUYwEv4j@feanor">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">While i agree that it should work automatically by just setting router
address in protocol evpn, i think that this setup that should work even
without patches:
protocol evpn {
...
encapsulation vxlan { router address 192.0.2.1; };
}
protocol bgp {
evpn { import all; export all; next hop address 192.0.2.1; };
local 2001:db8::1 as 65512;
neighbor 2001:db8::2 as 65512;
}
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">I don't think this works for MAC, for IMET it works because that has a
custom PSMI BGP attribute which is set to encap0->router_addr). Setting the
next hop in this way will clear the mpls labelstack. So we'd end up with:
fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
via 192.0.2.1 on vxlan0 mpls 0
and we'd lose the VNI.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
I think it will not clear the MPLS labelstack. This is not setting next
hop in filters. The difference between
evpn { import all; export all; next hop address 192.0.2.1; };
and
evpn { import all; export all; };
in BGP protocol export is only where the BGP next hop value is taken
from (explicitly configured one or source address from BGP session), but
route processing is the same. See bgp_update_next_hop_ip(), the
!bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.</pre>
</blockquote>
I tried this, and you are correct that 'next hop address' works and
leaves the MPLS labelstack alone:<br>
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] *
(80)<br>
via 192.168.10.0 on lo mpls 20040<br>
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] *
(80)<br>
via 192.168.10.0 on lo mpls 20040<br>
<br>
Now let's suppose I have two evpn protocols, one with an IPv4 router
address and one with an IPv6 router address. In this scenario, I
can't use 'next hop address' because it'll force both to use that
address family. <br>
<br>
It yields a bad state:<br>
1) as before, the IPv4-only evpn (VNI 20040) works<br>
2) but now the evpn with an IPv6 router address sends IMET with
IPv6 and MAC with IPv4:<br>
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] *
(80)<br>
via 2001:678:d78:200:: on lo mpls 10040<br>
Type: EVPN univ<br>
mpls_label: 10040<br>
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] *
(80)<br>
via 192.168.10.0 on lo mpls 10040<br>
Type: EVPN univ<br>
mpls_label: 10040<br>
<br>
An obvious solution is to use a filter, like this one:<br>
filter bgp_evpn_out {<br>
if (rt, 8298, 10040) ~ bgp_ext_community then { bgp_next_hop =
192.168.10.3; }<br>
if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop =
2001:678:d78:200::3; }<br>
accept;<br>
} <br>
<br>
template bgp T_BGP_EVPN {<br>
evpn { import all; export filter bgp_evpn_out; };<br>
local 2001:678:d78:200::3 as 65512;<br>
}<br>
<br>
But now the filter does destroy the MPLS labelstack, although the
mpls_label attribute remains:<br>
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] *
(80)<br>
via 2001:678:d78:200:: on lo mpls 10040<br>
Type: EVPN univ<br>
mpls_label: 10040<br>
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] *
(80)<br>
<b> via 2001:678:d78:200:: on lo mpls 0</b><br>
Type: EVPN univ<br>
mpls_label: 10040<br>
<br>
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] *
(80)<br>
via 192.168.10.0 on lo mpls 20040<br>
Type: EVPN univ<br>
mpls_label: 20040<br>
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] *
(80)<br>
<b> via 192.168.10.0 on lo mpls 0</b><br>
Type: EVPN univ<br>
mpls_label: 20040<br>
<br>
My conclusion was: I need to be able to apply filters without
destroying the MPLS labels. If I now understand correctly, I can
remove the nh.gw/nh.iface from evpn_announce_mac() and
evpn_announce_imet(), but keep this change in
bgp_update_next_hop_ip():<br>
<br>
@@ -1314,19 +1310,6 @@ bgp_update_next_hop_ip(struct
bgp_export_state *s, eattr *a, ea_list **to)<br>
}<br>
}<br>
<br>
+ /* For L2VPN (EVPN): ensure MPLS label stack is set even if next
hop was filter-overridden */<br>
+ if (s->mpls && bgp_channel_is_l2vpn(s->channel)
&& !bgp_find_attr(*to, BA_MPLS_LABEL_STACK))<br>
+ {<br>
+ rta *ra = s->route->attrs;<br>
+ if (ra->nh.labels)<br>
+ bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0,
ra->nh.label, ra->nh.labels * 4);<br>
+ else<br>
+ {<br>
+ u32 label = ea_get_int(ra->eattrs, EA_MPLS_LABEL,
BGP_MPLS_NULL);<br>
+ bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0,
&label, 4);<br>
+ }<br>
+ }<br>
<br>
This allows the above filter to work while preserving the
labelstack:<br>
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] *
(80)<br>
via 2001:678:d78:200:: on lo mpls 10040<br>
Type: EVPN univ<br>
mpls_label: 10040<br>
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] *
(80)<br>
<b>
via 2001:678:d78:200:: on lo mpls 10040</b><br>
Type: EVPN univ<br>
mpls_label: 10040<br>
<br>
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] *
(80)<br>
via 192.168.10.0 on lo mpls 20040<br>
Type: EVPN univ<br>
mpls_label: 20040<br>
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] *
(80)<br>
<b> via 192.168.10.0 on lo mpls 20040</b><br>
Type: EVPN univ<br>
mpls_label: 20040<br>
<br>
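To make the fallback order in that patch explicit, here is a minimal
standalone sketch in C. It is not Bird code: the sketch_* types,
names and the label limit are made up for illustration, and the
fallback label is passed in by the caller rather than assumed. It
only models the inner choice (nexthop labelstack first, then the
per-route mpls_label attribute, then a fallback), not the surrounding
checks on the channel type or on an already present labelstack
attribute.<br>
<br>
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; NOT BIRD's real types. */
#define SKETCH_MAX_LABELS 8

struct sketch_nexthop {
  uint32_t label[SKETCH_MAX_LABELS];  /* label stack attached to the next hop */
  unsigned labels;                    /* number of valid entries in label[] */
};

struct sketch_route {
  struct sketch_nexthop nh;
  int has_mpls_label_attr;            /* does a per-route label attribute exist? */
  uint32_t mpls_label_attr;           /* its value (models the mpls_label attribute) */
};

/*
 * Choose the label stack to advertise for a route.
 *
 * Order of preference:
 *   1) the labels still attached to the next hop, if any;
 *   2) otherwise the per-route label attribute, if present;
 *   3) otherwise the caller-supplied fallback label.
 *
 * Writes at most out_len labels to out[] (a longer stack is truncated)
 * and returns the number of labels written.
 */
unsigned
sketch_pick_labels(const struct sketch_route *r, uint32_t fallback,
                   uint32_t *out, unsigned out_len)
{
  assert(r != NULL && out != NULL && out_len > 0);
  assert(r->nh.labels <= SKETCH_MAX_LABELS);

  if (r->nh.labels)
  {
    unsigned n = (r->nh.labels < out_len) ? r->nh.labels : out_len;
    memcpy(out, r->nh.label, n * sizeof(uint32_t));
    return n;
  }

  out[0] = r->has_mpls_label_attr ? r->mpls_label_attr : fallback;
  return 1;
}
```
<br>
The filtered MAC route from the earlier output (nexthop showing
'mpls 0', mpls_label still 10040) corresponds to the second case: an
empty nexthop labelstack with the attribute present, so the sketch
would hand back 10040.<br>
<br>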
Of course, open to better solutions :)<br>
<br>
groet,<br>
Pim<br>
<br>
[1] A:pim@asw121# show network-instance default protocols bgp
routes evpn route-type 2 detail | more<br>
Route Distinguisher: 65500:264<br>
Tag-ID : 0<br>
MAC address : 64:9D:99:D0:70:4D<br>
IP Address : 10.26.0.1<br>
neighbor : 198.19.16.0<br>
path-id : 0<br>
Received paths : 1<br>
Path 1: <Best,Valid,Used,><br>
ESI : 00:00:00:00:00:00:00:00:00:00<br>
Label : 264<br>
Route source : neighbor 198.19.16.0 (last modified
68d14h37m6s ago)<br>
Route preference : No MED, LocalPref is 100<br>
Atomic Aggr : false<br>
BGP next-hop : 198.19.18.0<br>
AS Path : i<br>
Communities : [target:65500:264, bgp-tunnel-encap:VXLAN]<br>
RR Attributes : No Originator-ID, Cluster-List is []<br>
Aggregation : None<br>
Unknown Attr : None<br>
Invalid Reason : None<br>
Tie Break Reason : none<br>
Route Flap Damping: None<br>
<pre class="moz-signature" cols="72">--
Pim van Pelt <a class="moz-txt-link-rfc2396E" href="mailto:pim@ipng.ch"><pim@ipng.ch></a>
PBVP1-RIPE <a class="moz-txt-link-freetext" href="https://ipng.ch/">https://ipng.ch/</a></pre>
<br>
</body>
</html>