evpn rebase to HEAD
Hoi folks,

I've started to toy with VPP and eVPN/VxLAN, and took a look at the evpn branch from a few years ago. For my network, I'll need the OSPFv3 'unnumbered' features we built, so I thought I'd ask - would it be possible to rebase the evpn branch? I've taken a stab at it (see attached patch) by replaying the 9 commits on top of HEAD (f1a7229d-evpn.diff).

It may not be correct, but it does compile and seemingly work :)

root@vpp0-0:/etc/bird# birdc show proto
BIRD 2.18+branch.evpn.1efd115564ea ready.
Name       Proto      Table      State  Since         Info
device1    Device     ---        up     12:20:01.347
direct1    Direct     ---        up     12:20:01.347
kernel4    Kernel     master4    up     12:20:01.347
kernel6    Kernel     master6    up     12:20:01.347
static4    Static     master4    up     12:20:01.347
static6    Static     master6    up     12:20:01.347
bfd1       BFD        ---        up     12:20:01.347
ospf4      OSPF       master4    up     12:20:01.347  Running
ospf6      OSPF       master6    up     12:20:01.347  Running
bridge1    Bridge     etab       up     12:45:13.450
evpn1      EVPN       ---        up     12:45:13.450

I have a few further changes, notably a new 'vpp protocol' to handle communication with the VPP dataplane for eVPN/VxLAN (and possibly later, GENEVE and MPLS). Before I get too far down the rabbit hole though, I'm also wondering: is releasing eVPN in the cards?

groet,
Pim

--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
On 14/02/2026 10:49 pm, Pim van Pelt via Bird-users wrote:
> Hoi folks,
> I've started to toy with VPP and eVPN/VxLAN, and took a look at the evpn branch from a few years ago. For my network, I'll need the OSPFv3 'unnumbered' features we built, so I thought I'd ask - would it be possible to rebase the evpn branch? I've taken a stab at it (see attached patch) by replaying the 9 commits on top of HEAD (f1a7229d-evpn.diff).
I had been tinkering with the same; Ondrej said to use the oz-evpn branch rather than the evpn branch. I was able to get that going with a couple of extra bridge(8) commands (see the "EPVN MPLS label parsing error" thread).
> I have a few further changes, notably a new 'vpp protocol' to handle communication with the VPP dataplane for eVPN/VxLAN (and possibly later, GENEVE and MPLS). Before I get too far down the rabbit hole though, I'm also wondering: is releasing eVPN in the cards?
The oz-evpn branch talks directly with the Linux bridge subsystem to handle the VxLAN interface/VNI matching, so I don't know if that will suit what you want to do, but I'm guessing a tweak or five would work.
> groet, Pim
Regards,
William
Hoi folks,

Bird folk, can I ask you to take a look at the rebase patch I sent? I'd love for the 'evpn' branch to be rebased.

On 14.02.2026 12:49, Pim van Pelt via Bird-users wrote:
> I've started to toy with VPP and eVPN/VxLAN, and took a look at the evpn branch from a few years ago. For my network, I'll need the OSPFv3 'unnumbered' features we built, so I thought I'd ask - would it be possible to rebase the evpn branch? I've taken a stab at it (see attached patch) by replaying the 9 commits on top of HEAD (f1a7229d-evpn.diff).
>
> It may not be correct, but it does compile and seemingly work 🙂

I have played around with this 2.18+evpn rebase and created a working eVPN/VxLAN with VPP. I stumbled across a few specifics which I'd like to share:

(1) The evpn exports are causing the following assertion failure:
Assertion '!((a->flags ^ desc->flags) & (BAF_OPTIONAL | BAF_TRANSITIVE))' failed at proto/bgp/attrs.c:1269

evpn_announce_mac() and evpn_announce_imet() were using ea_set_attr_ptr() with flags=0 to set BGP attributes BA_EXT_COMMUNITY and BA_PMSI_TUNNEL. Those attributes have descriptor flags BAF_OPTIONAL | BAF_TRANSITIVE, and when BGP's bgp_export_attr() processes those attributes during update encoding, it trips the assertion.

This patch switches to bgp_set_attr_ptr(), which automatically normalizes flags from the descriptor table, ensuring the stored attribute flags always match what the descriptor expects. Compared to l3vpn.c, which correctly passed BAF_OPTIONAL | BAF_TRANSITIVE explicitly, this feels cleaner.
See bird2.18+evpn_use_bgp_set_attr.diff for a possible fix.

(2) BGP Next Hop for Type-2 should be the 'router address' from evpn protocol.
When announcing an IPv4 vxlan evpn on an IPv6 BGP session, default behavior is to set the next hop using the BGP session. This means the MAC nexthops will be IPv6, not 'router address'. Moreover, changing this with 'next hop address X' is not possible, because overriding the next-hop will remove the MPLS label (which carries the VNI).

Under the assumption that whatever 'router address' is in the evpn protocol context will determine:
1) the PMSI [already correctly added even if the nexthop is a different family, here it does not matter]
2) the BGP next hops for Type-2 (MAC) announcements [where it matters if the evpn vxlan address family differs from the BGP session address family]

This patch fixes the latter: setting the BGP next hop to the 'router address' field for evpn_announce_mac() and, for consistency, also for evpn_announce_imet().
See bird2.18+evpn_use_routeraddr_as_bgp_nexthop.diff for a reasonable default.

(3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
When the BGP Next Hop is changed by an export filter, we lose the MPLS labelstack. There is no way to add MPLS labelstack in filters (at least, that I could find), so we cannot use 'next hop address X' to determine the Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop, but rather a PMSI attribute with the 'router address' already.

This patch ensures that the MPLS labelstack (the first label carries the VNI in the case of eVPN/VxLAN) is put back for L2VPN MPLS BGP updates.
See bird2.18+evpn_ensure_mpls_labelstack_is_set.diff for a durable fix.

Otherwise, I found that the evpn branch, rebased on 2.18, works a treat, noting that I am not using 'bridge' protocol, but instead reading eVPN information directly from the 'eth table' for my application.

groet,
Pim

--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
Hello Pim,

please check the oz-evpn branch first. I suppose (but may be wrong) that Ondřej will eventually rebase that one to master and we'll merge it.

If I remember correctly, the "evpn" branch is about to be dropped later on, but for definite word, you'll have to wait for Ondřej who actually drives the EVPN development.

Thank you for your understanding and patience.

Maria

--
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
On Wed, Feb 18, 2026 at 03:54:04PM +0100, Pim van Pelt via Bird-users wrote:
> Hoi folks,
> Bird folk, can I ask you to take a look at the rebase patch I sent? I'd love for the 'evpn' branch to be rebased.
Hi,

As others noted, the relevant branch is 'oz-evpn'; the older 'evpn' branch fell victim to my needlessly strict adherence to the "do not rebase public branch" rule. The patches in 'oz-evpn' are not only rebased on a newer BIRD version, but also have fixes squashed into them, and there is newer development. I just pushed a rebase to 2.18 there. Please look at this branch first. Also note there are some minor changes to the EVPN protocol configuration syntax.
> On 14.02.2026 12:49, Pim van Pelt via Bird-users wrote:
>> I've started to toy with VPP and eVPN/VxLAN, and took a look at the evpn branch from a few years ago. For my network, I'll need the OSPFv3 'unnumbered' features we built, so I thought I'd ask - would it be possible to rebase the evpn branch? I've taken a stab at it (see attached patch) by replaying the 9 commits on top of HEAD (f1a7229d-evpn.diff).
>>
>> It may not be correct, but it does compile and seemingly work 🙂
> I have played around with this 2.18+evpn rebase and created a working eVPN/VxLAN with VPP. I stumbled across a few specifics which I'd like to share:
>
> (1) The evpn export are causing the following assertion failure:
> Assertion '!((a->flags ^ desc->flags) & (BAF_OPTIONAL | BAF_TRANSITIVE))' failed at proto/bgp/attrs.c:1269
>
> evpn_announce_mac() and evpn_announce_imet() were using ea_set_attr_ptr() with flags=0 to set BGP attributes BA_EXT_COMMUNITY and BA_PMSI_TUNNEL. Those attributes have descriptor flags BAF_OPTIONAL | BAF_TRANSITIVE, and when BGP's bgp_export_attr() processes those attributes during update encoding, it trips the assertion.
>
> This patch switches to bgp_set_attr_ptr() which automatically normalizes flags from the descriptor table, ensuring the stored attribute flags always match what the descriptor expects. Compare to l3vpn.c which correctly passed BAF_OPTIONAL | BAF_TRANSITIVE explicitly, this feels cleaner.
Already fixed in oz-evpn. I would prefer not to use bgp_set_attr() outside BGP, and we already have another approach to attribute handling in BIRD 3, so I kept the ea_set_attr_ptr() functions here.
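For readers following item (1), here is a minimal sketch of the check that the quoted assertion expresses, written in Python rather than BIRD's C. The two flag values are the Optional and Transitive bits of the BGP attribute flags octet (RFC 4271). The names flags_mismatch and normalize_flags are hypothetical helpers for illustration only, not BIRD functions, and the sketch makes no claim about how oz-evpn actually fixes the problem.

```python
# Optional and Transitive bits of the BGP path-attribute flags octet (RFC 4271).
BAF_OPTIONAL = 0x80
BAF_TRANSITIVE = 0x40
_MASK = BAF_OPTIONAL | BAF_TRANSITIVE


def flags_mismatch(stored_flags: int, desc_flags: int) -> bool:
    """True when stored flags disagree with the descriptor on the
    Optional or Transitive bit (the condition the assertion rejects)."""
    return bool((stored_flags ^ desc_flags) & _MASK)


def normalize_flags(stored_flags: int, desc_flags: int) -> int:
    """Hypothetical helper: take the Optional/Transitive bits from the
    descriptor and keep every other stored bit unchanged."""
    return (stored_flags & ~_MASK) | (desc_flags & _MASK)


if __name__ == "__main__":
    desc = BAF_OPTIONAL | BAF_TRANSITIVE    # what the descriptor expects
    print(flags_mismatch(0, desc))          # attribute stored with flags=0
    print(flags_mismatch(normalize_flags(0, desc), desc))
```

An attribute stored with flags=0 against a descriptor that expects Optional and Transitive is exactly the mismatch described in item (1); either passing the right flags explicitly or normalizing them from the descriptor removes it.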
> See bird2.18+evpn_use_bgp_set_attr.diff for a possible fix.
>
> (2) BGP Next Hop for Type-2 should be the 'router address' from evpn protocol.
> When announcing an IPv4 vxlan evpn on an IPv6 BGP session, default behavior is to set the next hop using the BGP session. This means the MAC nexthops will be IPv6, not 'router address'. More-over, changing this with 'next hop address X' is not possible, because overriding the next-hop will remove the MPLS label (which carries the VNI).
>
> Under the assumption that whatever 'router address' is in the evpn protocol context will determine:
> 1) the PMSI [already correctly added even if the nexthop is a different family, here it does not matter]
> 2) the BGP next hops for Type-2 (MAC) announcements [where it matters if the evpn vxlan address family differs to the BGP session address family]
>
> This patch fixes the latter: setting the BGP next hop to the 'router address' field for evpn_announce_mac() and for consistency also for evpn_announce_imet()
> See bird2.18+evpn_use_routeraddr_as_bgp_nexthop.diff for a reasonable default.
Will look at this more.
> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
> When the BGP Next Hop is changed by an export filter, we lose the MPLS labelstack. There is no way to add MPLS labelstack in filters (at least, that I could find), so we cannot use 'next hop address X' to determine the Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop, but rather a PMSI attribute with the 'router address' already.
Resetting the MPLS label when changing the next hop is intentional, as MPLS labels are (in general) specific to receiving routers.

There is a gw_mpls attribute (and an undocumented, semantically broken gw_mpls_stack attribute) that can be accessed in filters.

I am not sure what your use case is for changing it with filters; can you describe it more? What about setting 'router address' in the EVPN proto?
> Otherwise, I found that the evpn branch, rebased on 2.18, works a treat, noting that I am not using 'bridge' protocol, but instead reading eVPN information directly from the 'eth table' for my application.
Good to hear that.

--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
"To err is human -- to blame it on a computer is even more so."
Hoi,
Thanks for taking a look, Marina and Ondrej, I appreciate it!
On 18.02.2026 17:50, Ondrej Zajicek wrote:
> As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
> branch fell victim to my needlessly strict adherence to "do not rebase
> public branch" rule. The patches in 'oz-evpn' are not only rebased on
> newer BIRD version, but also have fixes squashed in them, and there is
> newer development. I just pushed there rebase to 2.18. Please look at
> this branch first. Also note there are some minor changes to EVPN protocol
> configuration syntax.
I have ported my vppevpn protocol implementation to be based on oz-evpn,
and the system is functional here also. Yaay!
I only had one small issue. In oz-evpn, the 'evpn' protocol will stay in
'startup' until the vxlan0 interface becomes ready. However, in my
usecase, vxlan is not performed by the kernel, but by VPP, so there is
no 'vxlan0' interface. I need only 'vni' and 'router address' (and the
remote VTEP) to construct the dataplane configuration. To allow the evpn
protocol to transition to PS_UP, I decided to fire an event that
announces the IMET if router_addr and VNI are set, and skips waiting for
the interface.
See inline -
>> On 14.02.2026 12:49, Pim van Pelt via Bird-users wrote:
>>> I've started to toy with VPP and eVPN/VxLAN, and took a look at the evpn
>>> branch from a few years ago.
>>> For my network, I'll need the OSPFv3 'unnumbered' features we built, so
>>> I thought I'd ask - would it be possible to rebase the evpn branch ?
>>> I've taken a stab at it (see attached patch) by replaying the 9 commits
>>> on top of HEAD (f1a7229d-evpn.diff).
>>>
>>> It may not be correct, but it does compile and seemingly work 🙂
>> I have played around with this 2.18+evpn rebase and created a working
>> eVPN/VxLAN with VPP. I stumbled across a few specifics which I'd like to
>> share:
>>
>> (1) The evpn export are causing the following assertion failure:
>> Assertion '!((a->flags ^ desc->flags) & (BAF_OPTIONAL | BAF_TRANSITIVE))'
>> failed at proto/bgp/attrs.c:1269
>>
>> evpn_announce_mac() and evpn_announce_imet() were using ea_set_attr_ptr()
>> with flags=0 to set BGP attributes BA_EXT_COMMUNITY and BA_PMSI_TUNNEL.
>> Those attributes have descriptor flags BAF_OPTIONAL | BAF_TRANSITIVE, and
>> when BGP's bgp_export_attr() processes those attributes during update
>> encoding, it trips the assertion.
>>
>> This patch switches to bgp_set_attr_ptr() which automatically normalizes
>> flags from the descriptor table, ensuring the stored attribute flags always
>> match what the descriptor expects. Compare to l3vpn.c which correctly passed
>> BAF_OPTIONAL | BAF_TRANSITIVE explicitly, this feels cleaner.
> Already fixed in oz-evpn. I would prefer not to use bgp_set_attr() outside BGP
> and we already have another approach to attribute handling in BIRD 3, so i kept
> the ea_set_attr_ptr() functions here.
>
>
>> *See bird2.18+evpn_use_bgp_set_attr.diff for a possible fix.
>> *
>> (2) BGP Next Hop for Type-2 should be the 'router address' from evpn
>> protocol.
>> When announcing an IPv4 vxlan evpn on an IPv6 BGP session, default behavior
>> is to set the next hop using the BGP session. This means the MAC nexthops
>> will be IPv6, not 'router address'. More-over, changing this with 'next hop
>> address X' is not possible, because overriding the next-hop will remove the
>> MPLS label (which carries the VNI).
>>
>> Under the assumption that whatever 'router address' is in the evpn protocol
>> context will determine:
>> 1) the PMSI [already correctly added even if the nexthop is a different
>> family, here it does not matter]
>> 2) the BGP next hops for Type-2 (MAC) announcements [where it matters if the
>> evpn vxlan address family differs to the BGP session address family]
>>
>> This patch fixes the latter: setting the BGP next hop to the 'router
>> address' field for evpn_announce_mac() and for consistency also for
>> evpn_announce_imet()
>> *See bird2.18+evpn_use_routeraddr_as_bgp_nexthop.diff for a reasonable
>> default.
> Will look at this more.
>
>
>> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
>> When the BGP Next Hop is changed by an export filter, we lose the MPLS
>> labelstack. There is no way to add MPLS labelstack in filters (at least,
>> that I could find), so we cannot use 'next hop address X' to determine the
>> Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
>> but rather a PMSI attribute with the 'router address' already.
> Resetting MPLS label when changing next hop is intentional, as MPLS labels are
> (in general) specific to receiving routers.
>
> There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
> attribute that could be accessed in filters.
>
> I am not sure what is your use case here to change it with filters, can
> you describe it more? What about setting 'router address' in EVPN proto?
With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
1) copy that to the PMSI attribute: good
2) not do anything for MAC announcements; they will have BGP.next_hop
set to the session address.
If the previous patch in (2) is accepted, then 'router address' will be
used as BGP.next_hop, which avoids the need to change it with
filters as in (3).
If neither patch is applied, the following config:
protocol evpn {
  ...
  encapsulation vxlan { router address 192.0.2.1; };
}
protocol bgp {
  evpn { import all; export all; };
  local 2001:db8::1 as 65512;
  neighbor 2001:db8::2 as 65512;
}
will yield IMET pointing at 192.0.2.1 but MAC pointing at 2001:db8::1.
If I want MAC pointing at 192.0.2.1 also, I would either need (2, my
preference) or a filter with (3).
If there exists a device out there which has different addressing for
IMET and MAC (note: I don't know of any, but perhaps they exist), then
(3) would come in handy.
For completeness, here's a small diff of the changes I made: (a) allow
the vxlan interface to be omitted from the kernel, (b) default the next
hop to 'router address', and (c) allow overriding the BGP next hop in a
filter. Something like (a) is required for my use case, plus one of (b)
or (c).
groet,
Pim
--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
On Wed, Feb 18, 2026 at 09:59:05PM +0100, Pim van Pelt wrote:
> Hoi,
>
> Thanks for taking a look, Marina and Ondrej, I appreciate it!
>
> On 18.02.2026 17:50, Ondrej Zajicek wrote:
> > As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
> > branch fell victim to my needlessly strict adherence to "do not rebase
> > public branch" rule. The patches in 'oz-evpn' are not only rebased on
> > newer BIRD version, but also have fixes squashed in them, and there is
> > newer development. I just pushed there rebase to 2.18. Please look at
> > this branch first. Also note there are some minor changes to EVPN protocol
> > configuration syntax.
> I have ported my vppevpn protocol implementation to be based on oz-evpn, and
> the system is functional here also. Yaay!
>
> I only had one small issue. In oz-evpn, the 'evpn' protocol will stay in
> 'startup' until the vxlan0 interface becomes ready. However, in my usecase,
> vxlan is not performed by the kernel, but by VPP, so there is no 'vxlan0'
> interface. I need only 'vni' and 'router address' (and the remote VTEP) to
> construct the dataplane configuration. To allow the evpn protocol to
> transition to PS_UP, I decided to fire an event that announces the IMET if
> router_addr and VNI are set, and skips waiting for the interface.
Hmm, you have NULL interface in the encap->tunnel_dev? Or some fake interface
created by if_get_by_name()? Or some dummy/irrelevant interface (loopback)?
The interface is here not just to get/check router_addr and VNI, but
primarily to construct next hops for routes in bridge table:
  evpn_receive_mac() / evpn_receive_imet():
    .nh.iface = encap->tunnel_dev,
These are necessary not just for the kernel dataplane (to specify the
interface implementing the tunnel), but also formally, just to have a
non-NULL nh.iface, which we generally assume in BIRD for RTD_UNICAST
nexthops. So how do these routes look in your setup?
Note that the nexthops of VXLAN-tunneled routes in the bridge table are just
makeshift for now, especially the use of nh.gw for encap-dst-ip and
nh->label[0] for encap-vni; these should get their own attributes (once we
redesign nexthops to have proper attributes).
I am often uncertain how closely BIRD's representation of routes should match
the Linux API's representation (especially for idiosyncratic details like
this one, where the Linux API assumes nominal tunnel interfaces as next hop
interfaces for lightweight tunnels), but I usually try to keep them
consistent to limit the impedance mismatch. That may cause
problems when other backends with different conventions are used, as
in your case.
Btw, I planned to configure the bridge device for the EVPN protocol explicitly
(it is now implicit through tunnel_dev->master). The idea is that just as a
VRF device (in Linux) defines an L3 VRF, a bridge device defines a MAC-VRF. And
as L3 protocols are associated with a specific L3 VRF, L2 protocols should
be associated with a specific MAC-VRF. Do you have a (kernel-level) bridge
device in your setup? (I do not mean using the BIRD bridge protocol.)
> > > (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
> > > When the BGP Next Hop is changed by an export filter, we lose the MPLS
> > > labelstack. There is no way to add MPLS labelstack in filters (at least,
> > > that I could find), so we cannot use 'next hop address X' to determine the
> > > Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
> > > but rather a PMSI attribute with the 'router address' already.
> > Resetting MPLS label when changing next hop is intentional, as MPLS labels are
> > (in general) specific to receiving routers.
> >
> > There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
> > attribute that could be accessed in filters.
> >
> > I am not sure what is your use case here to change it with filters, can
> > you describe it more? What about setting 'router address' in EVPN proto?
> With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
> 1) copy that to the PMSI attribute: good
> 2) not do anything for MAC announcements; they will have BGP.next_hop set to
> the session address.
>
> if the previous patch in (2) is accepted, then 'router address' will be used
> as BGP.next_hop, which will avoid the need to change it with filters with
> (3).
Oh, I see. You are right, this should work automatically for both IMET / PMSI
and MAC.

I do not like using regular/immediate next hops here in the EVPN table, as
that does not fit well semantically and requires a formal device. But it seems
to me that a reasonable alternative would be to have the EVPN protocol just
attach BGP_NEXT_HOP, similarly to how BGP_PMSI_TUNNEL is attached. Will do that.
Any comments?

Note that immediate next hops in the EVPN table for routes received through
BGP are here just as an artefact of the BGP_NEXT_HOP resolvability check;
they should not be here either.
> If neither patch is applied, the following config:
>
> protocol evpn {
> ...
> encapsulation vxlan { router address 192.0.2.1; };
> }
> protocol bgp {
> evpn { import all; export all; };
> local 2001:db8::1 as 65512;
> neighbor 2001:db8::2 as 65512;
> }
>
> will yield IMET pointing at 192.0.2.1 but MAC pointing at 2001:db8::1. If I
> want MAC pointing at 192.0.2.1 also, I would either need (2, my preference)
> or a filter with (3).
> If there exists a device out there which has different addressing for IMET
> and MAC (note: I don't know of any, but perhaps they exist), then (3) would
> come in handy.
While I agree that it should work automatically by just setting the router
address in protocol evpn, I think this setup should work even
without patches:
protocol evpn {
  ...
  encapsulation vxlan { router address 192.0.2.1; };
}
protocol bgp {
  evpn { import all; export all; next hop address 192.0.2.1; };
  local 2001:db8::1 as 65512;
  neighbor 2001:db8::2 as 65512;
}
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
"To err is human -- to blame it on a computer is even more so."
Hoi,
Thanks for your time Ondrej, and apologies Maria for mistyping your
name; Mrs IPng Networks is called Marina, so that kind of just rolls off
the keyboard sometimes :)
On 19.02.2026 18:04, Ondrej Zajicek wrote:
> On Wed, Feb 18, 2026 at 09:59:05PM +0100, Pim van Pelt wrote:
>> Hoi,
>>
>> Thanks for taking a look, Marina and Ondrej, I appreciate it!
>>
>> On 18.02.2026 17:50, Ondrej Zajicek wrote:
>>> As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
>>> branch fell victim to my needlessly strict adherence to "do not rebase
>>> public branch" rule. The patches in 'oz-evpn' are not only rebased on
>>> newer BIRD version, but also have fixes squashed in them, and there is
>>> newer development. I just pushed there rebase to 2.18. Please look at
>>> this branch first. Also note there are some minor changes to EVPN protocol
>>> configuration syntax.
>> I have ported my vppevpn protocol implementation to be based on oz-evpn, and
>> the system is functional here also. Yaay!
>>
>> I only had one small issue. In oz-evpn, the 'evpn' protocol will stay in
>> 'startup' until the vxlan0 interface becomes ready. However, in my usecase,
>> vxlan is not performed by the kernel, but by VPP, so there is no 'vxlan0'
>> interface. I need only 'vni' and 'router address' (and the remote VTEP) to
>> construct the dataplane configuration. To allow the evpn protocol to
>> transition to PS_UP, I decided to fire an event that announces the IMET if
>> router_addr and VNI are set, and skips waiting for the interface.
> Hmm, you have NULL interface in the encap->tunnel_dev? Or some fake interface
> created by if_get_by_name()? Or some dummy/irrelevant interface (loopback)?
I do specify an 'encapsulation vxlan { tunnel device "vxlan0";};'. It
satisfies Bird2 by having an interface, it just doesn't exist in the
kernel. In branch 'evpn' this was fine, in branch 'oz-evpn' this needs
me to cheat a bit because we're waiting on the device to be oper-up and
enslaved to the bridge. If I skip that part, everything works fine
without any kernel interaction. See below in [1] for my cheat.
> The interface is here not just to get/check router_addr and VNI, but
> primarily to construct next hops for routes in bridge table:
>
> evpn_receive_mac() / evpn_receive_imet():
>
> .nh.iface = encap->tunnel_dev,
>
> These are necessary not just for kernel dataplane (to specify tunnel
> implementing iface), but also formally just to have non-NULL nh.iface,
> which we generally assumed in BIRD for RTD_UNICAST nexthops. So how
> these routes looks in your setup?
Once I convince bird to not wait for the encap->tunnel_dev oper-up and
its bridge master, the 'evpn' protocol starts, and next hop looks quite
normal.
From 'evpntab':
evpn imet 8298:200 0 192.168.10.2 [vpp0_2 2026-02-19 from 2001:678:d78:200::2] * (100) [i]
    Type: BGP univ
    BGP.origin: IGP
    BGP.as_path:
    BGP.next_hop: 192.168.10.2
    BGP.local_pref: 100
    BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
    BGP.pmsi_tunnel: ingress-replication 192.168.10.2 mpls 20040
evpn mac 8298:200 0 fe:54:00:f0:11:23 * unicast [vpp0_2 2026-02-19 from 2001:678:d78:200::2] * (100/5) [i]
    via 192.168.10.10 on e0 mpls 20040
    Type: BGP univ
    BGP.origin: IGP
    BGP.as_path:
    BGP.next_hop: 192.168.10.2
    BGP.local_pref: 100
    BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
    BGP.mpls_label_stack: 20040
Equivalent routes from 'etab':
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 2026-02-19] * (80)
    via 192.168.10.2 on vxlan0 mpls 20040
    Type: EVPN univ
    mpls_label: 20040
fe:54:00:f0:11:23 vlan 200 mpls 20040 unicast [evpn2 2026-02-19] * (80)
    via 192.168.10.2 on vxlan0 mpls 20040
    Type: EVPN univ
    mpls_label: 20040
> Note that the nexthops of VXLAN-tunneled routes in bridge table are just
> makeshift now, esp. usage of nh.gw for encap-dst-ip and nh->label[0]
> encap-vni, these should get their own attributes (once we will redesign
> nexthops to have proper attributes).
The information I need for my use case is the nexthop '192.168.10.2' and
mpls_label '20040' from etab, plus the IMET routes from evpntab (because in
P2MP there will be multiple IMETs and etab will only carry one of them). I've
also implemented 'vid' (200 in the output above), but it carries no meaning
for VPP because the bridge-domain can be separately configured to allow
untagged, single-tagged or double-tagged traffic on the PE interfaces. If new
attributes (like the vxlan nexthop or vxlan vni you suggest above) were
to appear, it would be easy for me to switch to using them instead.
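To make that data flow concrete, here is a minimal Python sketch of how a consumer of the two tables could derive per-VNI dataplane state. It is not the vppevpn implementation: EthRoute, ImetRoute and build_vxlan_state are hypothetical names, the all-zero MAC in etab is assumed (from the output above) to be the flood entry rather than a host, and the second VTEP 192.168.10.3 in the demo is invented for illustration.

```python
from dataclasses import dataclass

ZERO_MAC = "00:00:00:00:00:00"


@dataclass(frozen=True)
class EthRoute:
    """One 'eth table' entry: MAC, VLAN id, gateway (remote VTEP), first label (VNI)."""
    mac: str
    vlan: int
    gw: str
    label: int


@dataclass(frozen=True)
class ImetRoute:
    """One IMET route from the 'evpn table': PMSI tunnel endpoint and VNI."""
    vtep: str
    vni: int


def build_vxlan_state(eth_routes, imet_routes, vni):
    """Return (flood_list, fdb) for one VNI.

    flood_list: sorted remote VTEPs, one per IMET route seen for the VNI.
    fdb: unicast MAC -> remote VTEP, from eth routes that carry the VNI.
    """
    flood_list = sorted({r.vtep for r in imet_routes if r.vni == vni})
    fdb = {r.mac: r.gw for r in eth_routes
           if r.label == vni and r.mac != ZERO_MAC}
    return flood_list, fdb


if __name__ == "__main__":
    eth = [
        EthRoute(ZERO_MAC, 200, "192.168.10.2", 20040),
        EthRoute("fe:54:00:f0:11:23", 200, "192.168.10.2", 20040),
    ]
    imet = [ImetRoute("192.168.10.2", 20040), ImetRoute("192.168.10.3", 20040)]
    print(build_vxlan_state(eth, imet, 20040))
```

The flood list comes from evpntab because, as noted, etab only carries one of the IMET-derived entries, while unicast forwarding needs only the gateway and label already present in etab.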
> I am often uncertain how much BIRD representation of routes should match
> Linux API representation of routes (esp. for idiosyncratic details like
> here when Linux API assumes nominal tunnel interfaces in next hop
> interfaces for lightweight tunnels), but i usually defer to try to keep
> it consistent to limit impedance mismatch here. But it may cause
> problems when other backends with different conventions are used, like
> in your case.
I think assuming by default a linux 'bridge' with its tunneling
functionality is perfectly fine, although I'd prefer it if it does not
become the /only/ valid way:
1) I'm not sure if that works well on other platforms (eg FreeBSD,
Windows, MacOS)
2) or embedded platforms (eg Broadcom or Marvell chips).
3) or VPP :-)
Requiring a linux bridge, and requiring a kernel interface, prohibits
non-linux eVPN scenarios. May I suggest that these things are kept
optional even if they are the default, but that they can be turned off,
for example by configuring a dummy interface dummy0, setting a config
toggle 'nowait' to skip waiting for it to be oper-up/enslaved, and that
we also do not require 'bridge' protocol ?
> Btw, i planned to explicitly configure bridge device for EVPN protocol
> (as it is now implicitly through tunnel_dev->master). The idea is that as
> VRF device (in Linux) defines L3 VRF, bridge device defines MAC-VRF. And
> as L3 protocols are associated with specific L3 VRF, L2 protocols should
> be associated with specific MAC-VRF.
It would be good if 'evpn' protocol can continue to be used standalone,
in particular not conflate with 'bridge'. In my view, one should be able
to inspect evpntab and etab to construct other integrations without the
need to consult kernel devices. At the moment 'evpn' (entirely) and
'oz-evpn' (less so) are elegant precisely because they do the complete
signalling and populate evpntab and etab using only an 'evpn' and a
'bgp' protocol together with the 'evpn table' and 'eth table'. That
allows me to create a custom 'vppevpn' protocol that subscribes to those
tables. See attached config file (bird-example.conf) for an idea of
where I'm headed.
> Do you have (kernel-level) bridge
> device in your setup? (i do not mean using BIRD bridge protocol).
VPP does not use any kernel bridge or vxlan device; it operates entirely
as a userspace dataplane. In my case, Bird programs the VPP dataplane
directly. The main flow of a four-router eVPN mesh looks like this;
imagine that each of these log lines is the result of an API call to VPP
directly over a unix domain socket:
Feb 19 22:25:50 vpp0-3 bird[1214613]: Enabling protocol bd200
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created bridge-domain bd=200 with tag='bird_bd200'
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel sw_if_index=12 src=[192.168.10.3] dst=[192.168.10.0] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=12 to bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=52:54:00:f0:10:10 vid=200 to bd=200 via vtep=[192.168.10.0] vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel sw_if_index=16 src=[192.168.10.3] dst=[192.168.10.1] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=16 to bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=52:54:00:f0:10:11 vid=200 to bd=200 via vtep=[192.168.10.1] vni=20040 sw_if_index=16
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel sw_if_index=11 src=[192.168.10.3] dst=[192.168.10.2] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=11 to bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=52:54:00:f0:10:12 vid=200 to bd=200 via vtep=[192.168.10.2] vni=20040 sw_if_index=11
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=fe:54:00:f0:11:23 vid=200 to bd=200 via vtep=[192.168.10.2] vni=20040 sw_if_index=11
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=fe:54:00:f0:11:03 vid=200 to bd=200 via vtep=[192.168.10.0] vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=00:00:02:00:00:01 vid=200 to bd=200 via vtep=[192.168.10.0] vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=fe:54:00:f0:11:13 vid=200 to bd=200 via vtep=[192.168.10.1] vni=20040 sw_if_index=16
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200 vtep=[192.168.10.2] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200 vtep=[192.168.10.1] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200 vtep=[192.168.10.0] vni=20040
Feb 19 22:25:51 vpp0-3 bird[1214613]: bd200: learned mac=52:54:00:f0:10:13 vid=200 on bd=200
Feb 19 22:25:51 vpp0-3 bird[1214613]: bd200: learned mac=fe:54:00:f0:11:33 vid=200 on bd=200
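The log above suggests one vxlan-tunnel per remote VTEP, created on first use and then shared by every remote MAC behind that VTEP. A minimal C sketch of that bookkeeping follows. The names (`vtep_tunnel`, `tunnel_table`, `tunnel_for_vtep`, `MAX_TUNNELS`) are assumptions made up for illustration and are not taken from the vppevpn code; the `sw_if_index` values come from a plain counter here, whereas in a real setup the dataplane would assign them.

```c
#include <stdint.h>
#include <string.h>

#define MAX_TUNNELS 16            /* hypothetical fixed-size table */

struct vtep_tunnel {
  char     dst[46];               /* remote VTEP as text, e.g. "192.168.10.2" */
  uint32_t vni;
  uint32_t sw_if_index;           /* stand-in for the index the dataplane returns */
};

struct tunnel_table {
  struct vtep_tunnel t[MAX_TUNNELS];
  unsigned count;
  uint32_t next_index;            /* stand-in for the dataplane allocating indices */
};

/* Return the tunnel for (dst, vni), creating it on first use.
 * Returns NULL if the table is full or dst does not fit. */
static struct vtep_tunnel *
tunnel_for_vtep(struct tunnel_table *tt, const char *dst, uint32_t vni)
{
  for (unsigned i = 0; i < tt->count; i++)
    if (tt->t[i].vni == vni && strcmp(tt->t[i].dst, dst) == 0)
      return &tt->t[i];           /* reuse: another MAC behind the same VTEP */

  if (tt->count == MAX_TUNNELS || strlen(dst) >= sizeof(tt->t[0].dst))
    return NULL;

  struct vtep_tunnel *n = &tt->t[tt->count++];
  strcpy(n->dst, dst);
  n->vni = vni;
  n->sw_if_index = tt->next_index++;
  return n;
}
```

Each remote MAC would then only need to store the `sw_if_index` returned by `tunnel_for_vtep()`, which matches the shape of the "added remote mac ... sw_if_index=" lines.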
I am happy to share the 'vppevpn' protocol with others also, as an
example '3P integration'. I do not expect it to be upstreamed into
Bird2, unless there are community requests for it.
Ondrej, do let me know if you'd like to take a sneak peek at my code
(it's in a private repo for now, as it's not ready for wider review yet,
but it is mostly functional).
>>>> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
>>>> When the BGP Next Hop is changed by an export filter, we lose the MPLS
>>>> labelstack. There is no way to add MPLS labelstack in filters (at least,
>>>> that I could find), so we cannot use 'next hop address X' to determine the
>>>> Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
>>>> but rather a PMSI attribute with the 'router address' already.
>>> Resetting MPLS label when changing next hop is intentional, as MPLS labels are
>>> (in general) specific to receiving routers.
>>>
>>> There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
>>> attribute that could be accessed in filters.
>>>
>>> I am not sure what is your use case here to change it with filters, can
>>> you describe it more? What about setting 'router address' in EVPN proto?
>> With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
>> 1) copy that to the PMSI attribute: good
>> 2) not do anything for MAC announcements; they will have BGP.next_hop set to
>> the session address.
>>
>> if the previous patch in (2) is accepted, then 'router address' will be used
>> as BGP.next_hop, which will avoid the need to change it with filters with
>> (3).
> Oh, i see. You are right, this should work automatically for both IMET / PMSI
> and MAC.
>
> I do not like using regular/immediate next hops here in EVPN table, as
> it does not fit well semantically and requires formal device. But seems
> to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
> by EVPN protocol, similarly to how BGP_PMSI_TUNNEL is attached. Will do that.
> Any comments?
If you were to attach a specific attribute like vxlan_nexthop or
vxlan_vni to the etab table entry, I would simply read that and use it
instead of the bgp nexthop. That's what happens already today for IMET,
as it has the BGP.pmsi_tunnel attribute with the needed
ingress-replication 2001:678:d78:200::2 mpls 10040 information. How do
other vendors (say Arista, Cisco, Nokia, FRRouting) handle the Type-2
nexthop? My understanding is they use BGP next hop for that (in other
words, the same as how Bird does it today).
> Note that immediate next hops in EVPN table for routes received through
> BGP are here just as an artefact of BGP_NEXT_HOP resolvability check,
> they should not be here too.
Not sure I understand what you mean - don't we have this problem also
for kernel based vxlan? If we create a vxlan0 interface in a bridge, and
set a fdb entry onto it, we also need to know which vxlan nexthop to
use. The way I read 'evpn' and 'oz-evpn', we use the BGP nexthop for
that purpose. However, if what you're saying is that you'd want to remove
the BGP Next Hop and instead have an EVPN VxLAN Next Hop attribute to
populate the 'etab' gateway field, that would work just as well for me. I
kind of wonder why you'd go to the trouble of obfuscating the BGP Next Hop.
Don't other vendors use the same thing (send the vxlan packet to the address
learned via the BGP Next Hop in Type-2 announcements)?
>> If neither patch is applied, the following config:
>>
>> protocol evpn {
>> ...
>> encapsulation vxlan { router address 192.0.2.1; };
>> }
>> protocol bgp {
>> evpn { import all; export all; };
>> local 2001:db8::1 as 65512;
>> neighbor 2001:db8::2 as 65512;
>> }
>>
>> will yield IMET pointing at 192.0.2.1 but MAC pointing at 2001:db8::1. If I
>> want MAC pointing at 192.0.2.1 also, I would either need (2, my preference)
>> or a filter with (3).
>> If there exists a device out there which has different addressing for IMET
>> and MAC (note: I don't know of any, but perhaps they exist), then (3) would
>> come in handy.
> While i agree that it should work automatically by just setting router
> address in protocol evpn, i think that this setup should work even
> without patches:
>
> protocol evpn {
> ...
> encapsulation vxlan { router address 192.0.2.1; };
> }
> protocol bgp {
> evpn { import all; export all; next hop address 192.0.2.1; };
> local 2001:db8::1 as 65512;
> neighbor 2001:db8::2 as 65512;
> }
I don't think this works for MAC. For IMET it works because that has a
custom PMSI BGP attribute which is set to encap0->router_addr. Setting
the next hop in this way will clear the mpls labelstack, so we'd end up
with:
fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
via 192.0.2.1 on vxlan0 mpls 0
and we'd lose the VNI.
groet,
Pim
[1] skipping the wait for tunnel_dev to become operational:
@@ -1059,11 +1070,37 @@ evpn_start(struct proto *P)
   P->mpls_map->vrf_iface = P->vrf;
   */
+  /* If router address and VNI are fully configured, no need to wait for
+   * the tunnel device to come up (e.g., when VPP manages VXLAN tunnels).
+   * Schedule an immediate event to transition to PS_UP. */
+  struct evpn_encap *encap0 = evpn_get_encap(p);
+  if (!ipa_zero(encap0->router_addr) && (p->vni != U32_UNDEF))
+  {
+    event *e = ev_new_init(p->p.pool, evpn_no_iface_startup, p);
+    ev_schedule(e);
+  }
+
   /* Wait for VXLAN interfaces to be up */
   return PS_START;
 }
+static void
+evpn_no_iface_startup(void *data)
+{
+  struct evpn_proto *p = data;
+
+  if (p->p.proto_state != PS_START)
+    return;
+
+  proto_notify_state(&p->p, PS_UP);
+
+  evpn_announce_imet(p, EVPN_ROOT_VLAN(p), 1);
+
+  WALK_LIST_(struct evpn_vlan, v, p->vlans)
+    evpn_announce_imet(p, v, 1);
+}
--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
On Thu, Feb 19, 2026 at 11:11:05PM +0100, Pim van Pelt wrote:
> Hoi,
>
> Thanks for your time Ondrej, and apologies Maria for mistyping your name,
> Mrs IPng Networks is called Marina so that kind of just rolls off the
> keyboard sometimes :)
>
> On 19.02.2026 18:04, Ondrej Zajicek wrote:
> > On Wed, Feb 18, 2026 at 09:59:05PM +0100, Pim van Pelt wrote:
> > > Hoi,
> > >
> > > Thanks for taking a look, Marina and Ondrej, I appreciate it!
> > >
> > > On 18.02.2026 17:50, Ondrej Zajicek wrote:
> > > > As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
> > > > branch fell victim to my needlessly strict adherence to "do not rebase
> > > > public branch" rule. The patches in 'oz-evpn' are not only rebased on
> > > > newer BIRD version, but also have fixes squashed in them, and there is
> > > > newer development. I just pushed there rebase to 2.18. Please look at
> > > > this branch first. Also note there are some minor changes to EVPN protocol
> > > > configuration syntax.
> > > I have ported my vppevpn protocol implementation to be based on oz-evpn, and
> > > the system is functional here also. Yaay!
> > >
> > > I only had one small issue. In oz-evpn, the 'evpn' protocol will stay in
> > > 'startup' until the vxlan0 interface becomes ready. However, in my usecase,
> > > vxlan is not performed by the kernel, but by VPP, so there is no 'vxlan0'
> > > interface. I need only 'vni' and 'router address' (and the remote VTEP) to
> > > construct the dataplane configuration. To allow the evpn protocol to
> > > transition to PS_UP, I decided to fire an event that announces the IMET if
> > > router_addr and VNI are set, and skips waiting for the interface.
> > Hmm, you have NULL interface in the encap->tunnel_dev? Or some fake interface
> > created by if_get_by_name()? Or some dummy/irrelevant interface (loopback)?
>
> I do specify an 'encapsulation vxlan { tunnel device "vxlan0";};'. It
> satisfies Bird2 by having an interface, it just doesn't exist in the kernel.
> In branch 'evpn' this was fine, in branch 'oz-evpn' this needs me to cheat a
> bit because we're waiting on the device to be oper-up and enslaved to the
> bridge. If I skip that part, everything works fine without any kernel
> interaction. See below in [1] for my cheat.
That is the fake interface from if_get_by_name(). Using them in route
nexthops is 'fine' on the level that it does not crash due to a NULL
dereference, but they were never supposed to be used this way; they are
just placeholders for configuration.
Note that these fake interfaces are a horrible hack in BIRD code, as
properly there should be two distinct structures: iface_config and
iface, the former representing an interface referenced in the config file, and
the latter representing real kernel interfaces found by the 'device' protocol.
But we use the same structure for both cases.
I wonder if your setup would work if, instead of using this fake interface,
you used some real placeholder interface, say loopback:
'encapsulation vxlan { tunnel device "lo"; };'
The 'cheat' would have to be modified: it should wait for the interface,
but ignore the fact that the interface is not a tunnel (i.e.
skip/ignore evpn_validate_iface_attrs()).
I am thinking about how to integrate your use case into oz-evpn, and
this seems to me a much saner way.
> > Note that the nexthops of VXLAN-tunneled routes in bridge table are just
> > makeshift now, esp. usage of nh.gw for encap-dst-ip and nh->label[0]
> > encap-vni, these should get their own attributes (once we will redesign
> > nexthops to have proper attributes).
>
> The information I needed for my usecase, is nexthop '192.168.10.2', and
> mpls_label '20040' from etab, and IMET from evpntab (because in P2MP there
> will be multiple IMETs and etab will only carry one of them).
Note that you should read IMET from etab too. The EVPN protocol translates
all IMETs from evpntab to etab; otherwise even our kernel-based setup
would not work, since the 'bridge' protocol that configures the kernel
bridge also reads just etab.
> I've implemented also 'vid', as you see above 200, but it carries no meaning
> for VPP because the bridge-domain can be separately configured to allow
> untagged, single-tagged or double-tagged in the PE interfaces. If new
> attributes (like the vxlan nexthop or vxlan vni you suggest below) were to
> appear, it will be easy for me to switch to using them instead.
okay.
> > I am often uncertain how much BIRD representation of routes should match
> > Linux API representation of routes (esp. for idiosyncratic details like
> > here when Linux API assumes nominal tunnel interfaces in next hop
> > interfaces for lightweight tunnels), but i usually defer to try to keep
> > it consistent to limit impedance mismatch here. But it may cause
> > problems when other backends with different conventions are used, like
> > in your case.
> I think assuming by default a linux 'bridge' with its tunneling
> functionality is perfectly fine, although I'd prefer it if it does not
> become the /only/ valid way:
> 1) I'm not sure if that works well on other platforms (eg FreeBSD, Windows,
> MacOS)
> 2) or embedded platforms (eg Broadcom or Marvell chips).
> 3) or VPP :-)
>
> Requiring a linux bridge, and requiring a kernel interface, prohibits
> non-linux eVPN scenarios. May I suggest that these things are kept optional
> even if they are the default, but that they can be turned off, for example
> by configuring a dummy interface dummy0, setting a config toggle 'nowait' to
> skip waiting for it to be oper-up/enslaved, and that we also do not require
> 'bridge' protocol ?
Yeah, that is probably how it will be. Assume Linux networking model,
but do not depend on it.
> > Btw, i planned to explicitly configure bridge device for EVPN protocol
> > (as it is now implicitly through tunnel_dev->master). The idea is that as
> > VRF device (in Linux) defines L3 VRF, bridge device defines MAC-VRF. And
> > as L3 protocols are associated with specific L3 VRF, L2 protocols should
> > be associated with specific MAC-VRF.
> It would be good if 'evpn' protocol can continue to be used standalone, in
> particular not conflate with 'bridge'. In my view, one should be able to
> inspect evpntab and etab to construct other integrations without the need to
> consult kernel devices. At the moment, 'evpn' entirely so and less so
> 'oz-evpn' are elegant precisely because it does complete signalling and
> captures evpntab and etab using exclusively one 'evpn' and 'bgp' protocol
> together with the 'evpn table' and 'eth table'. It allows me to create a
> custom 'vppevpn' protocol that subscribes to those tables. See attached
> config file (bird-example.conf) for an idea of where I'm headed.
I agree, and this split of work between the 'evpn' protocol and the 'bridge'
protocol (with separate 'evpn table' and 'eth table') is going to stay.
> I am happy to share the 'vppevpn' protocol with others also, as an example
> '3P integration'. I do not expect it to be upstreamed into Bird2, unless
> there are community requests for it.
> Ondrej, do let me know if you'd like to take a sneak peek at my code (it's
> in a private repo for now, as it's not ready for wider review yet, but it is
> mostly functional).
Having better integration with VPP (or some other userspace dataplane)
is something we are interested in in general, but i would not look at it
before i finish some other tasks (including merging EVPN), as i am rather
overwhelmed.
> > > > > (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
> > > > > When the BGP Next Hop is changed by an export filter, we lose the MPLS
> > > > > labelstack. There is no way to add MPLS labelstack in filters (at least,
> > > > > that I could find), so we cannot use 'next hop address X' to determine the
> > > > > Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
> > > > > but rather a PMSI attribute with the 'router address' already.
> > > > Resetting MPLS label when changing next hop is intentional, as MPLS labels are
> > > > (in general) specific to receiving routers.
> > > >
> > > > There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
> > > > attribute that could be accessed in filters.
> > > >
> > > > I am not sure what is your use case here to change it with filters, can
> > > > you describe it more? What about setting 'router address' in EVPN proto?
> > > With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
> > > 1) copy that to the PMSI attribute: good
> > > 2) not do anything for MAC announcements; they will have BGP.next_hop set to
> > > the session address.
> > >
> > > if the previous patch in (2) is accepted, then 'router address' will be used
> > > as BGP.next_hop, which will avoid the need to change it with filters with
> > > (3).
> > Oh, i see. You are right, this should work automatically for both IMET / PMSI
> > and MAC.
> >
> > I do not like using regular/immediate next hops here in EVPN table, as
> > it does not fit well semantically and requires formal device. But seems
> > to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
> > by EVPN protocol, similarly to how BGP_PMSI_TUNNEL is attached. Will do that.
> > Any comments?
>
> If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
> to the etab table entry, I would simply read that and use it instead of the
> bgp nexthop. That's what happens already today for IMET, as it has the
> BGP.pmsi_tunnel attribute with the needed ingress-replication
> 2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
> Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
> is they use BGP next hop for that (in other words, the same as how Bird does
> it today).
I think there is some confusion here. I am talking about evpntab
entries, not about etab entries. And about your patch that sets router
IP into their immediate next hop (nh.gw).
> > Note that immediate next hops in EVPN table for routes received through
> > BGP are here just as an artefact of BGP_NEXT_HOP resolvability check,
> > they should not be here too.
>
> Not sure I understand what you mean - don't we have this problem also for
> kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
> fdb entry onto it, we also need to know which vxlan nexthop to use. The way
> I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
> However, if what you're saying is you'd want to remove the BGP Next Hop and
> instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
> field that would work just as well for me. I kind of wonder why you'd go to
> the trouble obfuscating the BGP Next Hop. Don't other vendors use the same
> thing (send vxlan packet to the address learned via the BGP Next Hop in
> Type-2 announcements) ?
I just mean that the immediate next hop fields of evpntab routes received
through BGP are irrelevant, while the BGP next hop attribute is the
important one. When the 'evpn' protocol takes a route from evpntab and
converts it to an etab entry, it examines the BGP next hop, not the
immediate next hop.
bird> show route table evpntab all
Table evpntab:
evpn mac 1:22 2 76:a0:cf:05:dd:4f * unicast [ibgp1 00:06:40.215 from 10.1.2.1] * (100/20) [i]
via 10.1.21.2 on ve1 mpls 200022
Type: BGP univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 10.1.2.1
BGP.local_pref: 100
BGP.ext_community: (rt, 1, 0) (generic, 0x30c0000, 0x8)
BGP.mpls_label_stack: 200022
Here the immediate next hop is 10.1.21.2, while the BGP next hop is 10.1.2.1
(two hops away). If EVPN had been encapsulated in MPLS, then it would have
made sense to show the immediate next hop, but in the case of VXLAN
encapsulation, the traffic is encapsulated and sent to the BGP next hop.
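The distinction between the two next hops can be sketched in C as follows. This is only an illustration of the reasoning in this message, not BIRD code: `encap_type`, `evpn_route_view` and `forwarding_target` are hypothetical names, and addresses are kept as strings for brevity.

```c
#include <stddef.h>

enum encap_type { ENCAP_MPLS, ENCAP_VXLAN };   /* hypothetical */

struct evpn_route_view {
  const char *immediate_nh;   /* resolved next hop, e.g. "10.1.21.2" */
  const char *bgp_next_hop;   /* BGP.next_hop attribute, e.g. "10.1.2.1" */
};

/* Pick the address that matters when forwarding a frame for this route. */
static const char *
forwarding_target(const struct evpn_route_view *r, enum encap_type encap)
{
  switch (encap) {
  case ENCAP_VXLAN:
    return r->bgp_next_hop;    /* remote VTEP: outer IP destination of the tunnel */
  case ENCAP_MPLS:
    return r->immediate_nh;    /* labelled packet is handed to the adjacent router */
  }
  return NULL;
}
```

With the evpntab route shown above, the VXLAN case would select 10.1.2.1 and ignore 10.1.21.2, which is why the immediate next hop in evpntab carries no information for a VXLAN dataplane.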
> > While i agree that it should work automatically by just setting router
> > address in protocol evpn, i think that this setup should work even
> > without patches:
> >
> > protocol evpn {
> > ...
> > encapsulation vxlan { router address 192.0.2.1; };
> > }
> > protocol bgp {
> > evpn { import all; export all; next hop address 192.0.2.1; };
> > local 2001:db8::1 as 65512;
> > neighbor 2001:db8::2 as 65512;
> > }
> I don't think this works for MAC. For IMET it works because that has a
> custom PMSI BGP attribute which is set to encap0->router_addr. Setting the
> next hop in this way will clear the mpls labelstack. So we'd end up with:
> fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
> via 192.0.2.1 on vxlan0 mpls 0
> and we'd lose the VNI.
I think it will not clear the MPLS labelstack. This is not setting next
hop in filters. The difference between
evpn { import all; export all; next hop address 192.0.2.1; };
and
evpn { import all; export all; };
in BGP protocol export is only where the BGP next hop value is taken
from (explicitly configured one or source address from BGP session), but
route processing is the same. See bgp_update_next_hop_ip(), the
!bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.
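As a toy model of the behaviour described in this thread (and explicitly not a rendering of bgp_update_next_hop_ip()), the following C sketch separates the two cases: a next hop assigned by an export filter, where the label stack is treated as no longer valid, and a next hop that merely comes from a different source (the 'next hop address' option or the session address), where processing is otherwise identical. All type and function names here are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct export_cfg {
  const char *next_hop_address;  /* channel option 'next hop address', or NULL */
  const char *session_source;    /* local address of the BGP session */
};

struct export_result {
  const char *bgp_next_hop;
  bool labels_kept;
};

/* filter_next_hop is non-NULL only if an export filter assigned bgp_next_hop. */
static struct export_result
export_next_hop(const struct export_cfg *cfg, const char *filter_next_hop)
{
  struct export_result r;

  if (filter_next_hop) {
    /* Next hop rewritten by a filter: labels are reset in this model. */
    r.bgp_next_hop = filter_next_hop;
    r.labels_kept = false;
    return r;
  }

  /* Otherwise only the source of the value differs; labels are untouched. */
  r.bgp_next_hop = cfg->next_hop_address ? cfg->next_hop_address
                                         : cfg->session_source;
  r.labels_kept = true;
  return r;
}
```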
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
"To err is human -- to blame it on a computer is even more so."
Hoi,
On 20.02.2026 01:31, Ondrej Zajicek wrote:
> That is the fake interface from if_get_by_name(). Using them in route
> nexthops is 'fine' on the level that it does not crash due to NULL
> dereference, but they were never supposed to be used this way, they are
> just placeholders for configuration.
>
> Note that these fake interfaces are a horrible hack in BIRD code, as
> properly there should be two distinct structures: iface_config and
> iface, the former representing interface referenced in config file, and
> the latter representing real kernel interfaces found by 'device' protocol.
> But we use the same structure for both cases.
Understood. Once iface_config and iface are split, I can make use of
either construct (the iface_config one makes more sense). Neither the
interface name nor the kernel device is necessary in my implementation.
> I wonder if your setup would work, if you instead of using this fake interface
> use some real placeholder interface, say loopback:
>
> 'encapsulation vxlan { tunnel device "lo"; };'
It works fine. As an aside, reconfiguring causes a restart of evpn
protocol, which trips an assertion and crashes. The crash also happens
on 'birdc disable evpn1'.
Feb 20 12:12:29 vpp0-3 bird[1455113]: Restarting protocol evpn1
Feb 20 12:12:29 vpp0-3 bird[1455113]: Assertion 'pub->queue && pub->topic' failed at lib/pubsub.c:161
Feb 20 12:12:29 vpp0-3 systemd[1]: bird-dataplane.service: Main process exited, code=killed, status=11/SEGV
Either way, Bird comes back up and works just fine using tunnel_dev set
to "lo". It reminds me that I already use this trick, as MAC addresses
learned from VPP's bridge-domain do not have any corresponding Linux or
Bird interface, so I inject them into etab using "lo" as well.
> The 'cheat' has to be modified (it should wait for the interface,
> but will ignore the fact that the interface is not a tunnel, i.e.
> skip/ignore evpn_validate_iface_attrs()).
I like that. Perhaps a keyword in the config can signal that this is OK,
like 'tunnel device "evpn0-dummy" virtual;' or just 'tunnel device "lo"
virtual;'
> Note that you should read IMET from etab too. EVPN protocol translates
> all IMETs from evpntab to etab, otherwise even our kernel-based setup
> would not work -- 'bridge' protocol that configures kernel bridge also
> reads just etab.
I do not have multiple IMETs in etab, only one:
root@vpp0-0:/etc/bird# birdc show route table evpntab | grep imet
evpn imet 8298:100 0 2001:678:d78:200::3 [vpp0_3 12:12:38.484 from 2001:678:d78:200::3] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::2 [vpp0_2 11:18:21.821 from 2001:678:d78:200::2] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::1 [vpp0_1 11:18:21.253 from 2001:678:d78:200::1] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200:: unicast [evpn1 11:18:07.285] * (120)
root@vpp0-0:/etc/bird# birdc show route table etab | grep 00:00:00:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 12:12:38.484] * (80)
Perhaps I'm holding it wrong (see bird-example.conf). It would actually
be super if I could rely /only/ on etab, as tracking both etab and
evpntab was a fair amount of extra code.
> I agree and this split of work between 'evpn' protocol and 'bridge' protocol
> (with separate 'evpn table' and 'eth table') is going to stay.
Thank you! That's great news for me.
>> I am happy to share the 'vppevpn' protocol with others also, as an example
>> '3P integration'. I do not expect it to be upstreamed into Bird2, unless
>> there are community requests for it.
>> Ondrej, do let me know if you'd like to take a sneak peek at my code (it's
>> in a private repo for now, as it's not ready for wider review yet, but it is
>> mostly functional).
> Having better integration with VPP (or some other userspace dataplane)
> is something we are interested in general, but i would not look at it
> before i finish some other tasks (including merging EVPN) as i am rather
> overwhelmed.
I can volunteer my time to write a vpp protocol (for ip4, ip6, mpls FIB
and interfaces). I'll contact you separately for that, it sounds like a
worthwhile project and I've kind of always wanted to do it.
>>>>>> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
>>>>>> When the BGP Next Hop is changed by an export filter, we lose the MPLS
>>>>>> labelstack. There is no way to add MPLS labelstack in filters (at least,
>>>>>> that I could find), so we cannot use 'next hop address X' to determine the
>>>>>> Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
>>>>>> but rather a PMSI attribute with the 'router address' already.
>>>>> Resetting MPLS label when changing next hop is intentional, as MPLS labels are
>>>>> (in general) specific to receiving routers.
>>>>>
>>>>> There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
>>>>> attribute that could be accessed in filters.
>>>>>
>>>>> I am not sure what is your use case here to change it with filters, can
>>>>> you describe it more? What about setting 'router address' in EVPN proto?
>>>> With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
>>>> 1) copy that to the PMSI attribute: good
>>>> 2) not do anything for MAC announcements; they will have BGP.next_hop set to
>>>> the session address.
>>>>
>>>> if the previous patch in (2) is accepted, then 'router address' will be used
>>>> as BGP.next_hop, which will avoid the need to change it with filters with
>>>> (3).
>>> Oh, i see. You are right, this should work automatically for both IMET / PMSI
>>> and MAC.
>>>
>>> I do not like using regular/immediate next hops here in EVPN table, as
>>> it does not fit well semantically and requires formal device. But seems
>>> to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
>>> by EVPN protocol, similarly to how BGP_PMSI_TUNNEL is attached. Will do that.
>>> Any comments?
>> If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
>> to the etab table entry, I would simply read that and use it instead of the
>> bgp nexthop. That's what happens already today for IMET, as it has the
>> BGP.pmsi_tunnel attribute with the needed ingress-replication
>> 2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
>> Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
>> is they use BGP next hop for that (in other words, the same as how Bird does
>> it today).
> I think there is some confusion here. I am talking about evpntab
> entries, not about etab entries. And about your patch that sets router
> IP into their immediate next hop (nh.gw).
I see. Then maybe I can try a different approach. I thought the patch
makes Bird behave the same as Nokia SRLinux [1], which also sets the
router IP (the local VTEP) as nexthop. But what you're saying is that I
should not set the /immediate/ nexthop, but rather leave that alone and
set the /BGP Next Hop/? As a reminder, though, I need to be able to set an
IPv4 BGP Next Hop on an IPv6 session for only some RTs, not all. See one
more thought on that below.
>> Not sure I understand what you mean - don't we have this problem also for
>> kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
>> fdb entry onto it, we also need to know which vxlan nexthop to use. The way
>> I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
>> However, if what you're saying is you'd want to remove the BGP Next Hop and
>> instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
>> field that would work just as well for me. I kind of wonder why you'd go to
>> the trouble obfuscating the BGP Next Hop. Don't other vendors use the same
>> thing (send vxlan packet to the address learned via the BGP Next Hop in
>> Type-2 announcements) ?
> I just mean that immediate next hop fields for evpntab routes received
> through BGP are irrelevant, while the BGP next hop attribute is the
> important one. When 'evpn' protocol takes a route from evpntab and converts
> it to etab entry, it examines BGP next hop, not immediate next hop.
OK I think I understand now.
>>> While i agree that it should work automatically by just setting router
>>> address in protocol evpn, i think that this setup should work even
>>> without patches:
>>>
>>> protocol evpn {
>>> ...
>>> encapsulation vxlan { router address 192.0.2.1; };
>>> }
>>> protocol bgp {
>>> evpn { import all; export all; next hop address 192.0.2.1; };
>>> local 2001:db8::1 as 65512;
>>> neighbor 2001:db8::2 as 65512;
>>> }
>> I don't think this works for MAC. For IMET it works because that has a
>> custom PMSI BGP attribute which is set to encap0->router_addr. Setting the
>> next hop in this way will clear the mpls labelstack. So we'd end up with:
>> fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
>> via 192.0.2.1 on vxlan0 mpls 0
>> and we'd lose the VNI.
> I think it will not clear the MPLS labelstack. This is not setting next
> hop in filters. The difference between
>
> evpn { import all; export all; next hop address 192.0.2.1; };
>
> and
>
> evpn { import all; export all; };
>
> in BGP protocol export is only where the BGP next hop value is taken
> from (explicitly configured one or source address from BGP session), but
> route processing is the same. See bgp_update_next_hop_ip(), the
> !bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.
I tried this, and you are correct that 'next hop address' works and
leaves the MPLS labelstack alone:
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
via 192.168.10.0 on lo mpls 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
via 192.168.10.0 on lo mpls 20040
Now let's suppose I have two evpn protocols, one with an IPv4 router
address and one with an IPv6 router address. In this scenario, I can't
use 'next hop address' because it'll force both to use that address family.
It yields a bad state:
1) as before, the IPv4-only evpn (VNI 20040) works
2) but now, the evpn with an IPv6 router address, sends IMET with IPv6,
and MAC with IPv4
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
via 2001:678:d78:200:: on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
via 192.168.10.0 on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
An obvious solution is to use a filter, like this one:
filter bgp_evpn_out {
if (rt, 8298, 10040) ~ bgp_ext_community then { bgp_next_hop = 192.168.10.3; }
if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::3; }
accept;
}
template bgp T_BGP_EVPN {
evpn { import all; export filter bgp_evpn_out; };
local 2001:678:d78:200::3 as 65512;
}
But now the filter does destroy the MPLS labelstack, although the
mpls_label attribute remains:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
via 2001:678:d78:200:: on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
* via 2001:678:d78:200:: on lo mpls 0*
Type: EVPN univ
mpls_label: 10040
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
via 192.168.10.0 on lo mpls 20040
Type: EVPN univ
mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
* via 192.168.10.0 on lo mpls 0*
Type: EVPN univ
mpls_label: 20040
My conclusion was: I need to be able to apply filters without destroying
the MPLS labels. If I now understand correctly, I can remove the
nh.gw/nh.iface from evpn_announce_mac() and evpn_announce_imet(), but
keep the change in bgp_update_next_hop_ip()
@@ -1314,19 +1310,6 @@ bgp_update_next_hop_ip(struct bgp_export_state *s, eattr *a, ea_list **to)
}
}
+  /* For L2VPN (EVPN): ensure MPLS label stack is set even if next hop was filter-overridden */
+  if (s->mpls && bgp_channel_is_l2vpn(s->channel) && !bgp_find_attr(*to, BA_MPLS_LABEL_STACK))
+  {
+    rta *ra = s->route->attrs;
+    if (ra->nh.labels)
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, ra->nh.label, ra->nh.labels * 4);
+    else
+    {
+      u32 label = ea_get_int(ra->eattrs, EA_MPLS_LABEL, BGP_MPLS_NULL);
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, &label, 4);
+    }
+  }
This allows the above filter to work while preserving the labelstack:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
via 2001:678:d78:200:: on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
* via 2001:678:d78:200:: on lo mpls 10040*
Type: EVPN univ
mpls_label: 10040
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
via 192.168.10.0 on lo mpls 20040
Type: EVPN univ
mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
* via 192.168.10.0 on lo mpls 20040*
Type: EVPN univ
mpls_label: 20040
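The fallback order in the patch above can be sketched outside of BIRD roughly as follows. This is only an illustration under stated assumptions: `struct sketch_route`, `sketch_export_labels()` and `SKETCH_MPLS_NULL` are hypothetical names and do not exist in the BIRD tree, and the null-label value is a placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "no label" value; not taken from BIRD headers. */
#define SKETCH_MPLS_NULL 3u
#define SKETCH_MAX_LABELS 8

/* Simplified view of a route at export time (names are illustrative only). */
struct sketch_route {
  uint32_t nh_label[SKETCH_MAX_LABELS]; /* labels on the immediate next hop */
  size_t nh_labels;                     /* 0 once a filter replaced the next hop */
  int has_mpls_label;                   /* is the mpls_label attribute present? */
  uint32_t mpls_label;                  /* value of the mpls_label attribute */
};

/*
 * Fill `out` with the label stack to advertise and return its length.
 * Order of preference: next-hop labels, then mpls_label, then the null label.
 */
size_t sketch_export_labels(const struct sketch_route *r, uint32_t *out, size_t out_len)
{
  if (out_len == 0)
    return 0;

  if (r->nh_labels > 0) {
    size_t n = r->nh_labels < out_len ? r->nh_labels : out_len;
    if (n > SKETCH_MAX_LABELS)
      n = SKETCH_MAX_LABELS;
    for (size_t i = 0; i < n; i++)
      out[i] = r->nh_label[i];
    return n;
  }

  /* Next hop was overridden: fall back to the attribute that survived. */
  out[0] = r->has_mpls_label ? r->mpls_label : SKETCH_MPLS_NULL;
  return 1;
}
```

With `nh_labels == 0` and `mpls_label == 10040`, the sketch still yields a one-element stack holding 10040, which is the behaviour the routes above show after the patch.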
Of course, open to better solutions :)
groet,
Pim
[1] A:pim@asw121# show network-instance default protocols bgp routes evpn route-type 2 detail | more
Route Distinguisher: 65500:264
Tag-ID : 0
MAC address : 64:9D:99:D0:70:4D
IP Address : 10.26.0.1
neighbor : 198.19.16.0
path-id : 0
Received paths : 1
Path 1: <Best,Valid,Used,>
ESI : 00:00:00:00:00:00:00:00:00:00
Label : 264
Route source : neighbor 198.19.16.0 (last modified 68d14h37m6s ago)
Route preference : No MED, LocalPref is 100
Atomic Aggr : false
BGP next-hop : 198.19.18.0
AS Path : i
Communities : [target:65500:264, bgp-tunnel-encap:VXLAN]
RR Attributes : No Originator-ID, Cluster-List is []
Aggregation : None
Unknown Attr : None
Invalid Reason : None
Tie Break Reason : none
Route Flap Damping: None
--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
Hi,

Sorry for late answer, some comments below.

On Fri, Feb 20, 2026 at 01:24:07PM +0100, Pim van Pelt via Bird-users wrote:
Note that you should read IMET from etab too. The EVPN protocol translates all IMETs from evpntab to etab, otherwise even our kernel-based setup would not work -- the 'bridge' protocol that configures the kernel bridge also reads just etab.

I do not have multiple IMETs in etab, only one:

root@vpp0-0:/etc/bird# birdc show route table evpntab | grep imet
evpn imet 8298:100 0 2001:678:d78:200::3 [vpp0_3 12:12:38.484 from 2001:678:d78:200::3] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::2 [vpp0_2 11:18:21.821 from 2001:678:d78:200::2] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::1 [vpp0_1 11:18:21.253 from 2001:678:d78:200::1] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200:: unicast [evpn1 11:18:07.285] * (120)
root@vpp0-0:/etc/bird# birdc show route table etab | grep 00:00:00:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 12:12:38.484] * (80)
Perhaps I'm holding it wrong (see bird-example.conf). It would actually be super if I could rely /only/ on etab, as tracking both etab and evpntab was a fair amount of extra code.
Yes, the grep does not work, as the second route does not repeat NLRI in show route output:

bird> show route 10.1.2.0/24 table master4
Table master4:
10.1.2.0/24 unicast [ospf4 01:50:45.246] * I (150/20) [10.0.1.2]
via 10.1.21.2 on ve1
unicast [ibgp1 01:50:46.919 from 10.1.2.1] (100/20) [i]
via 10.1.21.2 on ve1
bird> show route eth 00:00:00:00:00:00 table etab2
Table etab2:
00:00:00:00:00:00 mpls 12 unicast [evpn1 01:50:46.919] * (80)
via 10.1.2.1 on vxlan2 mpls 200022
unicast [evpn1 01:50:47.001] (80)
via 10.1.3.1 on vxlan2 mpls 30
I do not like using regular/immediate next hops here in the EVPN table, as it does not fit well semantically and requires a formal device. But it seems to me that a reasonable alternative would be to just attach BGP_NEXT_HOP by the EVPN protocol, similarly to how BGP_PMSI_TUNNEL is attached. Will do that. Any comments?

If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni to the etab table entry, I would simply read that and use it instead of the bgp nexthop. That's what happens already today for IMET, as it has the BGP.pmsi_tunnel attribute with the needed ingress-replication 2001:678:d78:200::2 mpls 10040 information. How do other vendors (say Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding is they use BGP next hop for that (in other words, the same as how Bird does it today).

I think there is some confusion here. I am talking about evpntab entries, not about etab entries. And about your patch that sets router IP into their immediate next hop (nh.gw).

I see - then maybe I can try a different approach. The patch, I thought, makes Bird behave the same as Nokia SRLinux [1], which also sets the router ip (the local VTEP) as nexthop, but what you're saying is I should not set the /immediate/ nexthop, but rather leave that alone and set the /BGP Next Hop/? Although as a reminder, I need to be able to set an IPv4 BGP Next Hop on an IPv6 session only for some RTs, not all. See one more thought on that below ..
Look at the patch:

https://gitlab.nic.cz/labs/bird/-/commit/b0ff170fbc70bfc978efe92257ca8b49dbd...

EVPN already originates routes with the BGP next hop based on the configured router ip (in evpn_announce_mac() / evpn_announce_imet()), while bgp_use_next_hop() has a case to always keep such next hops. So if you have one EVPN proto with an IPv4 router address and another with an IPv6 router address, the BGP Next Hop would be set to the appropriate address in each case.

(not yet a patch for dummy tunnel ifaces)

--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
"To err is human -- to blame it on a computer is even more so."
Hoi,

On 05.03.2026 17:11, Ondrej Zajicek wrote:
show route eth 00:00:00:00:00:00 table etab2

I have built a fresh bird2.18+oz-evpn including your latest b0ff170f. I still see only one entry in etab, while I am expecting three. I do see three entries in evpntab.

I've attached 'show route all' and 'bird-example.conf' with the current vpp0-0 config, so you can take a look, but for me only one imet route made it to etab.

root@vpp0-0:/etc/bird# birdc show route eth 00:00:00:00:00:00 vlan 100 all table etab
BIRD 2.18+branch.oz.evpn.b0ff170fbc70 ready.
Table etab:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 18:23:48.761] * (80)
via 2001:678:d78:200::2 on vxlan0 mpls 10040
Type: EVPN univ
mpls_label: 10040
root@vpp0-0:/etc/bird# birdc show route eth 00:00:00:00:00:00 vlan 200 all table etab
BIRD 2.18+branch.oz.evpn.b0ff170fbc70 ready.
Table etab:
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 18:23:48.761] * (80)
via 192.168.10.2 on vxlan0 mpls 20040
Type: EVPN univ
mpls_label: 20040

The only clear difference I see with your output is that I'm using vid 100 and vid 200 in the evpn protocol, and both evpn1 and evpn2 use the same etab.
Look at the patch:
https://gitlab.nic.cz/labs/bird/-/commit/b0ff170fbc70bfc978efe92257ca8b49dbd...

Patch does not work for me. I am expecting the VNI 20040 to be IPv4, correctly copied into the PMSI attribute, but its BGP.next_hop values are IPv6 with this patch, while I am expecting IPv4 BGP next hops:

evpn imet 8298:200 0 192.168.10.3 [vpp0_3 18:23:36.312 from 2001:678:d78:200::3] * (100) [i]
Type: BGP univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 2001:678:d78:200::3
BGP.local_pref: 100
BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
BGP.pmsi_tunnel: ingress-replication 192.168.10.3 mpls 20040
evpn imet 8298:200 0 192.168.10.2 [vpp0_2 18:23:48.761 from 2001:678:d78:200::2] * (100) [i]
Type: BGP univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 2001:678:d78:200::2
BGP.local_pref: 100
BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
BGP.pmsi_tunnel: ingress-replication 192.168.10.2 mpls 20040
evpn imet 8298:200 0 192.168.10.1 [vpp0_1 18:23:39.764 from 2001:678:d78:200::1] * (100) [i]
Type: BGP univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 2001:678:d78:200::1
BGP.local_pref: 100
BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
BGP.pmsi_tunnel: ingress-replication 192.168.10.1 mpls 20040

(full route-table in route.all.txt)

I built at commit b0ff170fbc70bfc978efe92257ca8b49dbdbaf92 (HEAD -> oz-evpn, origin/oz-evpn)

groet,
Pim

--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
Hoi,

I spoke too soon, the patch _does_ work. When writing my previous mail, I did not restart bird2 on all machines, so some were using the old version and only one was using the new version. With Ondrej's fix in b0ff170f, I now see the correct IMET endpoints using BGP.next_hop.

However, the issue that only one such IMET makes it from evpntab to etab remains, I would expect three for each VNI. Current route.all.txt attached to see the full picture.

groet,
Pim
Hoi,

On 10.03.2026 15:23, Pim van Pelt via Bird-users wrote:
However, the issue that only one such IMET makes it from evpntab to etab remains, I would expect three for each VNI.

In evpn_receive_imet(),

struct rte_src *s = rt_get_source(&p->p, *rd_to_u64(n0->rd)*);

which makes all IMET routes share the same rte_src (for these routes in VNI 20040, the RD is a constant 8298:200), so all three remote IMETs for ::1, ::2 and ::3 will get the same value of s. Is that intentional?

The way I understand it, each call to rte_update2() overwrites the previous (net, rte_src), so the last IMET received wins, the earlier two are discarded, and we always end up with one etab entry. I don't see how

One possible fix is to use the originator IP (n0->rtr) or perhaps the BGP router-id (ref->src->proto->remote_id) as the source key instead of the RD, so each remote IMET speaker gets its own rte_src? Perhaps something like evpn_imet_multipath.patch [as an example, not something I expect you to merge], which when applied does give me the desired outcome:

root@vpp0-0:~# birdc show route eth 00:00:00:00:00:00 vlan 100 table etab
BIRD 2.18+branch.oz.evpn.2adc66776d5d ready.
Table etab:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 21:52:04.008] * (80)
via 2001:678:d78:200::1 on vxlan0 mpls 10040
unicast [evpn1 21:52:05.449] (80)
via 2001:678:d78:200::2 on vxlan0 mpls 10040
unicast [evpn1 21:52:11.849] (80)
via 2001:678:d78:200::3 on vxlan0 mpls 10040
root@vpp0-0:~# birdc show route eth 00:00:00:00:00:00 vlan 200 table etab
BIRD 2.18+branch.oz.evpn.2adc66776d5d ready.
Table etab:
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 21:52:04.008] * (80)
via 192.168.10.1 on vxlan0 mpls 20040
unicast [evpn2 21:52:05.449] (80)
via 192.168.10.2 on vxlan0 mpls 20040
unicast [evpn2 21:52:11.849] (80)
via 192.168.10.3 on vxlan0 mpls 20040

groet,
Pim

--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
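The overwrite behaviour described in the mail above can be reproduced with a toy table, as a rough sketch rather than BIRD's actual rte_src machinery. Everything here is an assumption made for illustration: `sketch_net`, `sketch_update()` and `sketch_count_imets()` are made-up names, the RD is packed into a `uint64_t` in an arbitrary way, and the key mixing is not the hash BIRD uses.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ROUTES 8

/* One route under a single network (here: the all-zero MAC in one VLAN). */
struct sketch_rte {
  uint64_t src_key;  /* identifies the route source */
  uint32_t vtep;     /* remote VTEP as an IPv4 address */
};

struct sketch_net {
  struct sketch_rte routes[SKETCH_MAX_ROUTES];
  size_t count;
};

/* Same source key replaces the existing route; a new key adds a route. */
void sketch_update(struct sketch_net *n, uint64_t src_key, uint32_t vtep)
{
  for (size_t i = 0; i < n->count; i++)
    if (n->routes[i].src_key == src_key) {
      n->routes[i].vtep = vtep;
      return;
    }
  if (n->count < SKETCH_MAX_ROUTES) {
    n->routes[n->count].src_key = src_key;
    n->routes[n->count].vtep = vtep;
    n->count++;
  }
}

/* Feed three IMETs that share one RD; returns how many routes survive. */
size_t sketch_count_imets(int key_includes_router)
{
  const uint64_t rd = ((uint64_t) 8298 << 32) | 200;  /* stand-in for RD 8298:200 */
  const uint32_t vteps[3] = { 0xC0A80A01u, 0xC0A80A02u, 0xC0A80A03u };  /* 192.168.10.1-3 */
  struct sketch_net net = { .count = 0 };

  for (size_t i = 0; i < 3; i++) {
    /* Illustrative mixing only; distinct routers give distinct keys here. */
    uint64_t key = key_includes_router ? rd * 1000003u + vteps[i] : rd;
    sketch_update(&net, key, vteps[i]);
  }
  return net.count;
}
```

Keyed by RD alone, `sketch_count_imets(0)` leaves a single route (the last update wins); once the originating router is part of the key, `sketch_count_imets(1)` keeps all three.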
On Tue, Mar 10, 2026 at 01:58:24PM -0400, Pim van Pelt wrote:
The only clear difference I see with your output is that I'm using vid 100 and vid200 in the evpn protocol, and both evpn1 and evpn2 use the same etab.
Hello,

I think it is because you have the same route distinguisher 8298:200 on all these routers. If I understand it correctly, each router should use a different RD (while they use the same route target (RT) if they are in the same VPN). We use the RD to derive an internal distinguisher for ethernet routes in etab, so if your EVPN routers use the same RD, even though the IMET routes are distinguished in evpntab by other info (router IP), they are treated as an update to one network after translation to etab.

--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
"To err is human -- to blame it on a computer is even more so."
Hoi,

On 11.03.2026 08:41, Ondrej Zajicek wrote:
I think it is because you have the same route distinguisher 8298:200 on all these routers. If i understand it correctly, each router should use different RD (while they use the same route target (RT) if they are in the same VPN).

I interpreted RFC 7432, Section 7.9 differently:

"An RD MUST be assigned for a given MAC-VRF on a PE. *This RD MUST be unique across all MAC-VRFs on a PE*. It is RECOMMENDED to use the Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by a number unique to the PE."

(emphasis mine) While the RFC mandates uniqueness only within a single PE (across its MAC-VRFs), it also recommends Type 1 RDs using the PE's loopback IP, which happens to produce globally unique RDs across the network.

I was further thrown off because on a set of Nokia SR-Linux routers that run an eVPN VxLAN mesh, the RDs are indeed the same:

A:pim@asw120# info flat / network-instance peeringlan protocols bgp-vpn bgp-instance 1
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604

A:pim@asw100# info flat / network-instance peeringlan protocols bgp-vpn bgp-instance 1
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604

which is why I would've assumed adding multiple Bird2's with RD 65500:2604 would be the idiomatic way to do this.

However the important bits are that (a) I can now rely on etab having the multiple IMETs as you said, so I can simplify my vppevpn protocol to rely only on etab, and not evpntab; and (b) I learned a lot :) Thank you so much!

groet,
Pim

--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
On Wed, Mar 11, 2026 at 10:06:06AM -0400, Pim van Pelt via Bird-users wrote:
Hoi,
On 11.03.2026 08:41, Ondrej Zajicek wrote:
I think it is because you have the same route distinguisher 8298:200 on all these routers. If i understand it correctly, each router should use different RD (while they use the same route target (RT) if they are in the same VPN). I interpreted RFC 7432, Section 7.9 differently "An RD MUST be assigned for a given MAC-VRF on a PE. *This RD MUST be unique across all MAC-VRFs on a PE*. It is RECOMMENDED to use the Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by a number unique to the PE."
(emphasis mine) While the RFC mandates uniqueness only within a single PE (across its MAC-VRFs), it also recommends Type 1 RDs using the PE's loopback IP, which happens to produce globally unique RDs across the network.
My reading of this section is that RD must be unique per EVI (EVPN Instance). If you have two PEs that are part of the same EVI, that means PE1 has a MAC-VRF that contains routes from PE2 (and vice versa), therefore if PE1 uses the same RD as PE2, then such RD would not be unique in that MAC-VRF on PE1. In your case all three routes are in the same EVPN instance as they are exported to one MAC-VRF (i.e. etab).
However the important bits are that (a) I can now rely on etab having the multiple IMETs as you said, so I can simplify my vppevpn protocol to rely only on etab, and not evpntab; and (b) I learned a lot :) Thank you so much!
You are welcome!

--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
"To err is human -- to blame it on a computer is even more so."
(starting a new thread)

Hoi Bird eVPNers!

I spent some time with RFC4364 and RFC7432 today. I want to revisit the IMET key and make a case for changing it from RD to hash{RD,etag,rtr}.

Referencing RFC4364 section 4.2, there are three types of RD:

Type 0: 16 bit ASN, 32 bit sequence (eg 8298:123456) -- not necessarily unique in eVPN if same seq is chosen
Type 1: 32 bit IP, 16 bit sequence (eg 192.0.2.1:123) -- unique in eVPN due to use of PE router-id
Type 2: 32 bit ASN, 16 bit sequence (eg 829800:123) -- not necessarily unique in eVPN if same seq is chosen

Referencing RFC7432, notably section 7.3:

For procedures and usage of this route, please see Sections 11 <https://datatracker.ietf.org/doc/html/rfc7432#section-11> ("Handling of Multi-destination Traffic"), 12 ("Processing of Unknown Unicast Packets"), and 16 ("Multicast and Broadcast"). The IP address length is in bits. For the purpose of BGP route key processing, only the Ethernet Tag ID, IP Address Length, and Originating Router's IP Address fields are considered to be part of the prefix in the NLRI.

The route key for IMET is {etag,iplen,rtr}, so the current code, which uses only the RD, will not work for Type 2 or Type 0 RDs like the ones from Nokia I showed below. It will work for Type 1 RDs because there the RD encodes a unique IP. Maybe we can consider rte_src to be something like:

u64 imet_key = u64_hash0(rd_to_u64(n0->rd), HASH_PARAM, u32_hash0(n0->tag, HASH_PARAM, ip6_hash0(n0->rtr, HASH_PARAM, 0)));

A strictly RFC-conformant key for the stated purpose of IMET uniqueness would be {etag, rtr} (with iplen implicit in rtr), but perhaps {rd,etag,rtr} is a safe superset with defense in depth. Either choice would allow Type 0 (and Type 2) RDs to work in Bird and also in inter-op cases with vendors, and it would have no impact on Type 1 RDs.

groet,
Pim

On 11.03.2026 10:06, Pim van Pelt wrote:
Hoi,
On 11.03.2026 08:41, Ondrej Zajicek wrote:
I think it is because you have the same route distinguisher 8298:200 on all these routers. If i understand it correctly, each router should use different RD (while they use the same route target (RT) if they are in the same VPN). I interpreted RFC 7432, Section 7.9 differently "An RD MUST be assigned for a given MAC-VRF on a PE. *This RD MUST be unique across all MAC-VRFs on a PE*. It is RECOMMENDED to use the Type 1 RD [RFC4364]. The value field comprises an IP address of the PE (typically, the loopback address) followed by a number unique to the PE."
(emphasis mine) While the RFC mandates uniqueness only within a single PE (across its MAC-VRFs), it also recommends Type 1 RDs using the PE's loopback IP, which happens to produce globally unique RDs across the network.
I was further thrown off because on a set of Nokia SR-Linux routers that run an eVPN VxLAN mesh, the RDs are indeed the same: A:pim@asw120# info flat / network-instance peeringlan protocols bgp-vpn bgp-instance 1 set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604 set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604 set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
A:pim@asw100# info flat / network-instance peeringlan protocols bgp-vpn bgp-instance 1 set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604 set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604 set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604 which is why I would've assumed adding multiple Bird2's with RD 65500:2604 would be the idiomatic way to do this.
However the important bits are that (a) I can now rely on etab having the multiple IMETs as you said, so I can simplify my vppevpn protocol to rely only on etab, and not evpntab; and (b) I learned a lot :) Thank you so much!
groet, Pim -- Pim van Pelt <pim@ipng.ch> PBVP1-RIPE https://ipng.ch/
--
Pim van Pelt <pim@ipng.ch>
PBVP1-RIPE https://ipng.ch/
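To make the three RD layouts from RFC 4364 section 4.2 discussed in this mail concrete, here is a small decoding sketch. It is not BIRD code: `struct sketch_rd`, `sketch_rd_decode()` and `sketch_rd_embeds_router_ip()` are hypothetical helpers, and the only thing taken from the RFC is the field layout (2-byte type, then a 2+4 or 4+2 byte split).

```c
#include <stdint.h>

/* Decoded route distinguisher; field names are illustrative. */
struct sketch_rd {
  uint16_t type;      /* 0, 1 or 2 */
  uint32_t admin;     /* ASN (types 0 and 2) or IPv4 address (type 1) */
  uint32_t assigned;  /* operator-chosen number */
};

/* Decode an 8-byte RD in network byte order. Returns 0 on success, -1 on unknown type. */
int sketch_rd_decode(const uint8_t b[8], struct sketch_rd *rd)
{
  rd->type = (uint16_t) ((b[0] << 8) | b[1]);

  switch (rd->type) {
  case 0:  /* 2-byte ASN, 4-byte number */
    rd->admin = ((uint32_t) b[2] << 8) | b[3];
    rd->assigned = ((uint32_t) b[4] << 24) | ((uint32_t) b[5] << 16) |
                   ((uint32_t) b[6] << 8) | b[7];
    return 0;
  case 1:  /* 4-byte IPv4 address, 2-byte number */
  case 2:  /* 4-byte ASN, 2-byte number */
    rd->admin = ((uint32_t) b[2] << 24) | ((uint32_t) b[3] << 16) |
                ((uint32_t) b[4] << 8) | b[5];
    rd->assigned = ((uint32_t) b[6] << 8) | b[7];
    return 0;
  default:
    return -1;
  }
}

/* Only Type 1 carries a PE address, so only it differs per router by construction. */
int sketch_rd_embeds_router_ip(const struct sketch_rd *rd)
{
  return rd->type == 1;
}
```

A Type 0 RD such as 8298:200 decodes to the same value on every router that configures it, which is why a source key built from the RD alone cannot tell the originating routers apart, while a Type 1 RD such as 192.0.2.1:123 already contains the router address.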
participants (4)
- Maria Matejka
- Ondrej Zajicek
- Pim van Pelt
- William