[PATCH] babel: Send out low-interval hello on shutdown
Toke Høiland-Jørgensen
toke at toke.dk
Wed May 4 15:38:55 CEST 2022
Ondrej Zajicek <santiago at crfreenet.org> writes:
> On Sat, Apr 23, 2022 at 10:31:26PM +0200, Toke Høiland-Jørgensen wrote:
>> Ondrej Zajicek <santiago at crfreenet.org> writes:
>> > BTW, when we added CI tests for Babel authentication, we noticed that it
>> > has rather slow convergence after reconfiguration. The reason is that
>> > when authentication changed to become non-matching, it took many missed
>> > (misauthenticated) hellos to sufficiently clean up hello_map for neighbor
>> > to go down.
>>
>> Just to make sure I'm understanding the scenario correctly: A node is
>> reconfigured to turn on authentication, but not all peers use it; so now
>> some peers are essentially cut off (basically like if they just dropped
>> off the network).
>
> See:
>
> https://gitlab.nic.cz/labs/bird-tools/-/tree/master/netlab/cf-babel-auth
>
> We just change keys to be matching/mismatched (although enabling /
> disabling auth is a similar case)
>
>> However, their hello history remain, so their routes
>> stay active until they time out. Right?
>
> Yes
>
>
>> > Perhaps there could be some decision in iface reconfiguration that the
>> > change is significant to affect reachability of neighbors and in such
>> > case deprecate some/most items in hello_map.
>>
>> Hmm, yeah, we could do something like that I suppose. I'm wondering if
>> it should really be stronger, though? Enabling auth on an
>> already-running instance is an increase in the "security level" of the
>> interface, so should we really be keeping unauthenticated data around at
>> all? I.e., maybe we should simply flush all neighbour entries on an
>> interface when enabling auth on that interface?
>
> I would prefer to not make it that disruptive. Also it is not just
> enabling auth, but in general changing keys.
>
>
>> Or another, slightly less disruptive, option is to flush a neighbour
>> if we receive a packet from it that fails auth, and that neighbour
>> doesn't have the 'auth_passed' flag set?
>> For existing neighbours that
>> succeeds auth, that flag should be set immediately on the next packet we
>> receive from that neighbour, whereas this would quickly clear out
>> neighbours that fail it. We could maybe speed things up further by
>> immediately issuing auth challenges to all neighbours when the config
>> changes?
>
> I thing that the current approach where there is some leeway between
> reconfiguration and dropping a neighbor is basically ok and it is better
> to just ignore failed auths than trigger some action based on that. Just
> that with full hello history it took too long. So perhaps just removing
> 3/4 of hellos from hello history?
Hmm, right, that sounds reasonable. Something like the below, then?
-Toke
diff --git a/proto/babel/babel.c b/proto/babel/babel.c
index 4a7d550f5efe..55ae5f972573 100644
--- a/proto/babel/babel.c
+++ b/proto/babel/babel.c
@@ -1861,8 +1861,25 @@ babel_reconfigure_iface(struct babel_proto *p, struct babel_iface *ifa, struct b
babel_iface_update_buffers(ifa);
- if ((new->auth_type != BABEL_AUTH_NONE) && (new->auth_type != old->auth_type))
- babel_auth_reset_index(ifa);
+ if (new->auth_type != BABEL_AUTH_NONE)
+ {
+ struct babel_neighbor *n;
+
+ if (new->auth_type != old->auth_type)
+ babel_auth_reset_index(ifa);
+
+ WALK_LIST(n, ifa->neigh_list)
+ {
+ /* trim the hello history on auth change to make sure neighbours expire
+ * quickly if they no longer pass auth
+ */
+ if (n->hello_cnt < 2)
+ continue;
+
+ n->hello_cnt = MAX(2, n->hello_cnt / 4);
+ n->hello_map &= 0xFFFF >> (16 - n->hello_cnt);
+ }
+ }
if (ipa_zero(ifa->next_hop_ip4) && p->ip4_channel)
log(L_WARN "%s: Missing IPv4 next hop address for %s", p->p.name, ifa->ifname);
More information about the Bird-users
mailing list