Hi, This patch fixes a situation where a route that becomes unviable because of a configuration change (i.e. next hop of its new version is unreachable) is marked as not installed internally, but actually remains in the kernel routing table until a subsequent configuration change makes the next hop reachable again. The code that updates static routes seems to process changes as deletions + insertions, but if the insertion can't happen for any reason the deletion won't happen as well, even though the route is already marked as uninstalled by then. When that happens BIRD and the kernel get out of sync until BIRD tries to install the route again, which might be never. The patch is supposed to fix the problem by detecting routes that (a) have changed but (b) are not getting deleted because their updated version can't get installed, and deletes them anyway. There's also a quick sanity check to make sure we don't end up with 3 states for the installed flag (which should obviously be a base 2 boolean), even though 3 states are needed temporarily when processing updates. Thanks, Pierluigi diff -a -u -r a/proto/static/static.c b/proto/static/static.c --- a/proto/static/static.c 2013-04-29 14:41:58.000000000 -0700 +++ b/proto/static/static.c 2014-02-24 12:45:46.433136581 -0800 @@ -62,10 +62,15 @@ rta a, *aa; rte *e; - if (r->installed) + /* 3 cases here: + * (a) r->installed > 0: already installed, no op; + * (b) !r->installed: really really not installed, create; + * (c) r->installed < 0: needs update but it's there. + */ + if (r->installed > 0) return; - DBG("Installing static route %I/%d, rtd=%d\n", r->net, r->masklen, r->dest); + DBG("Installing static route %I/%d->%I, rtd=%d\n", r->net, r->masklen, r->via, r->dest); bzero(&a, sizeof(a)); a.proto = p; a.source = (r->dest == RTD_DEVICE) ? RTS_STATIC_DEVICE : RTS_STATIC; @@ -82,7 +87,7 @@ struct mpnh **nhp = &nhs; for (r2 = r->mp_next; r2; r2 = r2->mp_next) - if (r2->installed) + if (r2->installed > 0) { struct mpnh *nh = alloca(sizeof(struct mpnh)); nh->gw = r2->via; @@ -122,10 +127,17 @@ { net *n; + /* 3 cases here: + * (a) r->installed > 0: remove; + * (b) !r->installed: it's not there, don't remove; + * (c) r->installed < 0: update, needs removal even + * if new route can't be installed. + */ + if (!r->installed) return; - DBG("Removing static route %I/%d\n", r->net, r->masklen); + DBG("Removing static route %I/%d->%I\n", r->net, r->masklen, r->via); n = net_find(p->table, r->net, r->masklen); if (n) rte_update(p->table, n, p, p, NULL); @@ -410,6 +422,7 @@ static_match(struct proto *p, struct static_route *r, struct static_config *n) { struct static_route *t; + int new_installed = 0; /* * For given old route *r we find whether a route to the same @@ -422,17 +435,28 @@ r->neigh->data = NULL; WALK_LIST(t, n->iface_routes) if (static_same_net(r, t)) - { - t->installed = r->installed && static_same_dest(r, t); - return; - } + goto mark_for_deletion; WALK_LIST(t, n->other_routes) if (static_same_net(r, t)) - { - t->installed = r->installed && static_same_dest(r, t); - return; - } + goto mark_for_deletion; static_remove(p, r); + return; + +mark_for_deletion: + new_installed = r->installed && static_same_dest(r, t); + if(new_installed == 0 && r->installed) { + /* XXX: this will silently uninstall a route that changed even though its new + * version cannot be installed immediately. + * Without this the route won't be removed (as required by the change), since + * its new version cannot currently be installed. + * So what we want to do is to remove the route here for real. BIRD + * will worry about adding it again later after the next hop becomes + * reachable again. + */ + t->installed = -1; + } else + t->installed = new_installed; + return; } static inline rtable * @@ -467,6 +491,20 @@ WALK_LIST(r, n->other_routes) static_add(p, n, r); + /* route->installed has now 3 possible values: + * (a) > 0, route is installed; + * (b) == 0, route is not installed; + * (c) < 0, route was marked as not installed because it changed and it needs update. + * + * Beyond this point it's not legal to keep the < 0 case. It's actually a bug in the + * code. Any route that is < 0 needed to be removed. The proper thing to do + * would be asserting or so. + */ + WALK_LIST(r, n->iface_routes) + ASSERT(r->installed >= 0); + WALK_LIST(r, n->other_routes) + ASSERT(r->installed >= 0); + return 1; }
On Tue, Feb 25, 2014 at 06:59:10PM +0000, Pierluigi Rolando wrote:
Hi,
This patch fixes a situation where a route that becomes unviable because of a configuration change (i.e. next hop of its new version is unreachable) is marked as not installed internally, but actually remains in the kernel routing table until a subsequent configuration change makes the next hop reachable again.
The code that updates static routes seems to process changes as deletions + insertions, but if the insertion can't happen for any reason the deletion won't happen as well, even though the route is already marked as uninstalled by then. When that happens BIRD and the kernel get out of sync until BIRD tries to install the route again, which might be never.
The patch is supposed to fix the problem by detecting routes that (a) have changed but (b) are not getting deleted because their updated version can't get installed, and deletes them anyway. There's also a quick sanity check to make sure we don't end up with 3 states for the installed flag (which should obviously be a base 2 boolean), even though 3 states are needed temporarily when processing updates.
Thanks, the analysis and the patch is more-or-less correct, with one minor note:
for (r2 = r->mp_next; r2; r2 = r2->mp_next) - if (r2->installed) + if (r2->installed > 0) { struct mpnh *nh = alloca(sizeof(struct mpnh)); nh->gw = r2->via;
No need for this change, secondary multipath nodes from mp_next cannot be -1, they are not modified in static_match() but in static_add() and the meaning is slightly different. -- Elen sila lumenn' omentielvo Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."
Thanks, the analysis and the patch is more-or-less correct, with one minor note:
for (r2 = r->mp_next; r2; r2 = r2->mp_next) - if (r2->installed) + if (r2->installed > 0) { struct mpnh *nh = alloca(sizeof(struct mpnh)); nh->gw = r2->via;
No need for this change, secondary multipath nodes from mp_next cannot be -1, they are not modified in static_match() but in static_add() and the meaning is slightly different.
Understood. I've never tried multipath routes so that bit was untested. I've also tried the patch you've attached to fix the TMP_DOWN issue and so far it's working well. Thanks, Pierluigi
participants (2)
-
Ondrej Zajicek -
Pierluigi Rolando