[PATCH] More multipath support for OSPF

Ondrej Zajicek santiago at crfreenet.org
Thu Apr 24 00:52:49 CEST 2014


On Fri, Feb 28, 2014 at 10:16:48AM +0100, Peter Christensen wrote:
> Hi,
>
> I've revised the old patch and now propose this new version (attached).

Hi

Sorry for late answer, i finally find some time to process, review,
partially rewrite and merge the patch. I made these changes:

1) Unified add_nexthops() and merge_nexthops() as the algorithm is
   essentially the same.

2) Fixed several issues in compare functions (see below).

3) Fixed issue with missing special handling of device nexthops and
   vlink nexthops (and perhaps configured stub networks).

4) Fixed issue in the BIRD code which required special handling
   of device nexthops during ECMP merging so it is no longer needed
   (see fix_device_nexthops()).

5) Added config option 'merge external' to make merging of external
   routes from different LSAs/ASBRs optional (and to keep compatibility
   with the old behavior).

6) Special handling of orta.en when external routes are merged
   (see ort_merge_ext())

7) Fixed some related issues in the BIRD code (like missing incomplete
   resolution of virtual next hops for routers in ospf_rt_abr1());

8) Some style changes and some ri_* -> ort[a]_* function renaming.


My comments to your patch are below and the final merged patch is attached.
Feel free to comment it.


> First of all, I fixed a bug in add_nexthops() which could cause it to  
> forget the existing nexthops if they appeared later in sorting order  
> than then newly added nexthops.
> Secondly, if the resulting list of nexthops contains more than one  
> nexthop, it now handles the router ID and tag. The only place router ID  
> is used is apparently in NSSA handling which prefers larger router IDs.  
> So to ensure that a multipath route doesn't get confused with another  
> route with a lower router ID, we keep the highest router ID when adding  
> multipaths. As for tags, it's difficult to tell what should be done, but  
> in this case I've decided to clear the tag if the two tag differs,  
> ensuring that we don't lie about the routes (except for cases where  
> admins uses tag=0 to indicate something weird).

OK, that is a good idea to clean up tag and keeping higher router ID.

> ri_better(), ri_better_ext(), ri_better_asbr(), and my old  
> ri_equal_cost() have been replaced with ri_compare(), ri_compare_ext(),  
> and ri_compare_asbr() which returns <0, 1, or >0 depending on relative  
> priority. The comparison code is structured syntactically different, but  
> the end result is less code (and on x86 and amd64, less machine code).

> +  /* Prefer lowest type 1 metric */
> +  ret = old->metric1 - new->metric1;
> +  if (ret != 0)
> +    return ret;
>  
> -  if (new->metric1 < old->metric1)
> -    return 1;
> -
> -  if (new->metric1 > old->metric1)
> -    return 0;

OK, but typecast to int would be a good idea for unsigned variables.
Without that it will break if compiled in IPL64 mode (unlikely)
or if someone later changes type of some of these variables
to a smaller type (sometimes more likely).

> just had to handle unsigned 32 bit values differently (rid and areaid).
> Paths are _ONLY_ considered equal if the area ids are also equal. In  
> other cases, the highest area id is used. ri_compare_asbr() already  
> needed that priority, and for the other scenarios, the RFC does not  
> mandate priority. According to the IETF OSPF mailing list, different  
> implementations prioritize area ids differently.
>
> I decided to do potential multipath in /all/ ri_install functions based  
> on the RFC 2328 which states that all routes can be multipath. Besides,  
> multiple threads in the IETF OSPF mailinglist seems to confirm that any  
> route can be multipath as long as prefix, path type, cost, and area is  
> the same.

That makes sense for all the functions with the exception of
ri_compare_asbr() / ri_install_asbr(), where RFC 2328 explicitly
specifies that highest area should be used and there is just one
possible ASBR with given router ID per area, so merging cannot happen.

> I also took the liberty to add const to arguments where it made sense.

OK

>
> diff --git a/proto/ospf/rt.c b/proto/ospf/rt.c
> index 1b39bda..a39ff2c 100644
> --- a/proto/ospf/rt.c
> +++ b/proto/ospf/rt.c
> @@ -12,6 +12,8 @@ static void add_cand(list * l, struct top_hash_entry *en,
>  		     struct top_hash_entry *par, u32 dist,
>  		     struct ospf_area *oa, int i);
>  static void rt_sync(struct proto_ospf *po);
> +static int
> +cmp_nhs(struct mpnh *s1, struct mpnh *s2);
>  
>  /* In ospf_area->rtr we store paths to routers, but we use RID (and not IP address)
>     as index, so we need to encapsulate RID to IP address */
> @@ -54,7 +56,7 @@ new_nexthop(struct proto_ospf *po, ip_addr gw, struct iface *iface, unsigned cha
>  }
>  
>  static inline struct mpnh *
> -copy_nexthop(struct proto_ospf *po, struct mpnh *src)
> +copy_nexthop(struct proto_ospf *po, const struct mpnh *src)
>  {
>    struct mpnh *nh = lp_alloc(po->nhpool, sizeof(struct mpnh));
>    nh->gw = src->gw;
> @@ -64,74 +66,153 @@ copy_nexthop(struct proto_ospf *po, struct mpnh *src)
>    return nh;
>  }
>  
> +static void
> +add_nexthops(struct proto_ospf *po, struct orta *old, const struct orta *new)
> +{
> +  struct mpnh **pnh = &old->nhs;
> +  struct mpnh *newnh = new->nhs;
> +
> +  int count = po->ecmp;
> +
> +  /* Iterate through new next hops */
> +  while (newnh != NULL && count)
> +  {
> +    struct mpnh *nh = *pnh;
> +    if (nh == NULL)
> +    {
> +      /* Add next hop to end of list */
> +      *pnh = nh = copy_nexthop(po, newnh);
> +
> +      pnh = &nh->next;
> +      newnh = newnh->next;
> +    }
> +    else
> +    {
> +      int cmp = cmp_nhs(nh, newnh);
> +      if (cmp < 0)
> +      {
> +	/* Continue to next nexthop */
> +	pnh = &nh->next;
> +      }
> +      else if (cmp > 0)
> +      {
> +	/* Add nexthop before current nexthop */
> +	struct mpnh *next;
> +
> +	*pnh = next = copy_nexthop(po, newnh);
> +
> +	next->next = nh;

I think this code is wrong, because orta.nhs values are shared between
different orta nodes created from one top_hash_entry (like one
router-LSA with several stub networks), therefore old->nhs cannot be
modified in place. Proper solution would be use the similar approach
like in merge_nexthops() - check if orta.nhs is private or shared and
merge accordingly. I modified the code of merge_nexthops() so it
could be used for both top_hash_entry.nhs and orta.nhs merging, as it is
essentially the same process.

>  /* Whether the ASBR or the forward address destination is preferred
>     in AS external route selection according to 16.4.1. */
>  static inline int
> -epath_preferred(orta *ep)
> +epath_preferred(const orta *ep)
>  {
>    return (ep->type == RTS_OSPF) && (ep->oa->areaid != 0);
>  }
>  
> -/* 16.4. (3), return 1 if new is better */
>  
> -  /* 16.4. (6c) */
> +  /* 16.4. (6c) - If not RFC1583, select preferred paths */
>    if (!po->rfc1583)
>    {
> -    u32 new_pref = new->options & ORTA_PREF;
> -    u32 old_pref = old->options & ORTA_PREF;
> -
> -    if (new_pref > old_pref)
> -      return 1;
> -
> -    if (new_pref < old_pref)
> -      return 0;
> +    ret = (new->options & ORTA_PREF) - (old->options & ORTA_PREF);
> +    if (ret != 0)
> +      return ret;

I think here is a bug, because ORTA_PREF is 0x80000000, which means that
if new has the flag while old does not, ret would be negative (because
0x80000000 typecasted to int is -2^31).

>  
> -  /* 16.4. (6d) */
> -  if (new->metric1 < old->metric1)
> -    return 1;
> -
> -  if (new->metric1 > old->metric1)
> -    return 0;
> -
> -  /* RFC 3103, 2.5. (6e) */
> -  int new_prio = orta_prio(new);
> -  int old_prio = orta_prio(old);
> +  /* 16.4. (6d) - Prefer the route with lowest type 1 metric */
> +  ret = old->metric1 - new->metric1;
> +  if (ret != 0)
> +    return ret;
>  
> +  /* RFC 3101, 2.5. (6e) - Prioritize Type-7 LSA with P-bit, then Type-5 LSA, then LSA with higher router ID */
> +  ret = orta_prio(new) - orta_prio(old);
> +  if (ret != 0)
> +    return ret;
>  
> -  /* make it more deterministic */
> -  if (new->rid > old->rid)
> -    return 1;
> +  /* RFC 3101, 2.4 (2) - If the P-bit settings are the same, the LSA with higher router ID is preferred */
> +  if (new->options & ORTA_NSSA)
> +  {
> +    ret = (new->rid > old->rid) - (new->rid < old->rid);
> +    if (ret != 0)
> +      return ret;
> +  }
>  
> -  return 0;
> +  /* Prefer largest area id */
> +  ret = (new->oa->areaid > old->oa->areaid) - (new->oa->areaid < old->oa->areaid);
> +  return ret;
>  }

These two comparisons (RFC 3101 2.5 (6e) preferences and largest area
ID) do not make sense for external routes. The first one (together with
higher router ID preference) is intended primarily for functionally
equivalent LSAs (i.e. same non-zero forwarding address and other
fields). There is no reason to not merge them (but we have to keep best
LSA according to this criteria in orta.en for proper NSSA translation
mechanism).

There is no reason to compare area IDs for external routes (and such
comparison is not mentioned in RFC 2328). Altough BIRD keeps orta.oa for
external rtes, areas are explicitly undefined for external routing table
entries in the standard (see RFC 2328 11. 'Area') and therefore the
general rule that a routing table entry keeps nexthops from the same
area is irrelevant, external routes based on LSAs from ASBRs from
different areas could merge.


-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multipath.patch
Type: text/x-diff
Size: 25963 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20140424/e094a76b/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20140424/e094a76b/attachment.asc>


More information about the Bird-users mailing list