[PATCH] General protocol route limits

Ondrej Zajicek santiago at crfreenet.org
Mon Nov 21 13:11:54 CET 2011


On Mon, Nov 14, 2011 at 04:40:25PM +0400, Alexander V. Chernikov wrote:
> Hello list!
> 
> At the moment bird has route limits implemented for BGP only (route
> limit XXX).
> 
...
> 
> This patch introduces general limiting functionality for any protocol.
> 
> Import/export limits can be configured with the following actions:
> * warn (prints warning message)
> * block (blocks new import/exports from/to the protocol)
> * shutdown (restart the protocol, import only)
> * disable (shutdown and disable protocol)
> 
> If any protocol limit is hit and block action is taken, protocol can be
> returned to 'normal' state by using reload [in|out] protocol (or
> restaring it).

There are several problems, esp. with 'block' action in export limit. Primary
cause is that BIRD does not remember which route was sent to a peer,
instead it reruns filter to know that. This get broken when 'block' is
active, because it is not possible to know whether given route was
blocked or not. For example, if export is blocked, then a new route arrives,
it is blocked, but if later the route is updated, it is treated as an update
(and not as a new route) and propagated, therefore limit is exceeded
('old' value in do_rte_announce is not NULL).

Not sure if there is a way how to fix it. One possible way is when
protocol is blocked then convert any later updates to withdraws (instead
just block new routes). Don't know how exactly the block/withdraw
behaves in Junipers and Ciscos. Note also that we have to count routes
blocked by this as exported in exp_routes, otherwise it would be
inconsistent and would drift away (when genuine withdraw is announced,
it is unknown whether the withdrawed route is 'older' route (propagated
before the block) or 'later' route (propagated after the block).

Another possibility is just do not allow block in export filter.

Another related problem is that exp_routes statistics became unreliable
when 'configure soft' is used, export filter is changed and protocol
is not reloaded. This is probably a reason why BGP limits were originally
implemented for import only.

Some minor comments later:

> From f9c8c639593aa98723397a640f8d899b85c39fb7 Mon Sep 17 00:00:00 2001
> From: Alexander V. Chernikov <melifaro at ipfw.ru>
> Date: Mon, 14 Nov 2011 15:33:27 +0000
> Subject: [PATCH 1/1] * Implement  general protocol limiting
> 
> ---
>  doc/bird.sgml       |   16 +++++-
>  nest/config.Y       |   32 +++++++++
>  nest/proto.c        |  175 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  nest/protocol.h     |   42 ++++++++++++-
>  nest/rt-table.c     |   49 ++++++++++++++-
>  proto/bgp/bgp.c     |   28 +++++---
>  proto/bgp/bgp.h     |    1 -
>  proto/bgp/config.Y  |    9 +++-
>  proto/bgp/packets.c |    3 -
>  9 files changed, 329 insertions(+), 26 deletions(-)
> 
> diff --git a/doc/bird.sgml b/doc/bird.sgml
> index 7f53f02..4060f05 100644
> --- a/doc/bird.sgml
> +++ b/doc/bird.sgml
> @@ -415,6 +415,19 @@ to zero to disable it. An empty <cf><m/switch/</cf> is equivalent to <cf/on/
>  	<tag>export <m/filter/</tag> This is similar to the <cf>import</cf> keyword, except that it
>  	works in the direction from the routing table to the protocol. Default: <cf/none/.
>  
> +	<tag>import limit <m/number/ exceed warn | block | shutdown | disable</tag>

It would be much better to rename 'shutdown' to 'restart'.

> + | IMPORT LIMIT expr {
> +    if ((!strcmp(this_proto->protocol->name, "Device")) ||
> +      (!strcmp(this_proto->protocol->name, "Static")))
> +	cf_error("%s protocol does not support import limits", this_proto->protocol->name);

Why not? Altough it probably does not make much sense, i see no need to
create an exception. And even if so, strcmp() is a crazy way to check
protocol class.

> +    if (!this_proto->in_limit)
> +      this_proto->in_limit = cfg_allocz(sizeof(struct proto_limit));
> +    this_limit = this_proto->in_limit;
> +    this_limit->direction = PL_IMPORT;
> +    this_limit->limit = $3;
> +  }
> +   EXCEED limit_action
> + | EXPORT LIMIT expr {
> +    if (!this_proto->out_limit)
> +      this_proto->out_limit = cfg_allocz(sizeof(struct proto_limit));
> +    this_limit = this_proto->out_limit;
> +    this_limit->direction = PL_EXPORT;
> +    this_limit->limit = $3;
> +  }
> +   EXCEED limit_action

This seems unnecessary complicated, what about just (without global
variable this_limit and with return value of limit_action):

 | IMPORT LIMIT expr EXCEED limit_action {
    ...
    xxx->limit = $3
    xxx->action = $5
    // and check for valid action w.r.t. direction
  }

>     SYM {
>       if ($1->class != SYM_TABLE) cf_error("Table name expected");
> diff --git a/nest/proto.c b/nest/proto.c
> index d55c348..f022dc2 100644
> --- a/nest/proto.c
> +++ b/nest/proto.c
> @@ -33,6 +33,11 @@ static list initial_proto_list;
>  static list flush_proto_list;
>  static struct proto *initial_device_proto;
>  
> +/* protocol limiting variables */
> +static struct rate_limit rl_rt_limit;
> +static event *proto_shut_event;
> +static list abuse_proto_list;

> -  if (p->reconfiguring && p->core_state == FS_HUNGRY && p->proto_state == PS_DOWN)
> +  if ((p->flags & PFLAG_RECONFIGURING) && p->core_state == FS_HUNGRY && p->proto_state == PS_DOWN)

It would be nice to use some inline function to check for reconfiguring flag.

> +	  /* Remove from pre-shutdown list if exists */
> +	  WALK_LIST_DELSAFE(ap, ap_next, abuse_proto_list)
> +	  {
> +	    if ((ap = SKIP_BACK(struct proto, shutdown_node, ap)) != p)
> +	      continue;
> +	    rem_node(&ap->shutdown_node);
> +	    break;
> +	  }

This seems a bit crazy to me - what about just use some flag to mark a proto
in abuse_proto_list and then call rem_node base on that?

Personally, i would remove the whole abuse_proto_list. Disabling or restarting
protocols because of limit breaching seems to be too unfrequent event
that just walking active_proto_list and acting based on flags seems OK and
keeps the code simpler.

> +#define PL_HANDLED		1	/* Limit action is handled by the protocol hook */
> +#define PL_DEFAULT		2	/* Core has to execute default action */

Perhaps just keep the callback return boolean, like the reconfigure hook?

> +#define PL_ACTION_WARN		1	/* Issue log warning */
> +#define PL_ACTION_BLOCK		2	/* Block new routes */
> +#define PL_ACTION_CLEAR		3	/* Clear exported routes */

Some unimplemented feature?


> @@ -188,20 +188,26 @@ do_rte_announce(struct announce_hook *a, int type UNUSED, net *net, rte *new, rt
>  {
>    struct proto *p = a->proto;
>    struct filter *filter = p->out_filter;
> +  struct proto_limit *l = p->out_limit;
>    struct proto_stats *stats = &p->stats;
>    rte *new0 = new;
>    rte *old0 = old;
> -  int ok;
> +  int ok, wrong_table = 0;
>  
>  #ifdef CONFIG_PIPE
>    /* The secondary direction of the pipe */
>    if (proto_is_pipe(p) && (p->table != a->table))
>      {
>        filter = p->in_filter;
> +      l = p->in_limit;
>        stats = pipe_get_peer_stats(p);
>      }
>  #endif
>  
> +  /* Check if we're called on non-default protocol table */
> +  if ((!proto_is_pipe(p)) && (p->table != a->table))
> +    wrong_table = 1;
> +

Some unimplemented thing?

>    if (new)
>      {
>        stats->exp_updates_received++;
> @@ -272,6 +278,22 @@ do_rte_announce(struct announce_hook *a, int type UNUSED, net *net, rte *new, rt
>    if (!new && !old)
>      return;
>  
> +  /* Check if we can export new route / exceed export limit */
> +  if (new && l && !old)
> +  {
> +    if (p->flags & PFLAG_ELIMIT_BLOCK)
> +      return;
> +
> +    if ((l = p->out_limit) && (p->stats.exp_routes + 1 > l->limit) && (proto_notify_limit(p, l, a->table) == 1))
> +    {

'l' and 'stats'  is already set above to be consistent with pipe, that should be used here?
 
> +      /* free allocated data and return */
> +      if (new != new0)
> +        rte_free(new);
> +
> +      return;
> +    }
> +  }
> +
>    if (new)
>      stats->exp_updates_accepted++;
>    else
> @@ -426,6 +448,7 @@ static void
>  rte_recalculate(rtable *table, net *net, struct proto *p, struct proto *src, rte *new, ea_list *tmpa)
>  {
>    struct proto_stats *stats = &p->stats;
> +  struct proto_limit *l;
>    rte *old_best = net->routes;
>    rte *old = NULL;
>    rte **k, *r, *s;
> @@ -486,6 +509,16 @@ rte_recalculate(rtable *table, net *net, struct proto *p, struct proto *src, rte
>        return;
>      }
>  
> +  /* Check limit for imported routes */
> +  if (new && !old)
> +  {
> +    if (p->flags & PFLAG_ILIMIT_BLOCK)
> +      return;
> +
> +    if ((l = p->in_limit) && (p->stats.imp_routes + 1 > l->limit) && (proto_notify_limit(p, l, table) == 1))
> +      return;
> +  }

Ignore this check in case of pipe?

> +
>    if (new)
>      stats->imp_updates_accepted++;
>    else
> @@ -1233,7 +1266,11 @@ rt_feed_baby(struct proto *p)
>  {
>    struct announce_hook *h;
>    struct fib_iterator *fit;
> -  int max_feed = 256;
> +  struct proto_limit *l = p->out_limit;
> +  int max_feed = 256, need_check, do_check;

Isn't code here in rt_feed_baby() unnecessary?
do_feed_baby() is called and do_rte_announce() is called,
where limits are already handled. Perhaps just some check
if block is reached to skip rest of feeding?

Also pipes seems not to be handled properly here.

> +  /* Do we need to filter route updates? */
> +  need_check = (!(p->flags & PFLAG_ELIMIT_BLOCK) && l) ? 1 : 0;
>  
>    if (!p->feed_ahook)			/* Need to initialize first */
>      {
> @@ -1248,6 +1285,8 @@ rt_feed_baby(struct proto *p)
>  
>  again:
>    h = p->feed_ahook;
> +  /* Do limit check for base protocol table only */
> +  do_check = ((p->table == h->table) && need_check)? 1 : 0;
>    FIB_ITERATE_START(&h->table->fib, fit, fn)
>      {
>        net *n = (net *) fn;
> @@ -1263,6 +1302,12 @@ again:
>  	  {
>  	    if (p->core_state != FS_FEEDING)
>  	      return 1;  /* In the meantime, the protocol fell down. */
> +	    if (do_check && (p->stats.exp_routes + 1 > l->limit))
> +	      if (proto_notify_limit(p, l, h->table))
> +	      {
> +		/* limit action forbids new exports, end feed */
> +		break;
> +	      }
>  	    do_feed_baby(p, RA_OPTIMAL, h, n, e);
>  	    max_feed--;
>  	  }
> diff --git a/proto/bgp/bgp.c b/proto/bgp/bgp.c
> index 675342d..483a2d0 100644
> --- a/proto/bgp/bgp.c
> +++ b/proto/bgp/bgp.c
> @@ -542,19 +542,27 @@ bgp_active(struct bgp_proto *p)
>    bgp_start_timer(conn->connect_retry_timer, delay);
>  }
>  
> -int
> -bgp_apply_limits(struct bgp_proto *p)
> +static int
> +bgp_limit_notify(struct proto *P, struct proto_limit *l, struct rtable *table)
>  {
> -  if (p->cf->route_limit && (p->p.stats.imp_routes > p->cf->route_limit))
> +  struct bgp_proto *p = (struct bgp_proto *) P;
> +  if (l->direction != PL_IMPORT)
> +    return PL_DEFAULT;

Perhaps there should be proper error code even for import limit?

> @@ -1141,9 +1150,6 @@ bgp_show_proto_info(struct proto *P)
>  	      p->rs_client ? " route-server" : "",
>  	      p->as4_session ? " AS4" : "");
>        cli_msg(-1006, "    Source address:   %I", p->source_addr);
> -      if (p->cf->route_limit)
> -	cli_msg(-1006, "    Route limit:      %d/%d",
> -		p->p.stats.imp_routes, p->cf->route_limit);

Perhaps this should be kept compatible until next major release?


BTW, which of your addresses should i use when sending/replying e-mail
to you? Or keep both ones?

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20111121/30fe663e/attachment-0001.asc>


More information about the Bird-users mailing list