[PATCH 0/3] babel: Add support for the RTT extension
This series adds support to the Babel protocol for the RTT extension, specified by this IETF draft: https://datatracker.ietf.org/doc/html/draft-ietf-babel-rtt-extension-00 The extension adds timestamps to Babel messages and uses them to compute the RTT between two Babel peers. A configurable additional cost is added to routes from based on these RTT vales. This is useful in particular for geographically distributed networks (such as overlay networks) where a route can be short in terms of hops, but each hop covers a long distance (and thus a long RTT). The extension is not currently on the IETF standards track because the document needs to be rewritten (see earlier message from Juliusz), but it's been in babeld for a while and considered quite useful there, so I think it would be nice to have it available in Bird as well. For details on the implementation, see the individual patches. Note that this series conflicts with Daniel's patches for moving the route selection into the Bird nest. Figured I'd send them now so this can be part of the discussion of that other patch (and also Daniel is one of the current users of this extension so I expect he'd be interested in having the two be compatible). Toke Høiland-Jørgensen (3): lib/timer: Add current_time_now() function for immediate timestamp babel: Add support for the RTT extension babel: Add route metric smoothing doc/bird.sgml | 63 +++++++++++-- lib/timer.c | 13 +++ lib/timer.h | 1 + proto/babel/babel.c | 205 +++++++++++++++++++++++++++++++++++++++--- proto/babel/babel.h | 40 +++++++++ proto/babel/config.Y | 30 ++++++- proto/babel/packets.c | 126 +++++++++++++++++++++++++- 7 files changed, 452 insertions(+), 26 deletions(-) -- 2.39.1
This adds support to the Babel protocol for the RTT extension specified in draft-ietf-babel-rtt-extension. While this extension is not yet at the RFC stage, it is one of the more useful extensions to Babel[0], so it seems worth having in Bird as well. The extension adds timestamps to Hello and IHU TLVs and uses these to compute an RTT to each neighbour. An extra per-neighbour cost is then computed from the RTT based on a minimum and maximum interval and cost value specified in the configuration. The primary use case for this is improving routing in a geographically distributed tunnel-based overlay network. The implementation follows the babeld implementation when picking constants and default configuration values. It also uses the same RTT smoothing algorithm as babeld, and follows it in adding a new 'tunnel' interface type which enables RTT by default. [0] https://alioth-lists.debian.net/pipermail/babel-users/2022-April/003932.html Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> --- doc/bird.sgml | 51 ++++++++++++++--- proto/babel/babel.c | 84 ++++++++++++++++++++++++++-- proto/babel/babel.h | 24 ++++++++ proto/babel/config.Y | 20 ++++++- proto/babel/packets.c | 126 ++++++++++++++++++++++++++++++++++++++++-- 5 files changed, 288 insertions(+), 17 deletions(-) diff --git a/doc/bird.sgml b/doc/bird.sgml index 85711c31336f..451dff4031d5 100644 --- a/doc/bird.sgml +++ b/doc/bird.sgml @@ -1916,7 +1916,7 @@ protocol babel [<name>] { ipv6 [sadr] { <channel config> }; randomize router id <switch>; interface <interface pattern> { - type <wired|wireless>; + type <wired|wireless|tunnel>; rxcost <number>; limit <number>; hello interval <time>; @@ -1930,6 +1930,11 @@ protocol babel [<name>] { next hop ipv4 <address>; next hop ipv6 <address>; extended next hop <switch>; + rtt cost <number>; + rtt min <time>; + rtt max <time>; + rtt decay <number>; + send timestamps <switch>; authentication none|mac [permissive]; password "<text>"; password "<text>" { @@ -1960,15 +1965,16 @@ protocol babel [<name>] { router ID every time it starts up, which avoids this problem at the cost of not having stable router IDs in the network. Default: no. - <tag><label id="babel-type">type wired|wireless </tag> - This option specifies the interface type: Wired or wireless. On wired - interfaces a neighbor is considered unreachable after a small number of - Hello packets are lost, as described by <cf/limit/ option. On wireless + <tag><label id="babel-type">type wired|wireless|tunnel </tag> + This option specifies the interface type: Wired, wireless or tunnel. On + wired interfaces a neighbor is considered unreachable after a small number + of Hello packets are lost, as described by <cf/limit/ option. On wireless interfaces the ETX link quality estimation technique is used to compute the metrics of routes discovered over this interface. This technique will gradually degrade the metric of routes when packets are lost rather than - the more binary up/down mechanism of wired type links. Default: - <cf/wired/. + the more binary up/down mechanism of wired type links. A tunnel is like a + wired interface, but turns on RTT-based metrics with a default cost of 96. + Default: <cf/wired/. <tag><label id="babel-rxcost">rxcost <m/num/</tag> This option specifies the nominal RX cost of the interface. The effective @@ -2039,6 +2045,37 @@ protocol babel [<name>] { hop when IPv4 addresses are absent from the interface as described in <rfc id="9229">. Default: yes. + <tag><label id="babel-rtt-cost">rtt cost <m/number/</tag> + The RTT-based cost that will be applied to all routes from each neighbour + based on the measured RTT to that neighbour. If this value is set, + timestamps will be included in generated Babel Hello and IHU messages, and + (if the neighbours also have timestamps enabled), the RTT to each + neighbour will be computed. An additional cost is added to a neighbour if + its RTT is above the <ref id="babel-rtt-min" name="rtt min"> value + configured on the interface. The added cost scales linearly from 0 up to + the RTT cost configured in this option; the full cost is applied if the + neighbour RTT reaches the RTT configured in the <ref id="babel-rtt-max" + name="rtt max"> option (and for all RTTs above this value). Default: 0 + (disabled), except for tunnel interfaces, where it is 96. + + <tag><label id="babel-rtt-min">rtt min <m/time/ s|ms</tag> + The minimum RTT above which the RTT cost will start to be applied (scaling + linearly from zero up to the full cost). Default: 10 ms + + <tag><label id="babel-rtt-max">rtt max <m/time/ s|ms</tag> + The maximum RTT above which the full RTT cost will start be applied. + Default: 120 ms + + <tag><label id="babel-rtt-decay">rtt decay <m/number/</tag> + The decay factor used for the exponentional moving average of the RTT + samples from each neighbour, in units of 1/256. Higher values discards old + RTT samples faster. Must be between 1 and 256. Default: 42 + + <tag><label id="babel-send-timestamps">send timestamps <m/switch/</tag> + Whether to send the timestamps used for RTT calculation on this interface. + Sending the timestamps enables peers to calculate an RTT to this node, + even if no RTT cost is applied to the route metrics. Default: yes. + <tag><label id="babel-authentication">authentication none|mac [permissive]</tag> Selects authentication method to be used. <cf/none/ means that packets are not authenticated at all, <cf/mac/ means MAC authentication is diff --git a/proto/babel/babel.c b/proto/babel/babel.c index 9f33dd3458bd..04613788303c 100644 --- a/proto/babel/babel.c +++ b/proto/babel/babel.c @@ -596,6 +596,7 @@ babel_update_cost(struct babel_neighbor *nbr) switch (cf->type) { case BABEL_IFACE_TYPE_WIRED: + case BABEL_IFACE_TYPE_TUNNEL: /* k-out-of-j selection - Appendix 2.1 in the RFC. */ /* Link is bad if less than cf->limit/16 of expected hellos were received */ @@ -624,6 +625,24 @@ babel_update_cost(struct babel_neighbor *nbr) break; } + if (cf->rtt_cost && nbr->srtt > cf->rtt_min) + { + uint rtt_cost = cf->rtt_cost; + + if (nbr->srtt < cf->rtt_max) + { + uint rtt_interval = cf->rtt_max TO_US - cf->rtt_min TO_US; + uint rtt_diff = (nbr->srtt TO_US - cf->rtt_min TO_US); + + rtt_cost = (rtt_cost * rtt_diff) / rtt_interval; + } + + txcost = MIN(txcost + rtt_cost, BABEL_INFINITY); + + TRACE(D_EVENTS, "Added RTT cost %u to nbr %I on %s with srtt %u.%03u ms", + rtt_cost, nbr->addr, nbr->ifa->iface->name, nbr->srtt/1000, nbr->srtt%1000); + } + done: /* If RX cost changed, send IHU with next Hello */ if (rxcost != nbr->rxcost) @@ -854,6 +873,12 @@ babel_build_ihu(union babel_msg *msg, struct babel_iface *ifa, struct babel_neig msg->ihu.rxcost = n->rxcost; msg->ihu.interval = ifa->cf->ihu_interval; + if (n->last_tstamp_rcvd && ifa->cf->rtt_send) + { + msg->ihu.tstamp = n->last_tstamp; + msg->ihu.tstamp_rcvd = n->last_tstamp_rcvd TO_US; + } + TRACE(D_PACKETS, "Sending IHU for %I with rxcost %d interval %t", msg->ihu.addr, msg->ihu.rxcost, (btime) msg->ihu.interval); } @@ -893,6 +918,9 @@ babel_send_hello(struct babel_iface *ifa, uint interval) msg.hello.seqno = ifa->hello_seqno++; msg.hello.interval = interval ?: ifa->cf->hello_interval; + if (ifa->cf->rtt_send) + msg.hello.tstamp = 1; /* real timestamp will be set on TLV write */ + TRACE(D_PACKETS, "Sending hello on %s with seqno %d interval %t", ifa->ifname, msg.hello.seqno, (btime) msg.hello.interval); @@ -1199,14 +1227,26 @@ babel_handle_hello(union babel_msg *m, struct babel_iface *ifa) msg->seqno, (btime) msg->interval); struct babel_neighbor *n = babel_get_neighbor(ifa, msg->sender); + struct babel_iface_config *cf = n->ifa->cf; int first_hello = !n->hello_cnt; + if (msg->tstamp) + { + n->last_tstamp = msg->tstamp; + n->last_tstamp_rcvd = msg->pkt_received; + } babel_update_hello_history(n, msg->seqno, msg->interval); babel_update_cost(n); /* Speed up session establishment by sending IHU immediately */ if (first_hello) - babel_send_ihu(ifa, n); + { + /* if using RTT, all IHUs must be paired with hellos */ + if(cf->rtt_send) + babel_send_hello(ifa, 0); + else + babel_send_ihu(ifa, n); + } } void @@ -1225,6 +1265,39 @@ babel_handle_ihu(union babel_msg *m, struct babel_iface *ifa) struct babel_neighbor *n = babel_get_neighbor(ifa, msg->sender); n->txcost = msg->rxcost; n->ihu_expiry = current_time() + BABEL_IHU_EXPIRY_FACTOR(msg->interval); + + if (msg->tstamp) + { + u32 rtt_sample = 0, pkt_received = msg->pkt_received TO_US; + int remote_time, full_time; + + /* processing time reported by peer */ + remote_time = (n->last_tstamp - msg->tstamp_rcvd); + /* time since we sent the last timestamp - RTT including remote time */ + full_time = (pkt_received - msg->tstamp); + + /* sanity checks */ + if (remote_time < 0 || full_time < 0 || + remote_time US_ > BABEL_RTT_MAX_VALUE || full_time US_ > BABEL_RTT_MAX_VALUE) + goto out; + + if (remote_time < full_time) + rtt_sample = full_time - remote_time; + + if (n->srtt) + { + uint decay = n->ifa->cf->rtt_decay; + + n->srtt = (decay * rtt_sample + (256 - decay) * n->srtt) / 256; + } + else + n->srtt = rtt_sample; + + TRACE(D_EVENTS, "RTT sample for neighbour %I on %s: %u us (srtt %u.%03u ms)", + n->addr, ifa->ifname, rtt_sample, n->srtt/1000, n->srtt%1000); + } + +out: babel_update_cost(n); } @@ -2199,8 +2272,8 @@ babel_show_neighbors(struct proto *P, const char *iff) } cli_msg(-1024, "%s:", p->p.name); - cli_msg(-1024, "%-25s %-10s %6s %6s %6s %7s %4s", - "IP address", "Interface", "Metric", "Routes", "Hellos", "Expires", "Auth"); + cli_msg(-1024, "%-25s %-10s %6s %6s %6s %7s %4s %11s", + "IP address", "Interface", "Metric", "Routes", "Hellos", "Expires", "Auth", "RTT"); WALK_LIST(ifa, p->interfaces) { @@ -2215,9 +2288,10 @@ babel_show_neighbors(struct proto *P, const char *iff) uint hellos = u32_popcount(n->hello_map); btime timer = (n->hello_expiry ?: n->init_expiry) - current_time(); - cli_msg(-1024, "%-25I %-10s %6u %6u %6u %7t %-4s", + cli_msg(-1024, "%-25I %-10s %6u %6u %6u %7t %-4s %5u.%03ums", n->addr, ifa->iface->name, n->cost, rts, hellos, MAX(timer, 0), - n->auth_passed ? "Yes" : "No"); + n->auth_passed ? "Yes" : "No", + n->srtt/1000, n->srtt%1000); } } } diff --git a/proto/babel/babel.h b/proto/babel/babel.h index dcd303e13ecd..edde4cabe6b1 100644 --- a/proto/babel/babel.h +++ b/proto/babel/babel.h @@ -53,10 +53,16 @@ #define BABEL_GARBAGE_INTERVAL (300 S_) #define BABEL_RXCOST_WIRED 96 #define BABEL_RXCOST_WIRELESS 256 +#define BABEL_RXCOST_RTT 96 #define BABEL_INITIAL_HOP_COUNT 255 #define BABEL_MAX_SEND_INTERVAL 5 /* Unused ? */ #define BABEL_INITIAL_NEIGHBOR_TIMEOUT (60 S_) +#define BABEL_RTT_MAX_VALUE (600 S_) +#define BABEL_RTT_MIN (10 MS_) +#define BABEL_RTT_MAX (120 MS_) +#define BABEL_RTT_DECAY 42 + /* Max interval that will not overflow when carried as 16-bit centiseconds */ #define BABEL_TIME_UNITS 10000 /* On-wire times are counted in centiseconds */ #define BABEL_MIN_INTERVAL (0x0001 * BABEL_TIME_UNITS) @@ -96,6 +102,8 @@ enum babel_tlv_type { enum babel_subtlv_type { BABEL_SUBTLV_PAD1 = 0, BABEL_SUBTLV_PADN = 1, + BABEL_SUBTLV_DIVERSITY = 2, /* we don't support this */ + BABEL_SUBTLV_TIMESTAMP = 3, /* Mandatory subtlvs */ BABEL_SUBTLV_SOURCE_PREFIX = 128, @@ -106,6 +114,7 @@ enum babel_iface_type { BABEL_IFACE_TYPE_UNDEF = 0, BABEL_IFACE_TYPE_WIRED = 1, BABEL_IFACE_TYPE_WIRELESS = 2, + BABEL_IFACE_TYPE_TUNNEL = 3, BABEL_IFACE_TYPE_MAX }; @@ -141,6 +150,12 @@ struct babel_iface_config { uint ihu_interval; /* IHU interval, in us */ uint update_interval; /* Update interval, in us */ + btime rtt_min; /* rtt above which to start penalising metric */ + btime rtt_max; /* max rtt metric penalty applied above this */ + u16 rtt_cost; /* metric penalty to apply at rtt_max */ + u16 rtt_decay; /* decay of neighbour RTT (units of 1/256) */ + u8 rtt_send; /* whether to send timestamps on this interface */ + u16 rx_buffer; /* RX buffer size, 0 for MTU */ u16 tx_length; /* TX packet length limit (including headers), 0 for MTU */ int tx_tos; @@ -229,6 +244,10 @@ struct babel_neighbor { u16 next_hello_seqno; uint last_hello_int; + u32 last_tstamp; + btime last_tstamp_rcvd; + btime srtt; + u32 auth_pc_unicast; u32 auth_pc_multicast; u8 auth_passed; @@ -326,6 +345,8 @@ struct babel_msg_hello { u16 seqno; uint interval; ip_addr sender; + u32 tstamp; + btime pkt_received; }; struct babel_msg_ihu { @@ -335,6 +356,9 @@ struct babel_msg_ihu { uint interval; ip_addr addr; ip_addr sender; + u32 tstamp; + u32 tstamp_rcvd; + btime pkt_received; }; struct babel_msg_update { diff --git a/proto/babel/config.Y b/proto/babel/config.Y index 1b4dc6f5f6c5..b8af02679f0c 100644 --- a/proto/babel/config.Y +++ b/proto/babel/config.Y @@ -26,7 +26,7 @@ CF_KEYWORDS(BABEL, INTERFACE, METRIC, RXCOST, HELLO, UPDATE, INTERVAL, PORT, TYPE, WIRED, WIRELESS, RX, TX, BUFFER, PRIORITY, LENGTH, CHECK, LINK, NEXT, HOP, IPV4, IPV6, BABEL_METRIC, SHOW, INTERFACES, NEIGHBORS, ENTRIES, RANDOMIZE, ROUTER, ID, AUTHENTICATION, NONE, MAC, PERMISSIVE, - EXTENDED) + EXTENDED, TUNNEL, RTT, MIN, MAX, DECAY, SEND, TIMESTAMPS) CF_GRAMMAR @@ -67,6 +67,10 @@ babel_iface_start: BABEL_IFACE->limit = BABEL_HELLO_LIMIT; BABEL_IFACE->tx_tos = IP_PREC_INTERNET_CONTROL; BABEL_IFACE->tx_priority = sk_priority_control; + BABEL_IFACE->rtt_min = BABEL_RTT_MIN; + BABEL_IFACE->rtt_max = BABEL_RTT_MAX; + BABEL_IFACE->rtt_decay = BABEL_RTT_DECAY; + BABEL_IFACE->rtt_send = 1; BABEL_IFACE->check_link = 1; BABEL_IFACE->ext_next_hop = 1; }; @@ -87,8 +91,16 @@ babel_iface_finish: BABEL_IFACE->hello_interval = BABEL_HELLO_INTERVAL_WIRED; if (!BABEL_IFACE->rxcost) BABEL_IFACE->rxcost = BABEL_RXCOST_WIRED; + if (BABEL_IFACE->type == BABEL_IFACE_TYPE_TUNNEL && !BABEL_IFACE->rtt_cost) + BABEL_IFACE->rtt_cost = BABEL_RXCOST_RTT; } + if (BABEL_IFACE->rtt_cost && !BABEL_IFACE->rtt_send) + cf_error("Can't set RTT cost when sending timestamps is disabled"); + + if (BABEL_IFACE->rtt_min >= BABEL_IFACE->rtt_max) + cf_error("Min RTT must be smaller than max RTT"); + /* Make sure we do not overflow the 16-bit centisec fields */ if (!BABEL_IFACE->update_interval) BABEL_IFACE->update_interval = MIN_(BABEL_IFACE->hello_interval*BABEL_UPDATE_INTERVAL_FACTOR, BABEL_MAX_INTERVAL); @@ -136,6 +148,7 @@ babel_iface_item: | LIMIT expr { BABEL_IFACE->limit = $2; if (($2<1) || ($2>16)) cf_error("Limit must be in range 1-16"); } | TYPE WIRED { BABEL_IFACE->type = BABEL_IFACE_TYPE_WIRED; } | TYPE WIRELESS { BABEL_IFACE->type = BABEL_IFACE_TYPE_WIRELESS; } + | TYPE TUNNEL { BABEL_IFACE->type = BABEL_IFACE_TYPE_TUNNEL; } | HELLO INTERVAL expr_us { BABEL_IFACE->hello_interval = $3; if (($3<BABEL_MIN_INTERVAL) || ($3>BABEL_MAX_INTERVAL)) cf_error("Hello interval must be in range 10 ms - 655 s"); } | UPDATE INTERVAL expr_us { BABEL_IFACE->update_interval = $3; if (($3<BABEL_MIN_INTERVAL) || ($3>BABEL_MAX_INTERVAL)) cf_error("Update interval must be in range 10 ms - 655 s"); } | RX BUFFER expr { BABEL_IFACE->rx_buffer = $3; if (($3<256) || ($3>65535)) cf_error("RX buffer must be in range 256-65535"); } @@ -149,6 +162,11 @@ babel_iface_item: | AUTHENTICATION NONE { BABEL_IFACE->auth_type = BABEL_AUTH_NONE; } | AUTHENTICATION MAC { BABEL_IFACE->auth_type = BABEL_AUTH_MAC; BABEL_IFACE->auth_permissive = 0; } | AUTHENTICATION MAC PERMISSIVE { BABEL_IFACE->auth_type = BABEL_AUTH_MAC; BABEL_IFACE->auth_permissive = 1; } + | RTT MIN expr_us { BABEL_IFACE->rtt_min = $3; } + | RTT MAX expr_us { BABEL_IFACE->rtt_max = $3; } + | RTT COST expr { BABEL_IFACE->rtt_cost = $3; if ($3 >= BABEL_INFINITY) cf_error("RTT cost must be < 65535"); } + | RTT DECAY expr { BABEL_IFACE->rtt_decay = $3; if (($3 < 1) || ($3 > 256)) cf_error("RTT decay must be between 1-256"); } + | SEND TIMESTAMPS bool { BABEL_IFACE->rtt_send = $3; } | password_list ; diff --git a/proto/babel/packets.c b/proto/babel/packets.c index 61c94cc5133e..f18956558190 100644 --- a/proto/babel/packets.c +++ b/proto/babel/packets.c @@ -58,6 +58,13 @@ struct babel_tlv_ihu { u8 addr[0]; } PACKED; +struct babel_subtlv_timestamp { + u8 type; + u8 length; + u32 tstamp; + u32 tstamp_rcvd; /* only used in IHU */ +} PACKED; + struct babel_tlv_router_id { u8 type; u8 length; @@ -161,6 +168,7 @@ struct babel_parse_state { const struct babel_tlv_data* (*get_subtlv_data)(u8 type); struct babel_proto *proto; struct babel_iface *ifa; + btime received_time; ip_addr saddr; ip_addr next_hop_ip4; ip_addr next_hop_ip6; @@ -172,6 +180,7 @@ struct babel_parse_state { u8 def_ip6_prefix_seen; /* def_ip6_prefix is valid */ u8 def_ip4_prefix_seen; /* def_ip4_prefix is valid */ u8 def_ip4_via_ip6_prefix_seen; /* def_ip4_via_ip6_prefix is valid */ + u8 hello_tstamp_seen; /* pkt contains a hello timestamp */ u8 current_tlv_endpos; /* End of self-terminating TLVs (offset from start) */ u8 sadr_enabled; u8 is_unicast; @@ -336,6 +345,7 @@ static int babel_read_update(struct babel_tlv *hdr, union babel_msg *msg, struct static int babel_read_route_request(struct babel_tlv *hdr, union babel_msg *msg, struct babel_parse_state *state); static int babel_read_seqno_request(struct babel_tlv *hdr, union babel_msg *msg, struct babel_parse_state *state); static int babel_read_source_prefix(struct babel_tlv *hdr, union babel_msg *msg, struct babel_parse_state *state); +static int babel_read_timestamp(struct babel_tlv *hdr, union babel_msg *msg, struct babel_parse_state *state); static uint babel_write_ack(struct babel_tlv *hdr, union babel_msg *msg, struct babel_write_state *state, uint max_len); static uint babel_write_hello(struct babel_tlv *hdr, union babel_msg *msg, struct babel_write_state *state, uint max_len); @@ -344,6 +354,7 @@ static uint babel_write_update(struct babel_tlv *hdr, union babel_msg *msg, stru static uint babel_write_route_request(struct babel_tlv *hdr, union babel_msg *msg, struct babel_write_state *state, uint max_len); static uint babel_write_seqno_request(struct babel_tlv *hdr, union babel_msg *msg, struct babel_write_state *state, uint max_len); static int babel_write_source_prefix(struct babel_tlv *hdr, net_addr *net, uint max_len); +static int babel_write_timestamp(struct babel_tlv *hdr, u32 tstamp, u32 tstamp_rcvd, uint max_len); static const struct babel_tlv_data tlv_data[BABEL_TLV_MAX] = { [BABEL_TLV_ACK_REQ] = { @@ -419,6 +430,13 @@ static const struct babel_tlv_data *get_packet_tlv_data(u8 type) return type < sizeof(tlv_data) / sizeof(*tlv_data) ? &tlv_data[type] : NULL; } +static const struct babel_tlv_data timestamp_tlv_data = { + sizeof(struct babel_subtlv_timestamp), + babel_read_timestamp, + NULL, + NULL +}; + static const struct babel_tlv_data source_prefix_tlv_data = { sizeof(struct babel_subtlv_source_prefix), babel_read_source_prefix, @@ -430,6 +448,8 @@ static const struct babel_tlv_data *get_packet_subtlv_data(u8 type) { switch (type) { + case BABEL_SUBTLV_TIMESTAMP: + return ×tamp_tlv_data; case BABEL_SUBTLV_SOURCE_PREFIX: return &source_prefix_tlv_data; @@ -491,16 +511,34 @@ babel_read_hello(struct babel_tlv *hdr, union babel_msg *m, static uint babel_write_hello(struct babel_tlv *hdr, union babel_msg *m, - struct babel_write_state *state UNUSED, uint max_len UNUSED) + struct babel_write_state *state UNUSED, uint max_len) { struct babel_tlv_hello *tlv = (void *) hdr; struct babel_msg_hello *msg = &m->hello; + uint len = sizeof(struct babel_tlv_hello); TLV_HDR0(tlv, BABEL_TLV_HELLO); put_u16(&tlv->seqno, msg->seqno); put_time16(&tlv->interval, msg->interval); - return sizeof(struct babel_tlv_hello); + if (msg->tstamp) + { + /* + * There can be a substantial delay between when the babel_msg was created + * and when it is serialised. We don't want this included in the RTT + * measurement, so replace the timestamp with the current time to get as + * close as possible to on-wire time for the packet. + */ + u32 tstamp = current_time_now() TO_US; + + int l = babel_write_timestamp(hdr, tstamp, 0, max_len); + if (l < 0) + return 0; + + len += l; + } + + return len; } static int @@ -565,6 +603,7 @@ babel_write_ihu(struct babel_tlv *hdr, union babel_msg *m, { struct babel_tlv_ihu *tlv = (void *) hdr; struct babel_msg_ihu *msg = &m->ihu; + uint len = sizeof(*tlv); if (ipa_is_link_local(msg->addr) && max_len < sizeof(struct babel_tlv_ihu) + 8) return 0; @@ -576,12 +615,24 @@ babel_write_ihu(struct babel_tlv *hdr, union babel_msg *m, if (!ipa_is_link_local(msg->addr)) { tlv->ae = BABEL_AE_WILDCARD; - return sizeof(struct babel_tlv_ihu); + goto out; } put_ip6_ll(&tlv->addr, msg->addr); tlv->ae = BABEL_AE_IP6_LL; hdr->length += 8; - return sizeof(struct babel_tlv_ihu) + 8; + len += 8; + +out: + if (msg->tstamp) + { + int l = babel_write_timestamp(hdr, msg->tstamp, msg->tstamp_rcvd, max_len); + if (l < 0) + return 0; + + len += l; + } + + return len; } static int @@ -1249,6 +1300,66 @@ babel_write_source_prefix(struct babel_tlv *hdr, net_addr *n, uint max_len) return len; } +static int +babel_read_timestamp(struct babel_tlv *hdr, union babel_msg *msg, + struct babel_parse_state *state) +{ + struct babel_subtlv_timestamp *tlv = (void *) hdr; + + switch (msg->type) + { + case BABEL_TLV_HELLO: + if (tlv->length < 4) + return PARSE_ERROR; + + msg->hello.tstamp = get_u32(&tlv->tstamp); + msg->hello.pkt_received = state->received_time; + state->hello_tstamp_seen = 1; + break; + + case BABEL_TLV_IHU: + if (tlv->length < 8) + return PARSE_ERROR; + + /* RTT calculation relies on a Hello always being present with an IHU */ + if (!state->hello_tstamp_seen) + break; + + msg->ihu.tstamp = get_u32(&tlv->tstamp); + msg->ihu.tstamp_rcvd = get_u32(&tlv->tstamp_rcvd); + msg->ihu.pkt_received = state->received_time; + break; + + default: + return PARSE_ERROR; + } + + return PARSE_SUCCESS; +} + +static int +babel_write_timestamp(struct babel_tlv *hdr, u32 tstamp, u32 tstamp_rcvd, uint max_len) +{ + struct babel_subtlv_timestamp *tlv = (void *) NEXT_TLV(hdr); + uint len = sizeof(*tlv); + + if (hdr->type == BABEL_TLV_HELLO) + len -= 4; + + if (len > max_len) + return -1; + + TLV_HDR(tlv, BABEL_SUBTLV_TIMESTAMP, len); + hdr->length += len; + + put_u32(&tlv->tstamp, tstamp); + + if (hdr->type == BABEL_TLV_IHU) + put_u32(&tlv->tstamp_rcvd, tstamp_rcvd); + + return len; +} + static inline int babel_read_subtlvs(struct babel_tlv *hdr, union babel_msg *msg, @@ -1518,6 +1629,13 @@ babel_process_packet(struct babel_iface *ifa, .saddr = saddr, .next_hop_ip6 = saddr, .sadr_enabled = babel_sadr_enabled(p), + + /* + * The core updates current_time() after returning from poll(), so this is + * actually the time the packet was received, even though there may have + * been a bit of delay before we got to process it + */ + .received_time = current_time(), }; if ((pkt->magic != BABEL_MAGIC) || (pkt->version != BABEL_VERSION)) -- 2.39.1
Hi, On Sun, Feb 26, 2023 at 11:10:03PM +0100, Toke Høiland-Jørgensen via Bird-users wrote:
Note that this series conflicts with Daniel's patches for moving the route selection into the Bird nest. Figured I'd send them now so this can be part of the discussion of that other patch (and also Daniel is one of the current users of this extension so I expect he'd be interested in having the two be compatible).
Indeed I've been using these patches for a while, but I've had to switch to babeld due to lack of proper route filtering ;) Still consider this Tested-By: Daniel Gröber <dxld@darkboxed.org> To clarify: it's really only the metric smoothing patch that's in conflict with my patch. I would advocate for merging only the other two patches for now while we figure out how to rework the smoothing on top of my patch. I'm happy to do the rework we just need to come up with a plan for that :) --Daniel
Daniel Gröber <dxld@darkboxed.org> writes:
Hi,
On Sun, Feb 26, 2023 at 11:10:03PM +0100, Toke Høiland-Jørgensen via Bird-users wrote:
Note that this series conflicts with Daniel's patches for moving the route selection into the Bird nest. Figured I'd send them now so this can be part of the discussion of that other patch (and also Daniel is one of the current users of this extension so I expect he'd be interested in having the two be compatible).
Indeed I've been using these patches for a while, but I've had to switch to babeld due to lack of proper route filtering ;) Still consider this
Tested-By: Daniel Gröber <dxld@darkboxed.org>
To clarify: it's really only the metric smoothing patch that's in conflict with my patch. I would advocate for merging only the other two patches for now while we figure out how to rework the smoothing on top of my patch. I'm happy to do the rework we just need to come up with a plan for that :)
Hmm, I think the way to handle this is basically: - Add the smoothed metric as a new route attribute (so it's also available to filters) - Change babel_rte_better() to incorporate the smoothed metric (from the attribute) in its comparison - Change the decay logic to be timer-based instead of calculating the smoothed metric on demand That last bit is probably the biggest change. We can't really do the cached on-demand calculation of the smoothed metric if we're sticking it in an attribute. So instead, we'll have to set a periodic timer that re-announces the route with a new smoothed metric at an interval. Doing this as part of babel_expire_routes() would be the logical place, I suppose (and the interval fits with the current BABEL_SMOOTHING_STEP). I'm not sure what impact this would have in terms of runtime overhead, but I think it might actually simplify the code (no need for the babel_update_smoothed_metric()/babel_smoothed_metric() split). -Toke
Hi Toke, On Mon, Feb 27, 2023 at 12:14:23AM +0100, Toke Høiland-Jørgensen wrote:
To clarify: it's really only the metric smoothing patch that's in conflict with my patch. I would advocate for merging only the other two patches for now while we figure out how to rework the smoothing on top of my patch. I'm happy to do the rework we just need to come up with a plan for that :)
Hmm, I think the way to handle this is basically:
- Add the smoothed metric as a new route attribute (so it's also available to filters)
I think doing that is a bad idea. If we keep filters from changing this we might just be able to optimize by only announcing smooted metric updates when the resulting route would actually be better than the currently selected one. If we let filters meddle with this however that becomes impossible. Unless you were thinking of a r/o attribute?
- Change babel_rte_better() to incorporate the smoothed metric (from the attribute) in its comparison - Change the decay logic to be timer-based instead of calculating the smoothed metric on demand
That last bit is probably the biggest change. We can't really do the cached on-demand calculation of the smoothed metric if we're sticking it in an attribute.
Can you elaborate on why you think that's not possible? The way I see it there isn't much harm in the smooted metric attribute being outdated so long as we ensure we announce an update if an actually better route comes along. Suggesting this kind of seems like we're brining back the select_route function, but I think we can do it in a way that only looks at the route that changed without iterating over all of them, the nest will do that bit for us now. --Daniel
Daniel Gröber <dxld@darkboxed.org> writes:
Hi Toke,
On Mon, Feb 27, 2023 at 12:14:23AM +0100, Toke Høiland-Jørgensen wrote:
To clarify: it's really only the metric smoothing patch that's in conflict with my patch. I would advocate for merging only the other two patches for now while we figure out how to rework the smoothing on top of my patch. I'm happy to do the rework we just need to come up with a plan for that :)
Hmm, I think the way to handle this is basically:
- Add the smoothed metric as a new route attribute (so it's also available to filters)
I think doing that is a bad idea. If we keep filters from changing this we might just be able to optimize by only announcing smooted metric updates when the resulting route would actually be better than the currently selected one.
If we let filters meddle with this however that becomes impossible. Unless you were thinking of a r/o attribute?
I meant read-only. I agree that the filters shouldn't be able to modify the smoothed metric. Another question is if the smoothing should incorporate any changes to the metric that the filters do? It probably should, right?
- Change babel_rte_better() to incorporate the smoothed metric (from the attribute) in its comparison - Change the decay logic to be timer-based instead of calculating the smoothed metric on demand
That last bit is probably the biggest change. We can't really do the cached on-demand calculation of the smoothed metric if we're sticking it in an attribute.
Can you elaborate on why you think that's not possible? The way I see it there isn't much harm in the smooted metric attribute being outdated so long as we ensure we announce an update if an actually better route comes along.
The reason I say it doesn't is that the smoothing code updates the smoothed metric of not just the new route being announced, but also of all the routes it's being compared against. To do this in the nest case I guess we'd need to have this logic inside babel_rte_better(), which means we can't include the smoothed metric as an attribute. That may be OK, but I'm not actually sure how the nest comparison works (when we announce a new route, does the rte_better function get called pairwise against every other candidate with the same prefix, or just against the current best?). -Toke
Hi Toke, On Mon, Feb 27, 2023 at 12:16:01PM +0100, Toke Høiland-Jørgensen wrote:
- Add the smoothed metric as a new route attribute (so it's also available to filters)
I think doing that is a bad idea. If we keep filters from changing this we might just be able to optimize by only announcing smooted metric updates when the resulting route would actually be better than the currently selected one.
If we let filters meddle with this however that becomes impossible. Unless you were thinking of a r/o attribute?
I meant read-only. I agree that the filters shouldn't be able to modify the smoothed metric.
I've thought about this some more, I think we absolutely shouldn't expose the smooted metric to filters. It's an implementation detail. There's a bunch of other valid ways to implement this sort of dampening/hysteresis no reason to expose it to filter users really. How would you even use this for any practical filtering anyway? Reject routes based on `smoothed_metric - metric > threshold` or even better, increase metric based on smoothed_metric difference (oscillations galore haha). Hardly seems useful to me :) AFAICT it's perfectly fine to have EAs that filters can't access though, that's the situation we're currently in for the router-id after all so that's not really a blocker. I've been playing with rebasing your smoothing patch on mine now and I see a couple of options. My favorite so far is strategically replacing relevant babel_announce_rte calls with this: static void babel_sched_announce_rte(struct babel_proto *p, struct babel_route *r) { struct babel_config *cf = (void *) p->p.cf; if (!cf->metric_decay || !e->selected || e->selected->metric < r->metric) /* When best route's metric is less than the one we're about to announce * it's safe to do so immediately even if metric smoothing is on. Note * e->selected takes export filters into account. */ babel_announce_rte(p, r->e, r); else e->scheduled = r; } The clever bit is that announcing a route that's no better than the selected one will have no impact on route selection. Same applies if another protocol's route is active (i.e. when e->selected is NULL). Only think I'm still missing is a way to reframe the metric smoothing math such that I can install a timer for when the next scheduled route update should go through, though perhaps polling in the global timer hook is fine?
Another question is if the smoothing should incorporate any changes to the metric that the filters do? It probably should, right?
I don't think that'll give us very useful behaviour TBH. Perhaps the way to look at it is that we're not trying to smooth the (administrative) metric but rather only the link cost component which is the bit that is actually fluctuating? Problem is we only have these as seperate numbers for our local metrics/costs not the ones from other nodes. Maybe we need a protocol extension to seperate these two concepts, hmm. I have been thinking it would be nice to have seperate rtt and administrative metric numbers just for network debugging, perhaps it would make sense for general operation too.
That may be OK, but I'm not actually sure how the nest comparison works (when we announce a new route, does the rte_better function get called pairwise against every other candidate with the same prefix, or just against the current best?).
From adding some quick tracing code it certainly looks like the comparison is only against the current best. There's only ever a couple of rte_better calls per prefix once things are initialized.
Also see above for how "lovely" all the necessary lookups are going to be if we go down this path, haha. --Daniel
Daniel Gröber <dxld@darkboxed.org> writes:
Hi Toke,
On Mon, Feb 27, 2023 at 12:16:01PM +0100, Toke Høiland-Jørgensen wrote:
- Add the smoothed metric as a new route attribute (so it's also available to filters)
I think doing that is a bad idea. If we keep filters from changing this we might just be able to optimize by only announcing smooted metric updates when the resulting route would actually be better than the currently selected one.
If we let filters meddle with this however that becomes impossible. Unless you were thinking of a r/o attribute?
I meant read-only. I agree that the filters shouldn't be able to modify the smoothed metric.
I've thought about this some more, I think we absolutely shouldn't expose the smooted metric to filters. It's an implementation detail. There's a bunch of other valid ways to implement this sort of dampening/hysteresis no reason to expose it to filter users really.
How would you even use this for any practical filtering anyway? Reject routes based on `smoothed_metric - metric > threshold` or even better, increase metric based on smoothed_metric difference (oscillations galore haha). Hardly seems useful to me :)
AFAICT it's perfectly fine to have EAs that filters can't access though, that's the situation we're currently in for the router-id after all so that's not really a blocker.
My thinking was that filters may want to do something like: if (metric == smoothed_metric) metric += 100; /* route is stable, we can apply our policy */ but I honestly don't know if that's useful for anything in reality :)
I've been playing with rebasing your smoothing patch on mine now and I see a couple of options. My favorite so far is strategically replacing relevant babel_announce_rte calls with this:
static void babel_sched_announce_rte(struct babel_proto *p, struct babel_route *r) { struct babel_config *cf = (void *) p->p.cf;
if (!cf->metric_decay || !e->selected || e->selected->metric < r->metric) /* When best route's metric is less than the one we're about to announce * it's safe to do so immediately even if metric smoothing is on. Note * e->selected takes export filters into account. */ babel_announce_rte(p, r->e, r); else e->scheduled = r; }
The clever bit is that announcing a route that's no better than the selected one will have no impact on route selection. Same applies if another protocol's route is active (i.e. when e->selected is NULL).
Why is it better to do this pre-filtering instead of just having the smoothed metric be part of the check in babel_rte_better()?
Only think I'm still missing is a way to reframe the metric smoothing math such that I can install a timer for when the next scheduled route update should go through, though perhaps polling in the global timer hook is fine?
I think it's probably simpler to just re-announce any route that's still converging every time we go through the routing table. Bear in mind that the currently selected route can also be converging, so predicting when two routes "cross" gets complicated quickly. Simpler to just do a periodic update and redo the comparison every time this update happens.
Another question is if the smoothing should incorporate any changes to the metric that the filters do? It probably should, right?
I don't think that'll give us very useful behaviour TBH. Perhaps the way to look at it is that we're not trying to smooth the (administrative) metric but rather only the link cost component which is the bit that is actually fluctuating? Problem is we only have these as seperate numbers for our local metrics/costs not the ones from other nodes.
Maybe we need a protocol extension to seperate these two concepts, hmm. I have been thinking it would be nice to have seperate rtt and administrative metric numbers just for network debugging, perhaps it would make sense for general operation too.
I'm not sure this is a good idea from a protocol perspective. If you want a global view of the network, you should really be running a link-state protocol ;)
That may be OK, but I'm not actually sure how the nest comparison works (when we announce a new route, does the rte_better function get called pairwise against every other candidate with the same prefix, or just against the current best?).
From adding some quick tracing code it certainly looks like the comparison is only against the current best. There's only ever a couple of rte_better calls per prefix once things are initialized.
Right, that's what I thought, thanks for confirming.
Also see above for how "lovely" all the necessary lookups are going to be if we go down this path, haha.
What path? :) -Toke
Hi Toke, On Tue, Feb 28, 2023 at 12:20:22PM +0100, Toke Høiland-Jørgensen wrote:
I've thought about this some more, I think we absolutely shouldn't expose the smooted metric to filters. It's an implementation detail. There's a bunch of other valid ways to implement this sort of dampening/hysteresis no reason to expose it to filter users really.
How would you even use this for any practical filtering anyway? Reject routes based on `smoothed_metric - metric > threshold` or even better, increase metric based on smoothed_metric difference (oscillations galore haha). Hardly seems useful to me :)
AFAICT it's perfectly fine to have EAs that filters can't access though, that's the situation we're currently in for the router-id after all so that's not really a blocker.
My thinking was that filters may want to do something like:
if (metric == smoothed_metric) metric += 100; /* route is stable, we can apply our policy */
but I honestly don't know if that's useful for anything in reality :)
Hmm, I suppose that is a valid use-case. I'd rather expose a boolean flag for that rather than the smoothed metric itself.
I've been playing with rebasing your smoothing patch on mine now and I see a couple of options. My favorite so far is strategically replacing relevant babel_announce_rte calls with this:
static void babel_sched_announce_rte(struct babel_proto *p, struct babel_route *r) { struct babel_config *cf = (void *) p->p.cf;
if (!cf->metric_decay || !e->selected || e->selected->metric < r->metric) /* When best route's metric is less than the one we're about to announce * it's safe to do so immediately even if metric smoothing is on. Note * e->selected takes export filters into account. */ babel_announce_rte(p, r->e, r); else e->scheduled = r; }
The clever bit is that announcing a route that's no better than the selected one will have no impact on route selection. Same applies if another protocol's route is active (i.e. when e->selected is NULL).
Why is it better to do this pre-filtering instead of just having the smoothed metric be part of the check in babel_rte_better()?
I think you're looking at this from the wrong angle. By not having a smoothed metric attribute we have to update to begin with we save on all this timer logic that would constantly re-announce routes to core. This way the number of core announcements stays exactly the same as before and we don't really loose anything.
Only think I'm still missing is a way to reframe the metric smoothing math such that I can install a timer for when the next scheduled route update should go through, though perhaps polling in the global timer hook is fine?
I think it's probably simpler to just re-announce any route that's still converging every time we go through the routing table.
Simpler, yes, but I do want to be able to maintain internet scale routing tables through babel eventually so slashing every little bit helps :)
Bear in mind that the currently selected route can also be converging, so predicting when two routes "cross" gets complicated quickly. Simpler to just do a periodic update and redo the comparison every time this update happens.
I feel like that's an artifact specific to the "metric smoothing" approach to route dampening not a feature though. The way I see it the behaviour we really want is to delay any change in selected route for a time related to the metric difference. Think back to what the purpose of the metric smoothing is in the first place: to limit oscillations of the selected route, which this will do just as well.
Another question is if the smoothing should incorporate any changes to the metric that the filters do? It probably should, right?
I don't think that'll give us very useful behaviour TBH. Perhaps the way to look at it is that we're not trying to smooth the (administrative) metric but rather only the link cost component which is the bit that is actually fluctuating? Problem is we only have these as seperate numbers for our local metrics/costs not the ones from other nodes.
Maybe we need a protocol extension to seperate these two concepts, hmm. I have been thinking it would be nice to have seperate rtt and administrative metric numbers just for network debugging, perhaps it would make sense for general operation too.
I'm not sure this is a good idea from a protocol perspective. If you want a global view of the network, you should really be running a link-state protocol ;)
I don't agree with that. It's not as if I want per-hop information. Just a sum of RTTs along the path and a sum of administrative metric along the path rather than have those jumbled together into one number. Since babel is quite flexible in the actual metric math that would allow interesting ways of weighing each metric component rather than just having everything be linear. For debugging this would be useful as you can see that this path in front of you actually has a crazy RTT rather than someone just having fiddled with their rxcost.
Also see above for how "lovely" all the necessary lookups are going to be if we go down this path, haha.
Whoops, that was from an earlier draft. This was about what it would look like if we update the smoothed metric in rte_better. Something like this: struct babel_proto *p_new = (struct babel_proto*)new->src->proto, *p_old = (struct babel_proto*)old->src->proto; struct babel_iface *ifa_new, *ifa_old; struct babel_neighbor *nbr_new, *nbr_old; struct babel_entry *ent_new, *ent_old; struct babel_route *route_new, *route_old; ifa_new = babel_find_iface(p_new, new->attrs->nh.iface); ifa_old = babel_find_iface(p_old, old->attrs->nh.iface); nbr_new = babel_find_neighbor(ifa_new, new->attrs->from); nbr_old = babel_find_neighbor(ifa_old, old->attrs->from); ent_new = babel_find_entry(p_new, new->net->n.addr); ent_old = babel_find_entry(p_old, old->net->n.addr); route_new = babel_find_route(ent_new, nbr_new); route_old = babel_find_route(ent_old, nbr_old); babel_update_smoothed_metric(p_new, route_new); babel_update_smoothed_metric(p_old, route_old); yikes. Don't want to go down that road, got enough of these lookups in rt_notify already :) --Daniel
Hello! On 2/28/23 13:13, dxld@darkboxed.org wrote:
Hi Toke,
On Tue, Feb 28, 2023 at 12:20:22PM +0100, Toke Høiland-Jørgensen wrote:
I've thought about this some more, I think we absolutely shouldn't expose the smooted metric to filters. It's an implementation detail. There's a bunch of other valid ways to implement this sort of dampening/hysteresis no reason to expose it to filter users really.
How would you even use this for any practical filtering anyway? Reject routes based on `smoothed_metric - metric > threshold` or even better, increase metric based on smoothed_metric difference (oscillations galore haha). Hardly seems useful to me :)
AFAICT it's perfectly fine to have EAs that filters can't access though, that's the situation we're currently in for the router-id after all so that's not really a blocker.
My thinking was that filters may want to do something like:
if (metric == smoothed_metric) metric += 100; /* route is stable, we can apply our policy */
but I honestly don't know if that's useful for anything in reality :)
Hmm, I suppose that is a valid use-case. I'd rather expose a boolean flag for that rather than the smoothed metric itself.
There is also some internal discussion in the core BIRD team about the sole principle of exposing or not exposing protocol-internal attributes. One side of this discussion says that by exposing internal attributes, we lock ourselves inside that specific implementation and can't simply change it to some else where such internal attributes make no sense at all. The other side says that users should get all the possibilities to customize what they want, log these values from filters, consume all their memory and CPU and shoot their own leg if they wish. That said, expose whatever you think is useful. We haven't decided yet.
[...]
I think you're looking at this from the wrong angle. By not having a smoothed metric attribute we have to update to begin with we save on all this timer logic that would constantly re-announce routes to core. This way the number of core announcements stays exactly the same as before and we don't really loose anything.
Only think I'm still missing is a way to reframe the metric smoothing math such that I can install a timer for when the next scheduled route update should go through, though perhaps polling in the global timer hook is fine?
I think it's probably simpler to just re-announce any route that's still converging every time we go through the routing table.
Simpler, yes, but I do want to be able to maintain internet scale routing tables through babel eventually so slashing every little bit helps :)
In version 2, update of non-best route is propagated only to some protocols like pipes, add-path BGPs and alike. In version 3, this is even more smoothed as all updates of one prefix are exported asynchronously to each protocol, being notified after your Babel ends the task (event, socket, timer), dampening best route oscillation or other flaps. This way, I'm not so scared about Babel periodically updating many routes. BIRD has to withstand it. Maria
Hi Maria, Hi Toke, On Tue, Feb 28, 2023 at 02:07:06PM +0100, Maria Matejka via Bird-users wrote:
I think it's probably simpler to just re-announce any route that's still converging every time we go through the routing table.
Simpler, yes, but I do want to be able to maintain internet scale routing tables through babel eventually so slashing every little bit helps :)
In version 2, update of non-best route is propagated only to some protocols like pipes, add-path BGPs and alike.
Ok, that's good for either approach.
In version 3, this is even more smoothed as all updates of one prefix are exported asynchronously to each protocol, being notified after your Babel ends the task (event, socket, timer), dampening best route oscillation or other flaps.
I don't quite understand why this would damp oscillations? Do you mean there's explicit route flap damping support in v3 or just that this is a side-effect of the new async world? I'd like to know more about either :) If we do have actual BGP style damping in nest in v3 I'm not sure there's much point in doing essentially the same thing in our proto. At the very least that would be a good reason to keep the babel specific daming easy to remove if it's about to become superceeded by direct nest support anyway.
This way, I'm not so scared about Babel periodically updating many routes. BIRD has to withstand it.
I still think doing uneccessary work/computation is just dumb if we can avoid it :) On Tue, Feb 28, 2023 at 04:45:35PM +0100, Toke Høiland-Jørgensen wrote:
Right, sure, that's a nice property, but I'm not actually sure how what you sketched out above accomplishes that?
Don't worry I'll send an RFC patch soon if I can make it work out, just got tied up by some (mild) covid.
Simpler, yes, but I do want to be able to maintain internet scale routing tables through babel eventually so slashing every little bit helps :)
Heh, yeah, I would like to eventually be able to do that as well, but I think there are other optimisations we need to do first. For instance, walking the entire routing table every second is not going to work in the first place in this case :)
True, but might as well throw myself at this RTT stuff while I have the time and energy. Large scale route table performance testing will have to wait for another day since there's not much point making it performant if the features I want/need aren't supported (and performant! haha).
Bear in mind that the currently selected route can also be converging, so predicting when two routes "cross" gets complicated quickly. Simpler to just do a periodic update and redo the comparison every time this update happens.
I feel like that's an artifact specific to the "metric smoothing" approach to route dampening not a feature though. The way I see it the behaviour we really want is to delay any change in selected route for a time related to the metric difference.
Think back to what the purpose of the metric smoothing is in the first place: to limit oscillations of the selected route, which this will do just as well.
I'm OK with finding another solution, but I think you're going to have to explain in more detail how what you propose actually represents such a solution, then :)
Will do, I've been looking throug some network stability under dynamic routing literature to see if there's any well founded science we can apply here. Haven't really found anything good yet. The RTT paper does admit that "we lack an in-depth theoretical understanding of the performance of our algorithm, in particular of its stability." ;) There is one thing I'm unsure about: does the delay before propagating a route change to the kernel FIB actually have to depend on the metric difference to provide the network stability properties we're looking for? I think just strictly for stability a fixed delay should be fine, despite not being optimal in terms of convergence time.
I don't agree with that. It's not as if I want per-hop information. Just a sum of RTTs along the path and a sum of administrative metric along the path rather than have those jumbled together into one number.
Since babel is quite flexible in the actual metric math that would allow interesting ways of weighing each metric component rather than just having everything be linear.
It also introduces dependencies, though. I.e., with the current approach you can have a subset of the routers speak the RTT extension, and other parts of the network will just have that incorporated into the metric. Whereas if it is carried as a separate metric your entire network has to know about the extension for it to be useful.
I don't see why you couldn't do both. Incorporate the rtt (or other measures) into the metric for oblivious nodes and expose optional TLVs for ones that care about the different components.
For debugging this would be useful as you can see that this path in front of you actually has a crazy RTT rather than someone just having fiddled with their rxcost.
Meh, not convinced that the routing protocol is the right place to get such debugging information. I'd rather just monitor the actual traffic :)
I just put myself into the mind frame of "what if babel where used on the internet instead of eBGP" and how that means you'd have to convince lazy admins to run some weird additional software on their nodes or black box vendors not cooperating because they want to sell everyone their full network observability platform instead. Seems preferable to just have some more debuggability right in the protocol instead, no? If you're getting at the fact that you'd just do some passive TCP header sniffing do consider what happens when QUIC is widely deployed and that gets a whole lot harder :P
yikes. Don't want to go down that road, got enough of these lookups in rt_notify already :)
Right, but then we do need to put the smoothed metric into an attribute if it's to be used in the comparison. But maybe you can explain how that's not really need cf the above.
Right and my first attempt was doing that, before I came up with the new approach. --Daniel
dxld@darkboxed.org writes:
Hi Toke,
On Tue, Feb 28, 2023 at 12:20:22PM +0100, Toke Høiland-Jørgensen wrote:
I've thought about this some more, I think we absolutely shouldn't expose the smooted metric to filters. It's an implementation detail. There's a bunch of other valid ways to implement this sort of dampening/hysteresis no reason to expose it to filter users really.
How would you even use this for any practical filtering anyway? Reject routes based on `smoothed_metric - metric > threshold` or even better, increase metric based on smoothed_metric difference (oscillations galore haha). Hardly seems useful to me :)
AFAICT it's perfectly fine to have EAs that filters can't access though, that's the situation we're currently in for the router-id after all so that's not really a blocker.
My thinking was that filters may want to do something like:
if (metric == smoothed_metric) metric += 100; /* route is stable, we can apply our policy */
but I honestly don't know if that's useful for anything in reality :)
Hmm, I suppose that is a valid use-case. I'd rather expose a boolean flag for that rather than the smoothed metric itself.
No strong opinion on this, I just generally skew towards exposing things for people to do interesting things with :)
I've been playing with rebasing your smoothing patch on mine now and I see a couple of options. My favorite so far is strategically replacing relevant babel_announce_rte calls with this:
static void babel_sched_announce_rte(struct babel_proto *p, struct babel_route *r) { struct babel_config *cf = (void *) p->p.cf;
if (!cf->metric_decay || !e->selected || e->selected->metric < r->metric) /* When best route's metric is less than the one we're about to announce * it's safe to do so immediately even if metric smoothing is on. Note * e->selected takes export filters into account. */ babel_announce_rte(p, r->e, r); else e->scheduled = r; }
The clever bit is that announcing a route that's no better than the selected one will have no impact on route selection. Same applies if another protocol's route is active (i.e. when e->selected is NULL).
Why is it better to do this pre-filtering instead of just having the smoothed metric be part of the check in babel_rte_better()?
I think you're looking at this from the wrong angle. By not having a smoothed metric attribute we have to update to begin with we save on all this timer logic that would constantly re-announce routes to core. This way the number of core announcements stays exactly the same as before and we don't really loose anything.
Right, sure, that's a nice property, but I'm not actually sure how what you sketched out above accomplishes that?
Only think I'm still missing is a way to reframe the metric smoothing math such that I can install a timer for when the next scheduled route update should go through, though perhaps polling in the global timer hook is fine?
I think it's probably simpler to just re-announce any route that's still converging every time we go through the routing table.
Simpler, yes, but I do want to be able to maintain internet scale routing tables through babel eventually so slashing every little bit helps :)
Heh, yeah, I would like to eventually be able to do that as well, but I think there are other optimisations we need to do first. For instance, walking the entire routing table every second is not going to work in the first place in this case :)
Bear in mind that the currently selected route can also be converging, so predicting when two routes "cross" gets complicated quickly. Simpler to just do a periodic update and redo the comparison every time this update happens.
I feel like that's an artifact specific to the "metric smoothing" approach to route dampening not a feature though. The way I see it the behaviour we really want is to delay any change in selected route for a time related to the metric difference.
Think back to what the purpose of the metric smoothing is in the first place: to limit oscillations of the selected route, which this will do just as well.
I'm OK with finding another solution, but I think you're going to have to explain in more detail how what you propose actually represents such a solution, then :)
Another question is if the smoothing should incorporate any changes to the metric that the filters do? It probably should, right?
I don't think that'll give us very useful behaviour TBH. Perhaps the way to look at it is that we're not trying to smooth the (administrative) metric but rather only the link cost component which is the bit that is actually fluctuating? Problem is we only have these as seperate numbers for our local metrics/costs not the ones from other nodes.
Maybe we need a protocol extension to seperate these two concepts, hmm. I have been thinking it would be nice to have seperate rtt and administrative metric numbers just for network debugging, perhaps it would make sense for general operation too.
I'm not sure this is a good idea from a protocol perspective. If you want a global view of the network, you should really be running a link-state protocol ;)
I don't agree with that. It's not as if I want per-hop information. Just a sum of RTTs along the path and a sum of administrative metric along the path rather than have those jumbled together into one number.
Since babel is quite flexible in the actual metric math that would allow interesting ways of weighing each metric component rather than just having everything be linear.
It also introduces dependencies, though. I.e., with the current approach you can have a subset of the routers speak the RTT extension, and other parts of the network will just have that incorporated into the metric. Whereas if it is carried as a separate metric your entire network has to know about the extension for it to be useful.
For debugging this would be useful as you can see that this path in front of you actually has a crazy RTT rather than someone just having fiddled with their rxcost.
Meh, not convinced that the routing protocol is the right place to get such debugging information. I'd rather just monitor the actual traffic :)
Also see above for how "lovely" all the necessary lookups are going to be if we go down this path, haha.
Whoops, that was from an earlier draft. This was about what it would look like if we update the smoothed metric in rte_better. Something like this:
struct babel_proto *p_new = (struct babel_proto*)new->src->proto, *p_old = (struct babel_proto*)old->src->proto; struct babel_iface *ifa_new, *ifa_old; struct babel_neighbor *nbr_new, *nbr_old; struct babel_entry *ent_new, *ent_old; struct babel_route *route_new, *route_old;
ifa_new = babel_find_iface(p_new, new->attrs->nh.iface); ifa_old = babel_find_iface(p_old, old->attrs->nh.iface);
nbr_new = babel_find_neighbor(ifa_new, new->attrs->from); nbr_old = babel_find_neighbor(ifa_old, old->attrs->from);
ent_new = babel_find_entry(p_new, new->net->n.addr); ent_old = babel_find_entry(p_old, old->net->n.addr);
route_new = babel_find_route(ent_new, nbr_new); route_old = babel_find_route(ent_old, nbr_old);
babel_update_smoothed_metric(p_new, route_new); babel_update_smoothed_metric(p_old, route_old);
yikes. Don't want to go down that road, got enough of these lookups in rt_notify already :)
Right, but then we do need to put the smoothed metric into an attribute if it's to be used in the comparison. But maybe you can explain how that's not really need cf the above. -Toke
My thinking was that filters may want to do something like:
if (metric == smoothed_metric) metric += 100; /* route is stable, we can apply our policy */
but I honestly don't know if that's useful for anything in reality :)
I'm a little conflicted on this. On the one hand, it's nice to give administrators more flexibility. On the other hand, I want Babel to be as automated as possible: that's why Babel supported ETX from the outset, and that's why we developed RTT: so that Babel could adapt to a variety of real-world networks with minimal human intervention. Toke, what problem are you aiming to solve, and are we able to develop an algorithm that solves this problem without configuring policies manually? -- Juliusz
Juliusz Chroboczek <jch@irif.fr> writes:
My thinking was that filters may want to do something like:
if (metric == smoothed_metric) metric += 100; /* route is stable, we can apply our policy */
but I honestly don't know if that's useful for anything in reality :)
I'm a little conflicted on this. On the one hand, it's nice to give administrators more flexibility. On the other hand, I want Babel to be as automated as possible: that's why Babel supported ETX from the outset, and that's why we developed RTT: so that Babel could adapt to a variety of real-world networks with minimal human intervention.
Toke, what problem are you aiming to solve, and are we able to develop an algorithm that solves this problem without configuring policies manually?
Well, the immediate problem I'm trying to solve is to consolidate Daniel's patch moving the route selection to Bird core with the RTT extension patches (and associated metric smoothing). I don't really have a particular use case in mind for exposing the metric, as indicated by my comment above. It just occurred to me as something that *might* be useful for someone :) -Toke
I don't really have a particular use case in mind for exposing the metric, as indicated by my comment above. It just occurred to me as something that *might* be useful for someone :)
I certainly emphatise with your instinct to export as many useful knobs as possible. However, just like you, I don't see any convincing uses for exporting both metrics, and I'm a little afraid it might confuse users into micro-managing the implementation. -- Juliusz
Juliusz Chroboczek <jch@irif.fr> writes:
I don't really have a particular use case in mind for exposing the metric, as indicated by my comment above. It just occurred to me as something that *might* be useful for someone :)
I certainly emphatise with your instinct to export as many useful knobs as possible. However, just like you, I don't see any convincing uses for exporting both metrics, and I'm a little afraid it might confuse users into micro-managing the implementation.
Heh, fair point. Well I'm not insisting (providing we don't need the attribute for other reasons), so we can just leave it out until someone actually has a good use case for it :) -Toke
participants (5)
-
Daniel Gröber -
dxld@darkboxed.org -
Juliusz Chroboczek -
Maria Matejka -
Toke Høiland-Jørgensen