Re: [PATCH v3] babel: Rework seqno request handling

20 Dec 2022

      On Tue, Dec 20, 2022 at 05:39:19PM +0100, Toke Høiland-Jørgensen wrote:
...
Ondrej Zajicek <santiago@crfreenet.org> writes:
...
But i still think this modification is worse than keeping entries for the
short time (without accounting for replies), for these reasons:
1) For variant with the short time, if the entry in the forwarding router
expires, then originator re-send the request, forwarding router re-forward
and re-create the entry, so satisfying logic still detect the reply and
trigger an update.
Not if the update arrives between the time we expire the entry, and the
time we receive the retransmit. Which can easily happen if there's a
slight timing difference, or if we're talking to a different
implementation which has a different value for the expiry timer. So to
be reasonably sure we always do the right thing and trigger an update,
we really do need to keep the entry alive for longer. At the same time,
we can't just keep a long timeout, because then we'll suppress actual
retransmits.
OK, i agree that keeping satisfy-detection for longer than
duplicate-suppression makes sense, to avoid issues with timing
differences.

(although there is IMHO no need to keep it mixed with re-sent counter and
make it as long as full timeout of originating router)
...
...
2) This change effectively eliminates 'multiple forward check' once
sr->count > 0. It would make sense to have 'one forward per one sr->count
increase' limit. But then it is pretty much equivalent to the 'short time
variant'
It doesn't quite eliminate it: if sr->count is >0 when a second
forwarded request arrives, the object will get recycled and sr->count
will get reset to 0. So there will be a new (short) period of duplicate
suppression each time a new retransmit is performed. This would be
identical if we removed the entry entirely after the first timeout.
You are right, i missed that.
...
...
3) It makes more sense to have one interval for both duplicate detection
and reply detection.
But they serve different purposes: The duplicate suppression deals with
the case where lots of neighbours all lose a prefix at the same time and
send identical seqno requests for it. To deal with this we want a
*short* window of time where we suppress duplicates. Whereas a reply may
take longer to get to us, so we need a longer record to trigger the
update, as described above.
Reusing the expiry counter mechanism is maybe not the clearest way of
expressing this; if you prefer I can rework it to have an explicit
separate timer for duplicate suppression?
Yes, that would be much better, and would clearly communicate the purpose.
...
Something like (in babel_add_seqno_request()):
if (sr->forwarded)
{
  sr->expires = current_time() + BABEL_SEQNO_FORWARD_EXPIRY; /* longish - 10 S? */;
  sr->dup_suppress = current_time() + BABEL_SEQNO_DUPLICATE_SUPPRESS_TIME; /* 1 S */;
}
else
{
  sr->expires = current_time() + BABEL_SEQNO_REQUEST_EXPIRY; /* current  value */
}
Then we can do what you suggest and just remove the entry the first time
we hit the expiry time if sr->forwarded (and we'd change the check on
sr->count to look at sr->dup_suppress in add_seqno_request() of course).
WDYT?
That would be great.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."