On Tue, Dec 20, 2022 at 05:39:19PM +0100, Toke Høiland-Jørgensen wrote:
Ondrej Zajicek <santiago@crfreenet.org> writes:
But i still think this modification is worse than keeping entries for the short time (without accounting for replies), for these reasons:
1) For variant with the short time, if the entry in the forwarding router expires, then originator re-send the request, forwarding router re-forward and re-create the entry, so satisfying logic still detect the reply and trigger an update.
Not if the update arrives between the time we expire the entry, and the time we receive the retransmit. Which can easily happen if there's a slight timing difference, or if we're talking to a different implementation which has a different value for the expiry timer. So to be reasonably sure we always do the right thing and trigger an update, we really do need to keep the entry alive for longer. At the same time, we can't just keep a long timeout, because then we'll suppress actual retransmits.
OK, i agree that keeping satisfy-detection for longer than duplicate-suppression makes sense, to avoid issues with timing differences. (although there is IMHO no need to keep it mixed with re-sent counter and make it as long as full timeout of originating router)
2) This change effectively eliminates 'multiple forward check' once sr->count > 0. It would make sense to have 'one forward per one sr->count increase' limit. But then it is pretty much equivalent to the 'short time variant'
It doesn't quite eliminate it: if sr->count is >0 when a second forwarded request arrives, the object will get recycled and sr->count will get reset to 0. So there will be a new (short) period of duplicate suppression each time a new retransmit is performed. This would be identical if we removed the entry entirely after the first timeout.
You are right, i missed that.
3) It makes more sense to have one interval for both duplicate detection and reply detection.
But they serve different purposes: The duplicate suppression deals with the case where lots of neighbours all lose a prefix at the same time and send identical seqno requests for it. To deal with this we want a *short* window of time where we suppress duplicates. Whereas a reply may take longer to get to us, so we need a longer record to trigger the update, as described above.
Reusing the expiry counter mechanism is maybe not the clearest way of expressing this; if you prefer I can rework it to have an explicit separate timer for duplicate suppression?
Yes, that would be much better, and would clearly communicate the purpose.
Something like (in babel_add_seqno_request()):
if (sr->forwarded) { sr->expires = current_time() + BABEL_SEQNO_FORWARD_EXPIRY; /* longish - 10 S? */; sr->dup_suppress = current_time() + BABEL_SEQNO_DUPLICATE_SUPPRESS_TIME; /* 1 S */; } else { sr->expires = current_time() + BABEL_SEQNO_REQUEST_EXPIRY; /* current value */ }
Then we can do what you suggest and just remove the entry the first time we hit the expiry time if sr->forwarded (and we'd change the check on sr->count to look at sr->dup_suppress in add_seqno_request() of course).
WDYT?
That would be great. -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."