setsockopt(..., TCP_CORK, 1)
MSG_MORE is useful when you know you are going to write more but want to be able to discard the user-space buffer right away. If it is easy to hold onto the buffers so that they can be passed to sendmsg() together, you may as well do that.
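A minimal sketch of the MSG_MORE case described above, assuming Linux and a blocking stream socket. The helper name send_two_parts() is made up for illustration and is not taken from BIRD or any other daemon:

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write a header and a body with two send() calls.
 * MSG_MORE (Linux-specific) on the first call hints that more data is
 * coming, so the kernel may hold the header back and transmit it together
 * with the body. Assumes a blocking socket; short writes and EINTR are
 * not handled in this sketch.
 */
ssize_t send_two_parts(int fd, const void *hdr, size_t hdr_len,
                       const void *body, size_t body_len)
{
    ssize_t n1 = send(fd, hdr, hdr_len, MSG_MORE);
    if (n1 < 0)
        return -1;

    /* The kernel has copied the header, so the caller could already
     * reuse or free that buffer at this point. */
    ssize_t n2 = send(fd, body, body_len, 0);
    if (n2 < 0)
        return -1;

    return n1 + n2;
}
```

The final send() without MSG_MORE is what releases the held-back data, which is why this works without keeping the header buffer around.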
On Wed, May 15, 2024 at 09:09:47PM +0200, Erin Shepherd wrote:
> It seems absent from the BSDs, but on Linux you can pass the MSG_MORE
> flag to send() to override TCP_NODELAY for a specific write
Am I understanding correctly that this is a variant of TCP_NOPUSH/TCP_CORK?
"More data is coming, don't push the send button yet!"
In OpenBGPD, TCP_NODELAY is set on the socket (a socket option available
on all platforms, I think?), and then all data is coalesced into
sendmsg() calls, so there is no need for 'corking'. From my limited
testing it seems a full routing table should fit in ~41,000 TCP packets.
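A rough sketch of that pattern, for illustration only. This is not OpenBGPD's actual code, and the names enable_nodelay() and send_coalesced() are hypothetical:

```c
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Disable Nagle's algorithm so that writes are not held back. */
int enable_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/*
 * Hand several queued buffers (for example, several BGP messages) to the
 * kernel in one sendmsg() call, so they can share TCP segments even with
 * TCP_NODELAY set. Returns the number of bytes accepted, which may be
 * less than the total; a real caller would have to handle that.
 */
ssize_t send_coalesced(int fd, struct iovec *iov, int iovcnt)
{
    struct msghdr msg = {0};

    msg.msg_iov = iov;
    msg.msg_iovlen = iovcnt;
    return sendmsg(fd, &msg, 0);
}
```

The coalescing happens in user space here: the application decides how many messages go into one call, instead of asking the kernel to wait for more data.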
BIRD has a code path sk_sendmsg()->sendmsg() called from
sk_maybe_write(), but based on my limited testing I'm not sure this path
is taken in all cases, because I see way more than 41K packets for a
full table feed (with TCP_NODELAY enabled).
Perhaps there are two separate questions here:
- are BGP messages (slightly) delayed because of TCP_NODELAY not being
set? (I think yes)
- are BGP messages as efficiently coalesced into as few TCP packets as
possible? (with TCP_NODELAY set, I am not sure)
Kind regards,
Job
ps. To clarify why I started this thread: last week I fell into the TCP
subsystem rabbit hole: why are things the way they are? I started
auditing various programs related to my $dayjob and thought it would be
good to open a conversation with the BIRD developer community. My goal
is not necessarily to get this patch 'as-is' merged, but to learn from
and with friendly and respected BGP developers.