On Thu, Jun 03, 2021 at 11:19:32PM +0000, Maslanka, Pawel wrote:
Hi BIRD team!
We found a case where the BMP code tries to connect to the BMP collector service with sk_open(), and this causes increasing CPU utilization. To reproduce it:
1. The server machine to which the BMP PDU packets will be sent must be reachable (it answers ping).
2. The BMP collector service itself must not be running on that server.
3. Run BIRD with the BMP protocol enabled.
After that you should observe that the BIRD process has significantly increased CPU utilization. This seems to be related to the BIRD socket, because when I capture network traffic on the host machine (where BIRD is running), I can see a massive amount of TCP packets exchanged between the BIRD host machine and the BMP collector machine. At the moment, the socket type used for the BMP connection is SK_TCP_ACTIVE. Do you have any idea what is going wrong, or how a BIRD socket should properly be used?
Hi

After a failed connect() attempt, the socket err_hook is called. In that case you are supposed to close the socket and either disable the protocol, or set up some timeout to restart connection attempts. See rpki_err_hook() or bgp_sock_err(). Otherwise, the BIRD socket layer would try to connect() again immediately. This part is missing from bmp_sock_err() in our bmp branch; I should fix that. It is still WiP.
I also need a tip: is there a way to get a notification from a BIRD socket when we lose the connection with the BMP collector service? One option is to ...
If the connection is closed normally (EOF), the socket err_hook is called as well, but with err=0. In most cases the handling would be similar to an actual error: try to re-establish the connection after some timeout.
We have now switched to the BMP code on the bmp branch of the BIRD GitLab repository.
Additionally, I have a question about the enclosed code. Can I free the list node and the node data itself when sk_send() returns a value greater than or equal to 0 (>= 0), as in the code below?
    WALK_LIST_DELSAFE(tx_data, tx_data_next, p->tx_queue) {
      ...
      rv = sk_send(p->sk, data_size);
      if (rv < 0) {
        return;
      }

      mb_free(tx_data->data);
      rem_node((node *) tx_data);
      mb_free(tx_data);
      if (rv == 0) {
        return;
      }
      ...
    }
Or should I do that only if sk_send() returns a value greater than 0 (> 0)? My goal is to send all data from the list if there was only a "temporary" problem with sk_send().
This looks OK. If sk_send() returns > 0, the data were sent, so you can free them and continue the loop. If sk_send() returns 0, the data were not sent yet, but they stay in sk->tbuf, so you can still free them from your tx_queue, then break the loop and wait for tx_hook to be called again.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."