Re: bgp keepalive & hold timers
Hi,

Not sure if it's still relevant, but I just wanted to propose another approach. Instead of comparing the configured keepalive with the hold timer received from a peer, it might be more straightforward to add another knob, e.g. "minimum-hold-timer" (as on Juniper: https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/st...; Nokia also has a similar knob).

The reason for this (the following is just my opinion) is that 'keepalive' only configures an interval for keepalive messages. If we start using the value of the keepalive timer to enforce some restrictions on the peer's hold timer, this may be confusing.

In other words, "minimum-hold-timer" would more clearly indicate our restrictions on hold timers. If I see "minimum-hold-timer 30" in the configuration, I understand that this device has some limitations. If I see "keepalive 10", it doesn't tell me anything about restrictions, only that this device will send keepalives every 10 secs.

P.S. I haven't dealt with mailing lists before, so forgive me if I reply to the wrong thread. This is the thread I'm replying to: https://bird.network.cz/pipermail/bird-users/2022-April/016071.html
Hi,

I think this "knob" is somewhat orthogonal, because it limits the possible values but does not set them. Currently you can set the keepalive and hold timers, and the BGP peers use the lower of their two hold timers. This setting is needed if you do not want your peer to set some low hold timer for you.

But the original problem has a different root. If you do not set the keepalive timer, it is calculated from the negotiated hold timer, but if you set it manually, then it is fixed, and if the peer decides to use a lower hold timer, that can silently break the mechanism. The easiest workaround is not to set the keepalive timer by hand when you do not control the peer. Then its value will always be lower than the hold timer (1/3 of it).

Of course, if you apply the patch, then you can use the keepalive timer to indirectly set the lower bound for the hold timer. But that looks weird. So if there is demand for a "minimum-hold-timer" option, it is useful by itself, IMHO.

On Wed, Nov 9, 2022 at 11:25 AM Serge G via Bird-users <bird-users@trubka.network.cz> wrote:
Hi,
Not sure if it's still relevant, but I just wanted to propose another approach. Instead of comparing the configured keepalive with the hold timer received from a peer, it might be more straightforward to add another knob, e.g. "minimum-hold-timer" (as on Juniper: https://www.juniper.net/documentation/us/en/software/junos/bgp/topics/ref/st...; Nokia also has a similar knob).
Reason for this is that (the following is just my opinion) 'keepalive' only configures an interval for keepalive messages. And if we start using the value of keepalive timer to enforce some restrictions on peer's hold timer, this may be confusing.
In other words, "minimum-hold-timer" would more clearly indicate our restrictions on hold timers. If I see "minimum-hold-timer 30" in the configuration, I understand that this device has some limitations. If I see "keepalive 10", it doesn't tell me anything about restrictions, only that this device will send keepalives every 10 secs.
P.S. I haven't dealt with mailing lists before, so forgive me if I reply to the wrong thread. This is the thread I'm replying to: https://bird.network.cz/pipermail/bird-users/2022-April/016071.html
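The timer mechanics described in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BIRD's code: the function names are hypothetical, the 1/3 default is taken from the message above, and the special case of hold time 0 (no keepalives at all) is ignored.

```python
from typing import Optional


def negotiated_hold_time(local_hold: int, peer_hold: int) -> int:
    """Both peers use the smaller of the two proposed hold times (RFC 4271)."""
    return min(local_hold, peer_hold)


def keepalive_interval(negotiated_hold: int,
                       manual_keepalive: Optional[int] = None) -> int:
    """Keepalive interval as described in the thread (behaviour before the fix).

    Without a manual setting it is derived as 1/3 of the negotiated hold
    time; with a manual setting it stays fixed regardless of negotiation.
    """
    if manual_keepalive is not None:
        return manual_keepalive
    return negotiated_hold // 3


def session_is_sane(negotiated_hold: int, keepalive: int) -> bool:
    """A keepalive must be sent before the peer's hold timer can expire."""
    return keepalive < negotiated_hold


# Hypothetical numbers: we propose hold 240, the peer proposes hold 30.
hold = negotiated_hold_time(240, 30)
auto = keepalive_interval(hold)        # derived: 30 // 3
fixed = keepalive_interval(hold, 80)   # manually configured, stays at 80
print(hold, auto, session_is_sane(hold, auto))    # 30 10 True
print(hold, fixed, session_is_sane(hold, fixed))  # 30 80 False
```

The second case is the silent breakage discussed here: the manually fixed keepalive (80 s) is longer than the negotiated hold time (30 s), so the peer's hold timer expires before any keepalive arrives.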
Hi,

Thank you for your reply.

I believe I understand the original problem and how to overcome it very well. But my point is different.

In the original thread where you discussed this issue with Ondrej, there was the following proposal:

"For example bird can check that the hold timer proposed by a peer is at least 2 times greater than the local keepalive timer as in the attached patch"

"I think that rejecting session when hold timer is smaller than local keepalive (so it is clear misconfiguration), and just a warning when it is smaller than 2*keepalive would be a good approach."

(https://bird.network.cz/pipermail/bird-users/2022-April/016070.html)

And my point regarding minimum-hold-timer addresses exactly this. IMHO, it doesn't really make sense to compare the local keepalive with the remote hold time and then reject the session for some reason. If you want to limit the hold time that your router accepts, it makes more sense to set minimum-hold-timer for that, not keepalive.
For example, you could configure:

    hold 30;
    keepalive 10;
    minimum-hold-timer 30;

This won't let your peer set a hold timer lower than 30, which makes the behavior pretty straightforward.

Otherwise, with an approach like "hold timer proposed by a peer is at least 2 times greater than the local keepalive timer", you end up with some obscure logic, so that people would have to go read the source code to understand why the session is not establishing.

On Wed, Nov 9, 2022 at 13:34, Alexander Zubkov <green@qrator.net> wrote:
Hi,
I think this "knob" is somewhat orthogonal, because it limits the possible values but does not set them. Currently you can set the keepalive and hold timers, and the BGP peers use the lower of their two hold timers. This setting is needed if you do not want your peer to set some low hold timer for you.

But the original problem has a different root. If you do not set the keepalive timer, it is calculated from the negotiated hold timer, but if you set it manually, then it is fixed, and if the peer decides to use a lower hold timer, that can silently break the mechanism. The easiest workaround is not to set the keepalive timer by hand when you do not control the peer. Then its value will always be lower than the hold timer (1/3 of it).

Of course, if you apply the patch, then you can use the keepalive timer to indirectly set the lower bound for the hold timer. But that looks weird. So if there is demand for a "minimum-hold-timer" option, it is useful by itself, IMHO.
Yes, I think I have mostly the same point. But what you have mentioned, "hold timer proposed by a peer is at least 2 times greater than the local keepalive timer", is not a method for "controlling" the peer's minimum hold timer. Its purpose is to avoid establishing a broken session and then having to debug why it flaps for no obvious reason, and instead to terminate it immediately with an error saying that the timers are inconsistent.

On Wed, Nov 9, 2022 at 2:29 PM Serge G <sergei.goriunov@gmail.com> wrote:
Hi,
Thank you for your reply.
I believe I understand the original problem and how to overcome it very well. But my point is different.
In the original thread where you discussed this issue with Ondrej, there was the following proposal:

"For example bird can check that the hold timer proposed by a peer is at least 2 times greater than the local keepalive timer as in the attached patch"

"I think that rejecting session when hold timer is smaller than local keepalive (so it is clear misconfiguration), and just a warning when it is smaller than 2*keepalive would be a good approach."

(https://bird.network.cz/pipermail/bird-users/2022-April/016070.html)

And my point regarding minimum-hold-timer addresses exactly this. IMHO, it doesn't really make sense to compare the local keepalive with the remote hold time and then reject the session for some reason. If you want to limit the hold time that your router accepts, it makes more sense to set minimum-hold-timer for that, not keepalive.

For example, you could configure:

    hold 30;
    keepalive 10;
    minimum-hold-timer 30;

This won't let your peer set a hold timer lower than 30, which makes the behavior pretty straightforward.

Otherwise, with an approach like "hold timer proposed by a peer is at least 2 times greater than the local keepalive timer", you end up with some obscure logic, so that people would have to go read the source code to understand why the session is not establishing.
On Wed, Nov 09, 2022 at 12:34:12PM +0100, Alexander Zubkov via Bird-users wrote:
Hi,
I think this "knob" is somewhat orthogonal, because it limits the possible values but does not set them. Currently you can set the keepalive and hold timers, and the BGP peers use the lower of their two hold timers. This setting is needed if you do not want your peer to set some low hold timer for you.
Hi

I agree that this is an orthogonal issue: a hard limit to ensure that the session is stable w.r.t. keepalive time, and a configurable limit to reject a hold time that is too low. I thought that I had already implemented it after discussing it in April, but it seems I forgot; I will fix that.

The whole issue is a bit silly, as it is a result of too many knobs that sometimes do not make sense. Setting keepalive independently of hold time, when hold time is negotiated (so consistency cannot be validated at config-parse time), is just bad design. It would be better if keepalive could be defined as a fraction of the negotiated hold time.

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
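The two checks discussed in this thread (a configurable lower bound on the hold time, and a consistency check against the local keepalive) could be combined roughly as follows. This is a hedged sketch, not the BIRD implementation: the function name, the return values, and the choice to compare against the hold time proposed by the peer are assumptions made for illustration, and hold time 0 is not handled.

```python
from typing import Optional


def check_peer_hold_time(peer_hold: int,
                         local_keepalive: int,
                         min_hold: Optional[int] = None) -> str:
    """Classify a peer's proposed hold time as 'reject', 'warn' or 'ok'."""
    # Configurable limit ("minimum-hold-timer" in the proposal above).
    if min_hold is not None and peer_hold < min_hold:
        return "reject"
    # Hard limit: a hold time below our keepalive is a clear misconfiguration.
    if peer_hold < local_keepalive:
        return "reject"
    # Soft limit from the April proposal: warn below 2 * keepalive.
    if peer_hold < 2 * local_keepalive:
        return "warn"
    return "ok"


# With "keepalive 10" and a minimum hold time of 30:
print(check_peer_hold_time(15, 10, min_hold=30))  # reject
print(check_peer_hold_time(30, 10, min_hold=30))  # ok
# With "keepalive 10" and no configured minimum:
print(check_peer_hold_time(5, 10))    # reject
print(check_peer_hold_time(15, 10))   # warn
print(check_peer_hold_time(90, 10))   # ok
```

Keeping the two limits as separate branches mirrors the point made above: the configurable minimum expresses operator policy, while the keepalive comparison only guards against an inconsistent session.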
On Wed, Nov 09, 2022 at 05:34:07PM +0100, Ondrej Zajicek via Bird-users wrote:
The whole issue is a bit silly, as it is a result of too many knobs that sometimes do not make sense. Setting keepalive independently of hold time, when hold time is negotiated (so consistency cannot be validated at config-parse time), is just bad design. It would be better if keepalive could be defined as a fraction of the negotiated hold time.
Thinking about it, perhaps a better approach would be to just scale the keepalive timer based on (negotiated hold time / configured hold time).

I was against this approach in the past, as my reading of RFC 4271 is that the KeepaliveTime FSM attribute is initialized from the HoldTime FSM attribute at the beginning, and later the HoldTime is negotiated, but there is no mention of updating the KeepaliveTime (on the contrary, it uses phrases like 'initial value of KeepaliveTimer'). But now I have checked RFC 4273, and it pretty much assumes that there is keepalive scaling:

  Time interval (in seconds) for the KeepAlive timer established with the peer. The value of this object is calculated by this BGP speaker such that, when compared with bgpPeerHoldTime, it has the same proportion that bgpPeerKeepAliveConfigured has, compared with bgpPeerHoldTimeConfigured.

Do you know whether other major BGP implementations use KeepAlive scaling, or not?

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
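The proportional scaling described in the RFC 4273 text quoted above can be written out as a small worked example. This is a sketch under assumptions, not the code that ended up in BIRD: the function name is hypothetical, integer division (rounding down) and the lower clamp of one second are illustrative choices, and a hold time of 0 is treated as "no keepalives".

```python
def scaled_keepalive(configured_keepalive: int,
                     configured_hold: int,
                     negotiated_hold: int) -> int:
    """Scale the keepalive so that keepalive/hold keeps its configured proportion."""
    if configured_hold == 0 or negotiated_hold == 0:
        # Hold time 0 means the hold and keepalive timers are not used.
        return 0
    scaled = configured_keepalive * negotiated_hold // configured_hold
    # Assumption: never go below one second after rounding down.
    return max(1, scaled)


# Configured hold 240 / keepalive 80; the peer forces the hold time down to 30.
print(scaled_keepalive(80, 240, 30))   # 10
# Nothing changes when the negotiated hold time equals the configured one.
print(scaled_keepalive(80, 240, 240))  # 80
# Configured hold 30 / keepalive 10, negotiated hold 15.
print(scaled_keepalive(10, 30, 15))    # 5
```

With scaling, a manually configured keepalive can no longer end up longer than the negotiated hold time, which removes the inconsistency that started this thread.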
On Wed, Nov 09, 2022 at 06:05:27PM +0100, Ondrej Zajicek via Bird-users wrote:
Thinking about it, perhaps a better approach would be to just scale the keepalive timer based on (negotiated hold time / configured hold time).
Hi

Scaling implemented, with both 'min hold time' and 'min keepalive time' options:

https://gitlab.nic.cz/labs/bird/-/commit/3859e4efc1597368df647323c5a3cc1771c...

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 09.12.22, 17:48, "Bird-users on behalf of Ondrej Zajicek via Bird-users" <bird-users-bounces@trubka.network.cz on behalf of bird-users@trubka.network.cz> wrote:
Hi
Scaling implemented, with both 'min hold time' and 'min keepalive time' option:
https://gitlab.nic.cz/labs/bird/-/commit/3859e4efc1597368df647323c5a3cc1771cb64ca
Hi,

Thanks for implementing this! One small addition: can we print the hold time in the reject error message in decimal instead of hex? Currently it prints e.g.

"Error: Unacceptable hold time: 000f"

instead of

"Error: Unacceptable hold time: 15"
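The difference between the two renderings in the request above is only a matter of how the two-octet hold time field is formatted. A minimal sketch, assuming the message data is the raw big-endian 16-bit value (the helper names are hypothetical):

```python
def format_hold_time_hex(data: bytes) -> str:
    """Render the raw octets as a hex dump, as in the current message."""
    return data.hex()


def format_hold_time_decimal(data: bytes) -> str:
    """Interpret the two octets as a big-endian integer number of seconds."""
    return str(int.from_bytes(data, "big"))


raw = bytes([0x00, 0x0F])  # hold time of 15 seconds on the wire
print("Error: Unacceptable hold time:", format_hold_time_hex(raw))      # 000f
print("Error: Unacceptable hold time:", format_hold_time_decimal(raw))  # 15
```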
On Sat, Dec 10, 2022 at 02:37:20PM +0000, Johannes Moos wrote:
Hi,
Thanks for implementing this! One small addition: Can we print the hold time in the reject error message in decimal instead of hex?
Hi

Sure:

https://gitlab.nic.cz/labs/bird/-/commit/937ebf2536e6b4d65f996af53a29ac550ac...

--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
participants (4)
- Alexander Zubkov
- Johannes Moos
- Ondrej Zajicek
- Serge G