<div dir="ltr">I'll take the liberty of describing the context that makes me see MRAI as IMPORTANT for IXP Route-Servers,<br>and of explaining why I keep bothering the BIRD team about this.<br><br>I am currently involved in operating routers of several ASNs that are connected to several IXPs and their Route-Servers.<br><br>On several occasions, consulting customers of ours have complained about abnormal behavior on certain routers: excessive CPU usage and a long delay before RIB changes are reflected in the FIB.<br><br>When diagnosing these cases, we found that the BGP process had built up a backlog of UPDATE messages coming from the Route-Servers of the large IXPs to which these routers were connected.<br><br>---<br><br>I'll try to describe the context succinctly (this is a challenge for me):<br>We scrutinized the BGP messages that caused this overload and correlated them with networking news and with facts from the networks we had some kind of contact with. We observed the following:<br>Whenever a high-impact event changed the connectivity (directly or indirectly) of the networks of an IXP's participants, the routers we were troubleshooting received, almost at the same time, A HUGE QUANTITY of BGP messages from the IXP's Route-Servers saying that "the ABCD/M route is available for the XPTO participant", and immediately afterwards that "the ABCD/M route is NO longer available for the XPTO participant".<br>And these notifications, for the same routes, involved almost half of the Route-Servers' peers.<br><br>---<br><br>Here is a sanitized example of such an event:<br>- A router of our consulting customer is connected to two large IXPs (with more than 1000 participants), one in Brazil and one in Europe, both with BIRD-based Route-Servers.<br>- A BIG event happens, such as the isolation of one of the IXP's facilities or the failure of an important physical link in that
region/country.<br>- Only a subset of routes is actually affected (say 40-50 routes).<br>- About half of the thousands of networks connected to this IXP take some time to stabilize this subset of routes on their backbones.<br>- During this time, all these participants receive notifications about the status changes of these routes and resend them to the Route-Servers of that IXP (which are BIRD-based).<br>- Within 2 seconds or less, these Route-Servers propagate changes that have no practical effect on these routes but trigger a further round of re-notifications, in an almost vicious cycle.<br><br>Some additional findings:<br>- Routers in this situation that have more horsepower for BGP processing (e.g. 16 threads, 16 GB of RAM) also feel the impact of this avalanche of notifications, but recover from the overload much faster.<br>- Routers that are weaker at BGP processing (e.g. 4 threads, 4 GB of RAM) but have good traffic forwarding capacity (~1 Tbps) suffer for SEVERAL MINUTES in the same event before they return to a stable condition and align the RIB and the FIB.<br>- Status changes and notifications for this same subset of routes also arrive via BGP sessions with transit providers, but there the status usually changes only twice.<br>- Addendum: with one specific transit provider we also observed this flood of notifications about the same subset of routes.
We have confirmed with the operators of this network that the BGP engine of this transit provider is also BIRD 2.0.x.<br><br>...<br><br>Q.: Hey Douglas, where did you get the idea of relating this to BIRD?<br><br>Unfortunately, I was not able to do a detailed analysis that would prove the relationship between BIRD and these floods of BGP UPDATEs.<br>However, considering:<br> - The purpose for which MRAI was created, and the documentation from some routing equipment vendors on the use of this feature.<br> - Probable feedback loops between different networks connected to the same Route-Server.<br> - The possibility that these feedback loops amplify BGP notifications.<br>I believe this correlation is quite plausible.<br><br>I imagine we have colleagues here with more experience who can complement or refute this theory. I'm open to suggestions.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 14, 2022 at 10:48, Douglas Fischer &lt;<a href="mailto:fischerdouglas@gmail.com">fischerdouglas@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Passing by to drop another ping about MRAI (Minimum Route Advertisement Interval) in BIRD, and also about RFD (Route Flap Dampening).<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 8, 2021 at 17:18, Douglas Fischer &lt;<a href="mailto:fischerdouglas@gmail.com" target="_blank">fischerdouglas@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello all.<br><br>Over the last few weeks I have noticed an increase in BGP messages on some routers that I have access to,<br>especially those connected to IXPs with a considerable number of participants.<br>I checked, and the CPU usage of the BGP process has also increased a bit.<br>So I suspect that Route-Servers are receiving and forwarding route flaps from one or more participants...<br><br>I know that the right place to fix this would be at the source of those flaps...<br>But... considering IXPs with one or two thousand participants, there will always be someone flapping some routes.<br><br><div>BIRD is widely used as the BGP engine on IXP Route-Servers.<br></div><div><div><div>And exactly because of that, I came here to ask about route flap avoidance mechanisms in BIRD.<br></div></div><div><br></div></div><div>The discussion of methods to avoid route flapping in BGP is very controversial.<br></div><div><div><div> - RFD (Route Flap Dampening) comes and goes.<br> - There is no consensus on MRAI, but a lot of effort is going into it, including some magic algorithms that define a variable interval according to conditions.</div><div> - I have even seen some ultra-rigorous ideas like rate limiting the BGP messages...<br clear="all"><div><br></div><div>Before coming here to bore you with this, I searched the BIRD mailing list and documentation a bit and found the links below related to the topic:<br><br>Is there any work in progress to deal with this route-flapping question?<br><br><a href="https://bird.network.cz/?get_doc&v=20&f=bird-7.html" target="_blank">https://bird.network.cz/ - 7.1 Future work</a></div><div><a href="http://trubka.network.cz/pipermail/bird-users/2020-August/014786.html"
target="_blank">Question about BGP advertisement-interval in BIRD</a><br><a href="http://trubka.network.cz/pipermail/bird-users/2020-April/014433.html" target="_blank">[BGP] MRAI connection-based implementation review</a><br><a href="http://trubka.network.cz/pipermail/bird-users/2019-July/013589.html" target="_blank">Route Flap Dampening</a><br><a href="http://trubka.network.cz/pipermail/bird-users/2020-May/014612.html" target="_blank">[BGP] bird and RFD</a><br><br>Thank you all!<br><br></div>-- <br><div dir="ltr"><div dir="ltr">Douglas Fernando Fischer<br>Engº de Controle e Automação<br><div style="padding:0px;margin-left:0px;margin-top:0px;overflow:hidden;color:black;text-align:left;line-height:130%;font-family:"courier new",monospace"></div></div></div></div></div></div></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Douglas Fernando Fischer<br>Engº de Controle e Automação<br><div style="padding:0px;margin-left:0px;margin-top:0px;overflow:hidden;color:black;text-align:left;line-height:130%;font-family:"courier new",monospace"></div></div></div>