MultiBird on L2 - A crazy idea for Fail-Over and Load Balancing

Sun Jan 24 11:55:51 CET 2021

OK

On Sun, Jan 24, 2021 at 07:49, Alexander Zubkov <green at qrator.net>
wrote:

> On Fri, Jan 22, 2021 at 3:24 PM Douglas Fischer
> <fischerdouglas at gmail.com> wrote:
> >
> > The big difference is the single point of failure.
> >
> > With a host doing that redirection, we will have load balancing and
> > different (physically separate) boxes running multiple instances.
> > But we will still suffer from a single point of failure.
> >
> > To solve that, we will need another layer of redundancy for the
> > redirector layer.
> > That requires at least two more machines, or even more, depending on
> > how strong the need for redundancy is.
> > More things to deal with...
>
> I still don't quite get it. Yes, if you place one balancer, you have a
> single point of failure. But you can add backup balancers.
> So you have your pool of bird daemons on the balancers or somewhere else,
> then you have several balancers that balance connections to the birds,
> and those are redundant, so that when one fails, the balancing function
> moves to another.
> Balancers can be made redundant with VRRP, for example, and then route
> connections to the pool of birds. What is missing here for your aim?
>
> >
> >
> >
> > My idea is to combine redundancy and load balancing in the same layer
> > of the solution.
> >
> > Exemplifying the concept:
> > A scenario of an IXP with 2000 participants and 15 facilities, 5 of
> > which are designed for computing resources.
> > (To simplify the example, let's consider just one route server. To
> > have two route servers, just double the recipe.)
> > - Slice the 2000 peers into 5 resource pools, according to the
> > facilities where those peers are connected.
> > - 7 route servers: 5 of them are the primaries of each resource pool,
> > and the other two are the secondary and tertiary failover.
> > - All route servers have the same route policies and peer
> > configuration, provisioned by a central CI/CD pipeline.
> > - Adjust Heartbeat to deal with those resource pools.
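A minimal Python sketch of the resource-pool plan in the list above, assuming an even split of 400 peers per pool and hypothetical names (pool1..pool5 for the computing facilities, rs1..rs5 as the per-pool primaries, rs6 and rs7 as the shared secondary and tertiary). It only models which route server should serve each pool for a given set of live servers; it is not a Bird or Heartbeat configuration.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical names: one pool per computing facility, one primary per pool.
POOLS = [f"pool{i}" for i in range(1, 6)]
PRIMARY = {pool: f"rs{i}" for i, pool in enumerate(POOLS, start=1)}
# Shared secondary and tertiary route servers, tried in this order.
FAILOVER_ORDER = ["rs6", "rs7"]


def serving_rs(pool: str, alive: set[str]) -> str | None:
    """Return the route server that should serve `pool`, or None if none is alive."""
    for rs in [PRIMARY[pool], *FAILOVER_ORDER]:
        if rs in alive:
            return rs
    return None


def load(peers_per_pool: dict[str, int], alive: set[str]) -> Counter:
    """Count how many peers each live route server ends up carrying."""
    carried: Counter = Counter()
    for pool, n_peers in peers_per_pool.items():
        rs = serving_rs(pool, alive)
        if rs is not None:
            carried[rs] += n_peers
    return carried


if __name__ == "__main__":
    peers = {pool: 400 for pool in POOLS}  # 2000 peers, split evenly (assumption)
    all_rs = {f"rs{i}" for i in range(1, 8)}
    print(load(peers, all_rs))                   # normal operation
    print(load(peers, all_rs - {"rs3"}))         # rs3 down: rs6 takes pool3
    print(load(peers, all_rs - {"rs3", "rs6"}))  # rs3 and rs6 down: rs7 takes pool3
```

One consequence the model makes visible: if two primaries fail at the same time, rs6 absorbs both pools (800 peers), so the secondary has to be sized for more than a single pool.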
> >
> > In this scenario, in the event that a facility (in Brazil the common
> > name is PIX) becomes isolated from the rest of the IXP LAN mesh, the
> > participants in that facility will still be exchanging routes with each
> > other.
>
> You have several facilities in one L2 network segment? With VRRP, when
> parts of it become isolated, I think you will get a master in each
> segment (split-brain), and participants in the isolated segment will
> have their own master balancer; if there are also some birds from
> the pool in this facility, they will be able to exchange routes.
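To illustrate the split-brain behaviour described above, here is a small Python model (not a VRRP implementation) of master election: among the balancers that can still hear each other, the highest priority wins, and on a tie the higher primary IP address wins, as VRRP specifies. Balancer names, priorities, addresses, and facility names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Balancer:
    name: str
    priority: int         # VRRP priority (higher is preferred)
    address: IPv4Address  # primary address, used only to break priority ties
    facility: str         # L2 segment/facility where the box is connected


def elect_master(candidates: list[Balancer]) -> Balancer | None:
    """Highest priority wins; ties go to the higher primary IP address."""
    return max(candidates, key=lambda b: (b.priority, int(b.address)), default=None)


def masters_per_partition(
    balancers: list[Balancer], partitions: list[set[str]]
) -> dict[frozenset[str], str | None]:
    """Elect a master independently in each group of facilities that can still reach each other."""
    result: dict[frozenset[str], str | None] = {}
    for part in partitions:
        reachable = [b for b in balancers if b.facility in part]
        master = elect_master(reachable)
        result[frozenset(part)] = master.name if master else None
    return result


if __name__ == "__main__":
    lbs = [
        Balancer("lb1", 200, IPv4Address("192.0.2.1"), "fac1"),
        Balancer("lb2", 150, IPv4Address("192.0.2.2"), "fac2"),
        Balancer("lb3", 150, IPv4Address("192.0.2.3"), "fac3"),
    ]
    # Healthy LAN: a single master for everyone.
    print(masters_per_partition(lbs, [{"fac1", "fac2", "fac3"}]))
    # fac3 isolated: it elects its own master (split-brain), so its peers keep a balancer.
    print(masters_per_partition(lbs, [{"fac1", "fac2"}, {"fac3"}]))
```

Whether the isolated facility can still exchange routes then depends only on whether some birds from the pool sit behind that local master, which is the condition stated in the reply.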
>
> >
> >
> >
> > On Wed, Jan 20, 2021 at 14:25, Alexander Zubkov <green at qrator.net>
> > wrote:
> >>
> >> Hi,
> >>
> >> Thank you for the link, I looked at the presentation. It looks like
> >> what I thought it was (and you explained it the same way): balancing
> >> connections to multiple Bird instances. So I am still a little
> >> confused about the features you want Bird to support in relation to
> >> that, because most of the "magic" you want, like HA on L2/L3 or running
> >> on different machines, can still be done with other networking tools
> >> and configuration. And if you have a real case where you run into
> >> problems with that, it looks quite interesting to me; I could try to
> >> help you with the configuration if you wish, and maybe we could even
> >> turn it into a useful case for other bird users.
> >>
> >> On Tue, Jan 19, 2021 at 10:01 PM Douglas Fischer
> >> <fischerdouglas at gmail.com> wrote:
> >> >
> >> > Vertical scalability of route servers on very large IXPs is a
> >> > challenge!
> >> > We are talking about 400-2200 peers...
> >> > https://ixpdb.euro-ix.net/en/ixpdb/ixps/?sort=participants&reverse=1&
> >> >
> >> > As already mentioned, Bird still does not handle multi-threading very
> >> > well (even in version 2).
> >> > So, for that, a higher clock speed per thread is better than a larger
> >> > number of threads.
> >> >
> >> > In Bird's world, the solution for that is MultiBird.
> >> > That solution is explained here:
> >> >  -> https://www.euro-ix.net/media/filer_public/40/8b/408bd0bb-6835-4807-8677-0a1961bd3fba/flock-of-birds_ljtmypd.pdf
> >> >  -> https://www.youtube.com/watch?v=dwRwF7Bu8as
> >> >     In pt_BR, but I believe that if you turn on automatic subtitles and
> >> >     automatic translation into your language, it will be enough to
> >> >     understand.
> >> >
> >> >
> >> > What I'm proposing here is just a different method of running multiple
> >> > instances of Bird, with the possibility of those being on different
> >> > boxes, or even at different sites.
> >> >
> >> >
> >> >
> >> > On Tue, Jan 19, 2021 at 12:22, Alexander Zubkov <green at qrator.net>
> >> > wrote:
> >> >>
> >> >> But you wrote that for scaling there are load balancers to balance
> >> >> sessions among different bird instances. So VRRP + a load balancer
> >> >> will give you what you want. You can also try to bind several birds to
> >> >> a single address in Linux (probably a little patching is required to
> >> >> set socket options), and Linux will balance sessions between them. You
> >> >> may also want to exchange routing information somehow between your
> >> >> bird instances, but I think that can also be solved, with a couple of
> >> >> route reflectors for example.
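The "several birds on a single address" suggestion above most likely refers to the Linux SO_REUSEPORT socket option (an assumption; the message only mentions "socket options"). A minimal Python sketch: several TCP listeners bind the same address and port, and on Linux the kernel distributes incoming connections among them. A real BGP listener would use port 179 and need privileges; the loopback address and kernel-chosen port here are only for illustration, and Bird itself would still need the patch mentioned above to set the option.

```python
import socket


def reuseport_listeners(host: str = "127.0.0.1", port: int = 0, count: int = 2) -> list:
    """Open `count` TCP listeners sharing one address/port via SO_REUSEPORT (Linux).

    With port=0 the kernel picks a free port for the first socket; the
    remaining sockets then bind to that same port.
    """
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            # Must be set on every socket *before* bind() for the port to be shared.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            s.listen()
            port = s.getsockname()[1]  # later sockets reuse the port just bound
    except OSError:
        for s in socks:
            s.close()
        raise
    return socks


if __name__ == "__main__":
    listeners = reuseport_listeners(count=3)
    print("shared port:", {s.getsockname()[1] for s in listeners})
    for s in listeners:
        s.close()
```

In practice each listener would belong to a different Bird process. Note that this spreads sessions across processes on one host only, so by itself it does not remove the single-box point of failure discussed in this thread.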
> >> >> I still do not understand what you want to see in Bird itself. I
> >> >> haven't run large IXPs, so I may not be aware of something and would
> >> >> be glad if you explained it in more detail.
> >> >>
> >> >> On Tue, Jan 19, 2021 at 3:22 PM Douglas Fischer
> >> >> <fischerdouglas at gmail.com> wrote:
> >> >> >
> >> >> > As I mentioned initially, my focus was on "large IXP environments".
> >> >> > Considering that, L3 anycast does not apply very well to that
> >> >> > scenario.
> >> >> > (I don't know of any IXPs that use route servers outside the MLPA
> >> >> > LAN of the IXP.)
> >> >> >
> >> >> > Using VRRP is an excellent method to provide fail-over on L2.
> >> >> > (I have used it a lot in several application scenarios.)
> >> >> > But it does not provide load balancing, just fail-over.
> >> >> >
> >> >> > Considering "large IXP environments", and the fact that even in
> >> >> > Bird 2 the multi-threading limitation is not completely solved, the
> >> >> > solution is load balancing. MultiBird does it VERY WELL.
> >> >> > But until now we (at least I) have seen only "single-host" based
> >> >> > solutions, using NAT/forwarded connections.
> >> >> >
> >> >> > With this suggestion, using L2 load balancing based on manipulating
> >> >> > the MAC-IP mapping, it is possible to remove the "single-host" point
> >> >> > of failure.
> >> >> >
> >> >> > On Tue, Jan 19, 2021 at 10:48, Alexander Zubkov <green at qrator.net>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> You can use VRRP or a similar protocol on L2, or dynamic routing
> >> >> >> with anycast on L3, for reliability. I do not see what you want in
> >> >> >> Bird. Could you explain more?
> >> >> >>
> >> >> >> On Tue, Jan 19, 2021 at 1:26 PM Douglas Fischer
> >> >> >> <fischerdouglas at gmail.com> wrote:
> >> >> >> >
> >> >> >> > I was studying the concepts of MultiBird for large IXP
> >> >> >> > environments.
> >> >> >> >
> >> >> >> > And, beyond the extra complexity it brings to the environment, one
> >> >> >> > of the weak points I saw was that all the Bird instances are on the
> >> >> >> > same box (VM, container, etc.).
> >> >> >> >
> >> >> >> > A friend mentioned that some tests were made with a load balancer
> >> >> >> > redirecting the post-NAT connections to other boxes.
> >> >> >> > But even in that scenario, that load balancer would be a single
> >> >> >> > point of failure/bottleneck.
> >> >> >> >
> >> >> >> > So I was remembering Cisco GLBP and the Heartbeat protocol.
> >> >> >> > Those protocols announce different MAC addresses for the same
> >> >> >> > IPv4/IPv6 address, based on the source of the ARP/ND query,
> >> >> >> > providing load balancing/fail-over at the glue between layer 2
> >> >> >> > and layer 3.
> >> >> >> > P.S.: Several solutions use that concept: Corosync, Windows
> >> >> >> > Cluster, Oracle RAC, etc.
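A minimal Python sketch of the "different MAC addresses for the same IP, chosen by the source of the ARP/ND query" idea. The selection uses rendezvous (highest-random-weight) hashing of the requester's address, a technique chosen here for illustration rather than taken from GLBP or Heartbeat: a given peer always gets the same instance while that instance is alive, and when an instance dies only the peers mapped to it move. The instance names and virtual MACs are hypothetical, and actually sending ARP/ND replies is out of scope.

```python
from __future__ import annotations

import hashlib

# Hypothetical Bird instances (possibly on different boxes) and the MAC each one answers with.
INSTANCES = {
    "bird-a": "02:00:00:00:00:0a",
    "bird-b": "02:00:00:00:00:0b",
    "bird-c": "02:00:00:00:00:0c",
}


def _weight(requester_ip: str, instance: str) -> int:
    """Deterministic pseudo-random weight for a (requester, instance) pair."""
    digest = hashlib.sha256(f"{requester_ip}|{instance}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def mac_for_requester(requester_ip: str, alive: set[str]) -> str | None:
    """Pick the MAC to put in the ARP/ND reply for the shared route-server address.

    The live instance with the highest weight for this requester wins, so the
    choice is stable per peer and only changes if that instance goes down.
    """
    live = [name for name in INSTANCES if name in alive]
    if not live:
        return None
    best = max(live, key=lambda name: _weight(requester_ip, name))
    return INSTANCES[best]


if __name__ == "__main__":
    everyone = set(INSTANCES)
    for ip in ("198.51.100.1", "198.51.100.2", "2001:db8::7"):
        print(ip, mac_for_requester(ip, everyone), mac_for_requester(ip, everyone - {"bird-a"}))
```

Replacing the hash weight with a configured per-group priority would turn this into the priority-group variant proposed in the next paragraph of the message.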
> >> >> >> >
> >> >> >> > Considering that concept, and combining it with MultiBird:
> >> >> >> >  It would be possible to create groups of sources and assign
> >> >> >> >  different priorities to those groups on each instance of Bird.
> >> >> >> >  In this case, each Bird instance could run on a different box, or
> >> >> >> >  even at a different site.
> >> >> >> >
> >> >> >> > Furthermore, on IXPs with a large number of participants, it would
> >> >> >> > be possible to define an affinity for each priority group, based,
> >> >> >> > for example, on the facility where those participants are
> >> >> >> > connected.
> >> >> >> >
> >> >> >> > I have a feeling that this would be especially useful for remote
> >> >> >> > peering scenarios.
> >> >> >> >
> >> >> >> >
> >> >> >> > Just a crazy idea to share with colleagues.
> >> >> >> > Maybe something good could come out of it.
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >
> >
> >
> > --
> > Douglas Fernando Fischer
> > Control and Automation Engineer
>