Hi,

this is an OVS issue, already discussed:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043007.html
...
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043063.html

Official OVS quote:
> We'd accept patches to improve OVS's routing table code.  It's not
> designed to scale to 1,800,000 routes.  We'd also take code to suppress
> the routing table code in cases where it isn't actually needed, since
> it's not always needed.  But we can't take a patch to just delete it;
> I'm sure you understand.

I tried to apply this patch at the time, but it was already unusable with newer OVS versions:

https://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20161123/5379b333/attachment.bin

Our workaround was to scale the VM to 3 vCPUs, since our average system load with BGP is 1.5.

You can see what is happening:

[root@bgp1 ~]# top
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  654 root      10 -10 1284492   1.0g  20276 R  98.0  27.0   2513:01 ovs-vswitchd
   16 root      20   0       0      0      0 S   2.0   0.0  24:45.60 ksoftirqd/1

[root@bgp1 ~]# ip route show
...
1.0.0.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
1.0.4.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
1.0.4.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
1.0.5.0/24 via 89.212.47.185 dev t2-v24-ha proto bird


Routes are constantly being added and deleted:

[root@bgp1 ~]# ip monitor
...
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 2.16.70.0/23 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 88.221.28.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 23.50.188.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 92.122.68.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 88.221.100.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 92.123.208.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
..... 
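
To put numbers on the churn, the output above can be tallied per prefix. The following is only a rough sketch, assuming the line format shown in the excerpt (an optional "Deleted " prefix followed by a CIDR prefix); the function names are made up for illustration, and lines that do not look like route events are skipped.

```python
import re
from collections import Counter

# Matches route lines as they appear in the excerpt above: an optional
# "Deleted " marker followed by an IPv4 or IPv6 prefix in CIDR notation.
ROUTE_LINE = re.compile(r"^(Deleted )?([0-9A-Fa-f:.]+/\d+)\s")


def count_route_events(lines):
    """Return (added, deleted) Counters keyed by route prefix."""
    added, deleted = Counter(), Counter()
    for line in lines:
        match = ROUTE_LINE.match(line)
        if not match:
            continue  # skip "...", blank lines and anything that is not a route
        target = deleted if match.group(1) else added
        target[match.group(2)] += 1
    return added, deleted


def flapping_prefixes(lines, min_cycles=2):
    """Return sorted prefixes deleted and re-added at least min_cycles times."""
    added, deleted = count_route_events(lines)
    return sorted(p for p in deleted if min(added[p], deleted[p]) >= min_cycles)
```

One way to use it: capture the events for a fixed period, e.g. "timeout 60 ip monitor route > events.txt", then call flapping_prefixes(open("events.txt")) to list the prefixes that were withdrawn and reinstalled repeatedly within that window.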



Regards,
saso

On 6 May 2019, at 19:30, Kees Meijs <kees@nefos.nl> wrote:

Hi list,

We're in the process of replacing Quagga with BIRD but have stumbled upon a
little problem.

When device scanning is on (the default, obviously), our testing machine
completely fills up a CPU core. The culprit isn't BIRD itself but an
Open vSwitch daemon.

After disabling the device protocol and restarting BIRD, everything goes
back to its quiet state.

BIRD (1.6.3-2) and Open vSwitch (2.6.2~pre+git20161223-3) both were
installed as Debian stable packages.

The configuration is as simple as:

# This is a minimal configuration file, which allows the bird daemon to start
# but will not cause anything else to happen.
#
# Please refer to the documentation in the bird-doc package or BIRD User's
# Guide on http://bird.network.cz/ for more information on configuring BIRD and
# adding routing protocols.

# Change this into your BIRD router ID. It's a world-wide unique identification
# of your router, usually one of router's IPv4 addresses.
router id 1.2.3.4;

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel.
protocol device {
}

# The Kernel protocol is not a real routing protocol. Instead of communicating
# with other routers in the network, it performs synchronization of BIRD's
# routing tables with the OS kernel.
protocol kernel {
    metric 64;    # Use explicit kernel route metric to avoid collisions
            # with non-BIRD routes in the kernel routing table
    import none;
    export all;    # Actually insert routes into the kernel routing table
}

protocol bgp test {
    description "BGP test";
    local as REDACTED;
    neighbor 1.2.3.4 as REDACTED;
    direct;
    next hop self;
    deterministic med on;
    export none;
    import all;
}

Meanwhile, log messages such as the following appear:

bird: Kernel dropped some netlink messages, will resync on next scan.

As a test, I deleted all existing Open vSwitch bridges and the load
dropped again. After adding a new, empty bridge, the load spiked again
in an instant.

This is unexpected behaviour. Maybe it's an implementation problem in
Open vSwitch or maybe in BIRD. Either way, it shouldn't happen, I guess.

Any clues?

Thanks in advance!

Regards,
Kees