Re: BIRD Testing With Ansible?

1 Jun 2022

On Wednesday, June 1, 2022 8:32:43 AM ADT Maria Matejka wrote:
> ...
> Well, to setup a test case, we have to simulate a production environment
> somehow. We can either continue using our current tooling which we have
> to document and extend quite a lot. Or we can look somewhere else.
> This is why I'm asking -- we'll keep using qemu-kvm and netns to test
> different cases, as well as our Python scripts to check the results. I'm
> asking about the orchestration, how familiar are the users with using
> Ansible (or any other orchestration tool).
> Therefore the question may have also been – is Ansible a good and
> commonly used tool for BIRD deployment?

I never thought about using Ansible for something like this. We used to use
Linaro LAVA but ran into scaling issues.

Now we do this with a thing we wrote mostly in bash. The idea is that we have 
a lot of complicated setups we need to test, some with even more than 10 
roles. To do this we use prebuilt root filesystem images that get cloned for 
each role.

The test script dumps the test code for the scenario on the image and sets up 
a systemd service to be booted when the role boots. Then we boot each one via 
systemd-nspawn, qemu-kvm, or even a physical device, netbooted from the 
modified image.
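
As a rough illustration of that staging step (a sketch only, not the actual
tooling described here), the function below copies a test script into a role's
root filesystem directory and enables a oneshot unit offline by creating the
symlink by hand. The function name, the /opt/test and /etc/test-role paths,
and the unit name role-test.service are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stage a test script into a role's root filesystem and
# enable a oneshot unit that runs it at boot. All paths and the unit name
# are illustrative assumptions, not taken from the original setup.
install_role_test() {
    local rootfs=$1 role=$2 script=$3
    local unit_dir="$rootfs/etc/systemd/system"

    mkdir -p "$rootfs/opt/test" "$unit_dir/multi-user.target.wants"
    install -m 0755 "$script" "$rootfs/opt/test/run.sh"

    # Record the role name so an identical script can branch on it later.
    printf '%s\n' "$role" > "$rootfs/etc/test-role"

    cat > "$unit_dir/role-test.service" <<'EOF'
[Unit]
Description=Run scenario test script
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/opt/test/run.sh

[Install]
WantedBy=multi-user.target
EOF

    # Offline equivalent of enabling the unit: link it into the target's
    # wants directory inside the image.
    ln -sf ../role-test.service \
        "$unit_dir/multi-user.target.wants/role-test.service"
}
```

Something like "install_role_test /srv/roles/gateway gateway ./scenario.sh"
would then prepare one role before it is booted with systemd-nspawn or
qemu-kvm.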

To make it super fast to set up and boot we use Ceph for the image cloning but 
it's not really necessary. Btrfs or overlayfs could probably be used in its 
place while maintaining the speed without the external dependency.
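
For the overlayfs variant, a per-role clone could look roughly like the sketch
below. It is an assumption about how one might do it, not a description of the
Ceph-based setup: the directory layout and the function names are made up, and
the DRY_RUN switch only exists so the commands can be inspected without root.

```shell
#!/usr/bin/env bash
# Run a command, or just print it when DRY_RUN is set (mount needs root).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical layout: $base is the read-only prebuilt image directory and
# $work_root holds one upper/work/merged triple per role.
clone_role_overlay() {
    local base=$1 work_root=$2 role=$3
    local dir="$work_root/$role"

    run mkdir -p "$dir/upper" "$dir/work" "$dir/merged"
    # Writes land in upper/, the base image stays untouched and shared.
    run mount -t overlay overlay \
        -o "lowerdir=$base,upperdir=$dir/upper,workdir=$dir/work" \
        "$dir/merged"
}
```

Each role would then boot from its merged/ directory, and throwing away
upper/ resets the role to the pristine image.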

Since all the images are just directories on a single box while the roles are
running, something outside can collect the test result files generated on the
running roles, even while they are still running. We actually use MQTT to let
the roles sync with each other and report results. Thanks to the way the
filesystems are set up, it would probably be easy to use a shared bind mount
so they could sync via files as well.

For networking, we just throw it all on a single bridge. If roles need to be 
isolated, we just set up VLANs on the roles when they start up.
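
A minimal sketch of that network layout with iproute2, assuming a host bridge
and a tagged sub-interface inside the role. The function names, interface
names, VLAN ID and address are placeholders. The functions only print the
commands, so they can be reviewed first or piped to a root shell.

```shell
#!/usr/bin/env bash
# Print the host-side commands: create one bridge and attach a role's
# tap/veth interface to it. Interface names are hypothetical.
host_bridge_cmds() {
    local bridge=$1 port=$2
    cat <<EOF
ip link add name $bridge type bridge
ip link set dev $bridge up
ip link set dev $port master $bridge
EOF
}

# Print the role-side commands: put a VLAN sub-interface on top of the
# role's interface so only roles using the same VLAN ID see each other.
role_vlan_cmds() {
    local parent=$1 vid=$2 addr=$3
    local vif="$parent.$vid"
    cat <<EOF
ip link add link $parent name $vif type vlan id $vid
ip addr add $addr dev $vif
ip link set dev $vif up
EOF
}
```

For example, "role_vlan_cmds eth0 10 192.0.2.1/24 | sh" run as root inside a
role would bring up eth0.10 at startup.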

We also set it up so that the test scripts are identical on all roles. That 
way we can do stuff like this:

    if is_role gateway ; then
        # do something to set it up
    fi

    sync gateway-configured

    if is_role client ; then
        # Do client thing against gateway
    fi
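
The two helpers used above could be sketched as follows. This is not the
MQTT-based implementation mentioned earlier but the file-based alternative,
and it rests on assumptions: the role name comes from $ROLE, the list of
participating roles from $ROLES, and $SYNC_DIR is a directory every role can
see (for example a shared bind mount). The barrier is called sync_point here
to avoid shadowing the system's sync command.

```shell
#!/usr/bin/env bash
# Assumed environment (hypothetical, set by whatever boots the role):
#   ROLE      name of this role, e.g. "gateway"
#   ROLES     space-separated list of all roles in the scenario
#   SYNC_DIR  directory shared by all roles

# True if this instance is running as the given role.
is_role() {
    [ "$ROLE" = "$1" ]
}

# Barrier: mark this role as arrived at checkpoint $1, then poll until every
# role in $ROLES has done the same. Gives up with status 1 after $2 seconds
# (default 60).
sync_point() {
    local name=$1 timeout=${2:-60} waited=0 r missing

    : > "$SYNC_DIR/$name.$ROLE"
    while :; do
        missing=0
        for r in $ROLES; do
            [ -e "$SYNC_DIR/$name.$r" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

With these in place, "sync_point gateway-configured" on every role blocks the
client until the gateway has finished its setup block and reached the same
line.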

-- 
James Oakley
james@ttgi.io