On Wednesday, June 1, 2022 8:32:43 AM ADT Maria Matejka wrote:
> Well, to set up a test case, we have to simulate a production environment somehow. We can either keep using our current tooling, which we would have to document and extend quite a lot, or we can look elsewhere.
>
> This is why I'm asking: we'll keep using qemu-kvm and netns to test different cases, as well as our Python scripts to check the results. My question is about orchestration: how familiar are users with Ansible (or any other orchestration tool)?
>
> So the question could also have been: is Ansible a good and commonly used tool for deploying BIRD?
I never thought about using Ansible for something like this. We used to use Linaro LAVA but ran into scaling issues. Now we do this with a tool we wrote, mostly in bash.

The idea is that we have a lot of complicated setups to test, some with more than 10 roles. To handle this, we use prebuilt root filesystem images that get cloned for each role. The test script drops the scenario's test code onto each image and sets up a systemd service that starts when the role boots. Then we boot each role via systemd-nspawn, qemu-kvm, or even a physical device netbooted from the modified image.

To make setup and boot very fast we use Ceph for the image cloning, but it's not really necessary. Btrfs or overlayfs could probably take its place and keep the speed without the external dependency. Since all the images are simply directories on a single box while the roles are running, something outside them can collect the test-result files the roles generate, even while they are still running.

We actually use MQTT so the roles can sync with each other and report results. Given how the filesystems are set up, it would probably also be easy to use a shared bind mount so they could sync via files as well.

For networking, we just put everything on a single bridge. If roles need to be isolated, we set up VLANs on the roles when they start.

We also keep the test scripts identical on all roles. That way we can do things like this:

    if is_role gateway; then
        # Set up the gateway here.
        :
    fi

    sync gateway-configured

    if is_role client; then
        # Do the client thing against the gateway here.
        :
    fi

-- 
James Oakley
james@ttgi.io
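For illustration, here is a minimal sketch of how the `is_role` and `sync` helpers used in the snippet above might work if the roles synced through files on a shared bind mount instead of MQTT. It is not the actual tool; the variables `ROLE`, `ROLES`, `SYNC_DIR` and `SYNC_TIMEOUT` and the default path `/srv/test-sync` are hypothetical. `sync` acts as a barrier: each role drops a marker file, then waits until every listed role has done the same.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sourced by the shared test script on every role.
# Assumed environment (not part of the original tool):
#   ROLE          name of this role, e.g. "gateway"
#   ROLES         space-separated list of all roles in the scenario
#   SYNC_DIR      directory bind-mounted into every role (default /srv/test-sync)
#   SYNC_TIMEOUT  total seconds to wait at one barrier (default 300)

# is_role NAME: succeed only if this machine is playing role NAME.
is_role() {
    [ "${ROLE:-}" = "$1" ]
}

# sync NAME: barrier. Mark that this role reached NAME, then wait until every
# role in ROLES has marked it too. Note: this shadows the coreutils sync(1).
sync() {
    local name="$1"
    local dir="${SYNC_DIR:-/srv/test-sync}/$name"
    local timeout="${SYNC_TIMEOUT:-300}"
    local waited=0
    local r
    if [ -z "${ROLE:-}" ]; then
        echo "sync $name: ROLE is not set" >&2
        return 1
    fi
    mkdir -p "$dir"
    touch "$dir/$ROLE"
    # ROLES is split on whitespace on purpose (role names contain no spaces).
    for r in ${ROLES:-}; do
        while [ ! -e "$dir/$r" ]; do
            if [ "$waited" -ge "$timeout" ]; then
                echo "sync $name: timed out waiting for $r" >&2
                return 1
            fi
            sleep 1
            waited=$((waited + 1))
        done
    done
}
```

Polling once a second is crude but needs nothing beyond a shared directory. Since markers persist, each run would want a fresh `SYNC_DIR` (or unique barrier names), otherwise a reused name passes immediately.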
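A rough sketch of the per-role clone-and-boot step described above, assuming overlayfs in place of Ceph and systemd-nspawn as the runner. The function names, the `run-test.service` unit, the `/usr/local/bin/run-test` path and the `/srv/test-sync` bind target are all hypothetical; the mount and boot steps need root and an existing host bridge.

```shell
#!/usr/bin/env bash
# Hypothetical per-role setup: clone a prebuilt rootfs with overlayfs,
# install the scenario script as a boot-time systemd unit, and boot the
# clone with systemd-nspawn on a shared bridge.

# clone_role BASE RUN ROLE: copy-on-write view of BASE at RUN/ROLE/root (root only).
clone_role() {
    local base="$1" run="$2" role="$3"
    local d="$run/$role"
    mkdir -p "$d/upper" "$d/work" "$d/root"
    mount -t overlay overlay \
        -o "lowerdir=$base,upperdir=$d/upper,workdir=$d/work" "$d/root"
}

# stage_role ROOT ROLE SCRIPT: copy SCRIPT into ROOT and enable a oneshot
# unit that runs it at boot with ROLE set in its environment.
stage_role() {
    local root="$1" role="$2" script="$3"
    local units="$root/etc/systemd/system"
    install -D -m 0755 "$script" "$root/usr/local/bin/run-test"
    mkdir -p "$units/multi-user.target.wants"
    cat > "$units/run-test.service" <<EOF
[Unit]
Description=Scenario test script for role $role
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
Environment=ROLE=$role
ExecStart=/usr/local/bin/run-test

[Install]
WantedBy=multi-user.target
EOF
    # Offline equivalent of "systemctl enable", using a relative symlink.
    ln -sf ../run-test.service "$units/multi-user.target.wants/run-test.service"
}

# boot_role ROOT ROLE SYNCDIR [BRIDGE]: boot the clone as a container on
# BRIDGE (default br0) with SYNCDIR bind-mounted at /srv/test-sync.
# Runs in the foreground (root only).
boot_role() {
    local root="$1" role="$2" syncdir="$3" bridge="${4:-br0}"
    systemd-nspawn --directory="$root" --machine="$role" --boot \
        --network-bridge="$bridge" --bind="$syncdir:/srv/test-sync"
}
```

A run for one role would then be `clone_role`, `stage_role` on `RUN/ROLE/root`, and `boot_role ... &` so several roles can come up in parallel on the same box, with their result files readable from the host under each `upper` directory.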