BIRD patches for IP-in-IP

27 Sep 2016

Hi BIRD users!

Attached are 3 patches that my team has been using for routing through
IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
useful, and start a conversation about whether they, or something like them,
could be upstreamed (or whether there is some better way of achieving our
aims).

Calico [1] uses BIRD for BGP routing between the hosts in various cloud
orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
the pods/VMs/containers in those systems, each of which has its own IP.  If
all the hosts are directly connected to each other, this is
straightforward, but sometimes they are not.  For example, GCE instances are
not directly connected to each other: there is at least one router between
them that knows how to route GCE addresses and traffic to/from the Internet,
but we cannot peer with it or otherwise tell it how to route
pod/VM/container IPs.  So if we use GCE to create, for example, OpenStack
compute hosts with Calico networking, we need to do something extra to
allow VM-addressed data to pass between the compute hosts.

One of our solutions is to use IP-in-IP; it works as shown by this diagram:

       10.65.0.3 via 10.240.0.5 dev tunl0 onlink
       default via 10.240.0.1
               |
             +-|----------+                             +------------+
             | o          |                             |            |
             |   Host A   |         +--------+          |   Host B   |
             |            |---------| Router |----------|            |
             | 10.240.0.4 |         +--------+          | 10.240.0.5 |
             |            |---.                         |            |
             +------------+    |                        +------------+
               ^       ^   +---v---+                                |
 src 10.65.0.2 |       |   | tunl0 |                                |
 dst 10.65.0.3 |       |   +-------+                                |
               |        \      |                                    v
         +-----------+   '----'                               +-----------+
         |   Pod A   |      src 10.240.0.4                    |   Pod B   |
         | 10.65.0.2 |      dst 10.240.0.5                    | 10.65.0.3 |
         +-----------+          ------                        +-----------+
                             src 10.65.0.2
                             dst 10.65.0.3

The diagram shows Pod A sending a packet to Pod B, using IP addresses that
are unknown to the 'Router' between the two hosts.  Host A has an IP-in-IP
device, tunl0, and a route that says to use that device for data to Pod B's
address (10.65.0.3).  When the packet has passed through that device, it
has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
routed again according to the routing table, so it can now successfully
reach Host B.
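For anyone less familiar with IP-in-IP, the step that tunl0 performs can be
modelled roughly as below.  This is only an illustrative Python sketch, not
the kernel's implementation: it assumes a plain 20-byte outer IPv4 header
with no options, a TTL of 64, and zeroed identification/flags fields.  The
key point is that the outer header carries protocol number 4 (IP-in-IP,
RFC 2003) and the host addresses, while the original pod-addressed packet
is carried unchanged as the payload.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number meaning "an IPv4 packet follows"


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071) over an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, proto: int,
                      payload_len: int, ttl: int = 64) -> bytes:
    """Build a minimal 20-byte IPv4 header (assumed: no options, ID/flags 0)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length 5 x 32-bit words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len,  # version/IHL, TOS, total length
        0, 0,                              # identification, flags/fragment offset
        ttl, proto, 0,                     # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    csum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]


def ipip_encapsulate(inner_packet: bytes, outer_src: str, outer_dst: str) -> bytes:
    """Prepend an outer IPv4 header whose protocol field says 'IPv4 inside'."""
    outer = build_ipv4_header(outer_src, outer_dst, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


# Pod A -> Pod B packet (header only, protocol 17 chosen arbitrarily),
# then wrapped for the Host A -> Host B leg.
inner = build_ipv4_header("10.65.0.2", "10.65.0.3", 17, 0)
packet = ipip_encapsulate(inner, "10.240.0.4", "10.240.0.5")
```

The 'Router' only ever looks at the outer header (10.240.0.4 -> 10.240.0.5),
which it knows how to route; Host B strips that header and delivers the
inner packet to Pod B.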

So how is BIRD involved?  We statically program the local Pod route on each
host:

On Host A: 10.65.0.2 dev <interface to Pod A>
On Host B: 10.65.0.3 dev <interface to Pod B>

then run a BIRD BGP session between Host A and Host B to propagate each
host's route to the other host, which would normally give us:

On Host A: 10.65.0.3 via 10.240.0.5
On Host B: 10.65.0.2 via 10.240.0.4

But we don't want those normal routes, because then the data would get lost
at 'Router'.  So we enhance and configure BIRD as follows.

- In the export filter for protocol kernel, for the relevant routes, we set
an attribute 'krt_tunnel = tunl0'.

- We modify BIRD, as in the attached patches, to interpret that attribute
as meaning that those routes should be programmed with 'dev tunl0'.

Then instead, we get:

On Host A: 10.65.0.3 via 10.240.0.5 dev tunl0 onlink
On Host B: 10.65.0.2 via 10.240.0.4 dev tunl0 onlink

which allows successful routing of data between the Pods.
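To make the intended behaviour concrete, here is a rough Python model of
what the kernel export does with and without the tunnel attribute.  It is
purely illustrative: the BgpRoute class and the export_filter() and
kernel_route() functions are hypothetical stand-ins, not BIRD code or the
actual patch logic; only the attribute name krt_tunnel and the resulting
route forms come from the description above.  'onlink' is needed because
10.240.0.5 is not on any subnet attached to tunl0, so the kernel must be
told to treat it as directly reachable through that device.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set


@dataclass(frozen=True)
class BgpRoute:
    prefix: str                       # e.g. "10.65.0.3" (a pod host route)
    next_hop: str                     # BGP next hop, e.g. "10.240.0.5"
    krt_tunnel: Optional[str] = None  # set by the kernel export filter, if at all


def export_filter(route: BgpRoute, pod_prefixes: Set[str]) -> BgpRoute:
    """Hypothetical export filter: tag pod routes with krt_tunnel = tunl0."""
    if route.prefix in pod_prefixes:
        return replace(route, krt_tunnel="tunl0")
    return route


def kernel_route(route: BgpRoute) -> str:
    """Render the route in `ip route` notation, as it would be programmed."""
    if route.krt_tunnel is None:
        # Normal behaviour: the packet would be sent unencapsulated
        # towards the next hop, and get lost at 'Router'.
        return f"{route.prefix} via {route.next_hop}"
    # Patched behaviour: force the output device to the tunnel, and mark
    # the next hop 'onlink' since it is not on a tunl0-attached subnet.
    return f"{route.prefix} via {route.next_hop} dev {route.krt_tunnel} onlink"


learned = BgpRoute("10.65.0.3", "10.240.0.5")  # learned on Host A from Host B
print(kernel_route(learned))
print(kernel_route(export_filter(learned, {"10.65.0.3"})))
```

Run as-is, this prints the "normal" route and then the tunnelled route
shown above for Host A.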

Thanks for reading this far!  I now have three questions:

1. Does the routing approach above make sense?  (Or is there some better or
simpler or already supported way that we could achieve the same thing?)

2. If (1), would the BIRD team accept patches broadly on the lines of those
that are attached?

3. If (2), please let me know if the attached patches are already
acceptable, or otherwise what further work is needed for them.

Many thanks,
    Neil

Neil Jerram