BIRD patches for IP-in-IP

Baptiste Jonglez baptiste at bitsofnetworks.org
Tue Sep 27 17:31:59 CEST 2016


Hi,

On Tue, Sep 27, 2016 at 03:09:52PM +0000, Neil Jerram wrote:
> Attached are 3 patches that my team has been using for routing through
> IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
> useful, and start a conversation about whether they or something like them
> could be upstreamed (or perhaps if there's some better way of achieving our
> aims).
> 
> Calico [1] uses BIRD for BGP routing between the hosts in various cloud
> orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
> the pods/VMs/containers in those systems, each of which has its own IP.  If
> all the hosts are directly connected to each other, this is
> straightforward, but sometimes they are not.  For example, GCE instances are
> not directly connected to each other: there is at least one router between
> them that knows how to route GCE addresses and traffic to/from the Internet,
> but we cannot peer with it or otherwise tell it how to route pod/VM/container
> IPs.  So if we use GCE to create e.g. OpenStack compute hosts, with Calico
> networking, we need to do something extra to allow VM-addressed data to
> pass between the compute hosts.
> 
> One of our solutions is to use IP-in-IP; it works as shown by this diagram:
> 
>        10.65.0.3 via 10.240.0.5 dev tunl0 onlink
>        default via 10.240.0.1
>                |
>              +-|----------+                             +------------+
>              | o          |                             |            |
>              |   Host A   |         +--------+          |   Host B   |
>              |            |---------| Router |----------|            |
>              | 10.240.0.4 |         +--------+          | 10.240.0.5 |
>              |            |---.                         |            |
>              +------------+    |                        +------------+
>                ^       ^   +---v---+                                |
>  src 10.65.0.2 |       |   | tunl0 |                                |
>  dst 10.65.0.3 |       |   +-------+                                |
>                |        \      |                                    v
>          +-----------+   '----'                               +-----------+
>          |   Pod A   |      src 10.240.0.4                    |   Pod B   |
>          | 10.65.0.2 |      dst 10.240.0.5                    | 10.65.0.3 |
>          +-----------+          ------                        +-----------+
>                              src 10.65.0.2
>                              dst 10.65.0.3
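
For readers unfamiliar with IP-in-IP (RFC 2003, IP protocol number 4), the
encapsulation step in the diagram can be sketched in Python. This is an
illustrative model only, not how the Linux tunl0 driver is implemented: it
builds a minimal IPv4 header with no options, and the helper names
(ipv4_header, encapsulate) and the 8-byte placeholder payload are
hypothetical.

```python
import ipaddress
import struct

IPPROTO_IPIP = 4  # IP protocol number for IP-in-IP encapsulation (RFC 2003)
IPPROTO_UDP = 17


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int, ttl: int = 64) -> bytes:
    """Minimal 20-byte IPv4 header: no options, no fragmentation."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,          # version 4, IHL 5 (5 x 32-bit words)
        0,                     # DSCP/ECN
        20 + payload_len,      # total length
        0,                     # identification
        0,                     # flags / fragment offset
        ttl,
        proto,
        0,                     # checksum placeholder, filled in below
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]


def encapsulate(inner: bytes, outer_src: str, outer_dst: str) -> bytes:
    """What tunl0 conceptually does: prepend an outer header, keep inner as-is."""
    return ipv4_header(outer_src, outer_dst, IPPROTO_IPIP, len(inner)) + inner


# Pod A -> Pod B, with 8 placeholder bytes standing in for a UDP header.
payload = bytes(8)
inner = ipv4_header("10.65.0.2", "10.65.0.3", IPPROTO_UDP, len(payload)) + payload
outer = encapsulate(inner, "10.240.0.4", "10.240.0.5")
```

Only the outer header (10.240.0.4 -> 10.240.0.5) is visible to 'Router',
which is why it can forward the packet without knowing about 10.65.x.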

Can't you just use a tunnel between Host A and Host B and run BGP on top
of this tunnel?  It would seem to be cleaner than hacking multi-hop BGP to
obtain appropriate next-hop values, unless I am missing something.

It would look something like this:

             +-|----------+                             +------------+
             | o Host A   |                             |   Host B   |
             |            |         +--------+          |            |
             |  10.240.0.4|---------| Router |----------|10.240.0.5  |
             |            |         +--------+          |            |
             |   10.65.0.4|--.  +-------+   +-------+ .->10.65.0.5   |
             +------------+   `>| tunlA |-->| tunlB |-  +------------+
                                +-------+   +-------+


The BGP session would be established between 10.65.0.4 (the IP of Host A
on tunlA) and 10.65.0.5 (the IP of Host B on tunlB), so that the routes
learnt via BGP would be correct as-is, with no special next-hop handling.

Basically, it's a simple overlay network.
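
A rough Python model of why the routes would then be correct as-is: the BGP
next hop (10.65.0.5) falls inside a network configured on tunlA, so it
resolves to that device without any extra attribute. The interface table,
the name eth0, the /31 on tunlA and the /32 on eth0 are assumptions for
illustration (no prefix lengths are given above), and the helpers are
hypothetical, not BIRD code.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical addressing on Host A. The /31 on tunlA and the /32 on eth0
# are assumptions; the /32 models Host B not being on-link over eth0.
HOST_A_INTERFACES: Dict[str, ipaddress.IPv4Interface] = {
    "eth0": ipaddress.IPv4Interface("10.240.0.4/32"),
    "tunlA": ipaddress.IPv4Interface("10.65.0.4/31"),
}


def resolve_next_hop(next_hop: str,
                     interfaces: Dict[str, ipaddress.IPv4Interface] = HOST_A_INTERFACES) -> Optional[str]:
    """Return the interface whose connected network contains the next hop."""
    addr = ipaddress.IPv4Address(next_hop)
    for name, iface in interfaces.items():
        if addr in iface.network:
            return name
    return None


def kernel_route(prefix: str, next_hop: str) -> str:
    """Build a route in the 'X via Y dev Z' notation used in this thread."""
    dev = resolve_next_hop(next_hop)
    if dev is None:
        raise ValueError(f"next hop {next_hop} is not directly reachable")
    return f"{prefix} via {next_hop} dev {dev}"


# A next hop learnt over the tunnel session resolves to tunlA directly.
print(kernel_route("10.65.0.3", "10.65.0.5"))
```

By contrast, a next hop of 10.240.0.5 resolves to no interface in this model,
which is the situation the multi-hop setup has to work around.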

> The diagram shows Pod A sending a packet to Pod B, using IP addresses that
> are unknown to the 'Router' between the two hosts.  Host A has an IP-in-IP
> device, tunl0, and a route that says to use that device for data to Pod B's
> address (10.65.0.3).  When the packet has passed through that device, it
> has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
> routed again according to the routing table - so now it can successfully
> reach Host B.
> 
> So how is BIRD involved?  We statically program the local Pod route on each
> host:
> 
> On Host A: 10.65.0.2 dev <interface to Pod A>
> On Host B: 10.65.0.3 dev <interface to Pod B>
> 
> then run a BIRD BGP session between Host A and Host B to propagate those
> routes to the other host - which would normally give us:
> 
> On Host A: 10.65.0.3 via 10.240.0.5
> On Host B: 10.65.0.2 via 10.240.0.4
> 
> But we don't want those normal routes, because 'Router' does not know how
> to route 10.65.x addresses, so the data would get lost there.  So we
> enhance and configure BIRD as follows.
> 
> - In the export filter for protocol kernel, for the relevant routes, we set
> an attribute 'krt_tunnel = tunl0'.
> 
> - We modify BIRD, as in the attached patches, to understand that that means
> that those routes should have 'dev tunl0'.
> 
> Then instead, we get:
> 
> On Host A: 10.65.0.3 via 10.240.0.5 dev tunl0 onlink
> On Host B: 10.65.0.2 via 10.240.0.4 dev tunl0 onlink
> 
> which allows successful routing of data between the Pods.
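
The effect of the krt_tunnel attribute described above can be modelled
roughly as follows. This is a hypothetical Python sketch of the behaviour,
not the attached patches or BIRD's own code: the Route class, the
10.65.0.0/16 pod pool used by the filter, and the render helper are all
assumptions.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional

# Assumed pod address pool; the message only shows individual 10.65.0.x pods.
POD_POOL = ipaddress.ip_network("10.65.0.0/16")


@dataclass(frozen=True)
class Route:
    prefix: str                        # e.g. "10.65.0.3"
    gateway: str                       # BGP next hop, e.g. "10.240.0.5"
    krt_tunnel: Optional[str] = None   # set by the export filter when tunnelling


def export_filter(route: Route, tunnel: str = "tunl0") -> Route:
    """Mimic 'krt_tunnel = tunl0' for routes to pod addresses only."""
    if ipaddress.ip_network(route.prefix).subnet_of(POD_POOL):
        return replace(route, krt_tunnel=tunnel)
    return route


def render(route: Route) -> str:
    """Format the route in the notation used in this message."""
    text = f"{route.prefix} via {route.gateway}"
    if route.krt_tunnel:
        # 'onlink' tells the kernel to treat the gateway as reachable on this
        # device even though it is not on a connected subnet.
        text += f" dev {route.krt_tunnel} onlink"
    return text


print(render(Route("10.65.0.3", "10.240.0.5")))                 # normal route
print(render(export_filter(Route("10.65.0.3", "10.240.0.5"))))  # tunnelled route
```

Routes outside the assumed pod pool (for example a default route via
10.240.0.1) pass through the filter unchanged.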
> 
> 
> Thanks for reading this far!  I now have three questions:
> 
> 1. Does the routing approach above make sense?  (Or is there some better or
> simpler or already supported way that we could achieve the same thing?)
> 
> 2. If (1), would the BIRD team accept patches broadly on the lines of those
> that are attached?
> 
> 3. If (2), please let me know if the attached patches are already
> acceptable, or otherwise what further work is needed for them.
> 
> Many thanks,
>     Neil