IPv6 BGP & kernel 4.19

Mon Mar 16 22:10:28 CET 2020

Thanks.

I found a solution that seems to be working so far, with the regular Debian 
4.19 kernel, on my two edge routers.

I set both net.ipv6.route.gc_thresh and net.ipv6.route.max_size to 131072. 
The reasoning behind that is to keep this limit above the number of routes 
in the full view, so that gc is not triggered too often.
Once the full view is loaded, I can now perform an 'ip route get' lookup 
on each and every prefix without getting a "Network is unreachable" 
error (thanks for the tip, Basil) or facing a noticeable service 
disruption, and the IPv6 BGP sessions have also been stable so far (i.e. 
for a few days).
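
A minimal sketch of the sizing reasoning above, in Python. It assumes a
full view of roughly 78k routes (the figure quoted further down in this
thread) and rounds up to the next power of two, which happens to give
131072. The rounding rule and the helper names are illustrative
assumptions only; any value above the route count follows the same logic.

```python
def next_power_of_two_above(route_count: int) -> int:
    """Return the smallest power of two strictly greater than route_count."""
    if route_count < 0:
        raise ValueError("route_count must be non-negative")
    # bit_length() is the number of bits needed to represent the value,
    # so shifting 1 by it always lands just above route_count.
    return 1 << route_count.bit_length()


def render_sysctl(limit: int) -> str:
    """Render the two sysctl settings as lines for a sysctl.d-style file."""
    keys = ("net.ipv6.route.gc_thresh", "net.ipv6.route.max_size")
    return "\n".join(f"{key} = {limit}" for key in keys)


if __name__ == "__main__":
    # 78000 is an assumed route count, taken from the "around 78k" estimate.
    print(render_sysctl(next_power_of_two_above(78000)))
```

The printed lines can be placed in a file under /etc/sysctl.d/ and loaded
with 'sysctl --system', or applied one by one with 'sysctl -w'.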

If anyone reproduces this solution (or has found another one), I would be 
glad to know.
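
For anyone who wants to repeat the lookup check, here is a rough Python
sketch. It assumes you already have the list of prefixes (for example
exported from your BGP daemon), picks one address inside each prefix,
and runs 'ip -6 route get' on it. The function names and the choice of
probe address are hypothetical; only the 'ip' command and the error text
come from this thread.

```python
import ipaddress
import subprocess
from typing import Callable, Iterable, List, Tuple

# Error text reported in this thread when the lookup fails.
UNREACHABLE = "Network is unreachable"


def probe_address(prefix: str) -> str:
    """Return one address inside the prefix to use as the lookup target."""
    net = ipaddress.IPv6Network(prefix, strict=False)
    if net.prefixlen == 128:
        return str(net.network_address)
    # Assumption: the first address after the network address is good enough.
    return str(net.network_address + 1)


def ip_route_get(address: str) -> Tuple[int, str]:
    """Run 'ip -6 route get ADDRESS' and return (exit code, stderr)."""
    proc = subprocess.run(
        ["ip", "-6", "route", "get", address],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def failed_lookups(
    prefixes: Iterable[str],
    lookup: Callable[[str], Tuple[int, str]] = ip_route_get,
) -> List[str]:
    """Return the prefixes whose lookup failed with the unreachable error."""
    failed = []
    for prefix in prefixes:
        code, stderr = lookup(probe_address(prefix))
        if code != 0 and UNREACHABLE in stderr:
            failed.append(prefix)
    return failed
```

An empty list from failed_lookups() over the full set of prefixes
corresponds to the result described above. The lookup argument is only
there so the loop can be exercised without touching the kernel.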

On 16/03/2020 12:41, Baptiste Jonglez wrote:
> FYI, babeld seems to be affected by this same bug: https://github.com/jech/babeld/issues/50
> 
> The net.ipv6.route.max_size workaround is also mentioned there.
> 
> Baptiste
> 
> On 26-02-20, Basil Fillan wrote:
>> Hi,
>>
>> We've also experienced this after upgrading a few routers to Debian Buster.
>> With a kernel bisect we found that a bug was introduced in the following
>> commit:
>>
>> 3b6761d18bc11f2af2a6fc494e9026d39593f22c
>>
>> This bug was still present in master as of a few weeks ago.
>>
>> It appears entries are added to the IPv6 route cache which aren't visible
>> from "ip -6 route show cache", but are causing the route cache garbage
>> collection system to trigger extremely often (every packet?) once it exceeds
>> the value of net.ipv6.route.max_size. Our original symptom was extreme
>> forwarding jitter caused within the ip6_dst_gc function (identified by some
>> spelunking with systemtap & perf) worsening as the size of the cache
>> increased. This was due to our max_size sysctl inadvertently being set to 1
>> million. Reducing this value to the default 4096 broke IPv6 forwarding
>> entirely on our test system under affected kernels. Our documentation had
>> this sysctl marked as the maximum number of IPv6 routes, so it looks like
>> the use changed at some point.
>>
>> We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096)
>> for now, which fixed our immediate issue.
>>
>> You can reproduce this by adding more than 4096 (default value of the
>> sysctl) routes to the kernel and running "ip route get" for each of them.
>> Once the route cache is filled, the error "RTNETLINK answers: Network is
>> unreachable" will be received for each subsequent "ip route get"
>> incantation, and v6 connectivity will be interrupted.
>>
>> Thanks,
>>
>> Basil
>>
>>
>> On 26/02/2020 20:38, Clément Guivy wrote:
>>> Hi, did anyone find a solution or workaround regarding this issue?
>>> Considering a router use case.
>>> I have looked at rt6_stats, total route count is around 78k (full view),
>>> and around 4100 entries in the cache at the moment on my first router
>>> (forwarding a few Mb/s) and around 2500 entries on my second router
>>> (forwarding less than 1 Mb/s).
>>> I have reread the entire thread. At first, Alarig's research seemed to
>>> point to a neighbor management problem, but my understanding is that the
>>> route cache is something else entirely - or is it related somehow?
>>>
>>>
>>> On 03/12/2019 19:29, Alarig Le Lay wrote:
>>>> We agree then, and I act as a router on all those machines.
>>>>
>>>> On 3 December 2019 19:27:11 GMT+01:00, Vincent Bernat
>>>> <vincent at bernat.ch> wrote:
>>>>
>>>>      This is the result of PMTUd. But when you are a router, you don't
>>>>      need to do that, so it's mostly a problem for end hosts.
>>>>
>>>>      On December 3, 2019 7:05:49 PM GMT+01:00, Alarig Le Lay
>>>>      <alarig at swordarmor.fr> wrote:
>>>>
>>>>          On 03/12/2019 14:16, Vincent Bernat wrote:
>>>>
>>>>              The information needs to be stored somewhere.
>>>>
>>>>
>>>>          Why does it have to be stored? It’s not really my problem if
>>>>          someone else has a non-standard MTU and can’t do TCP-MSS or
>>>>          PMTUd.