@Joe, thanks for starting this thread. I was just thinking along these lines, about how to make sure BIRD is running.

Apollon, I would love to give your plugin a try. 

To summarize my setup: I'm deploying BIRD on 120 machines (2 NICs each) plus 6 routers (240 point-to-point links in total, in a Clos configuration), all participating in a single-area OSPF. Does anyone know whether BIRD has been deployed or tested in this kind of setup?


Thanks,

-bn
0216331C


On Mon, Apr 22, 2013 at 4:35 AM, Apollon Oikonomopoulos <apollon@skroutz.gr> wrote:
On 19:20 Mon 22 Apr     , Joe Wooller wrote:
> On a similar note, does anyone monitor the process externally (say, via Nagios or the like)?
> I would be interested to see how people monitor the active process,
> and possibly if anyone monitors sessions and prefixes received, used/filtered?
>
> I haven't been able to find anything out there that suits my needs, so with the assistance of a friend we have come up with this. It is still in progress, though:
>
> https://github.com/dowlingw/bird-tool
>
> Cheers
> Joe

We are monitoring the process with icinga & check_mk. Check_mk has a
notion of state, so we essentially persist ("inventorize") the admin
state of all protocols (up/down) and their status (Connected, Running
etc) and if anything changes we get an alert.
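
As a rough illustration of the idea (this is not the actual check_mk
plugin), a minimal Python sketch could parse the protocol table, keep
it as a baseline, and report anything that differs on the next run. It
assumes "show protocols"-style text with the columns name, proto,
table, state, since, info and a single-token "since" field. The sample
data and the names parse_protocols and diff_states are made up for
this example.

```python
# Hypothetical sample data, shaped like a "show protocols" table.
BASELINE = """\
name     proto    table    state  since       info
kernel1  Kernel   master   up     2013-04-22
ospf1    OSPF     master   up     2013-04-22  Running
"""

CURRENT = """\
name     proto    table    state  since       info
kernel1  Kernel   master   up     2013-04-22
ospf1    OSPF     master   up     2013-04-23  Alone
"""


def parse_protocols(text):
    """Parse the table into {name: (state, info)}.

    Assumes 'since' is one whitespace-free token; a time format that
    contains a space would shift the columns.
    """
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and short banner lines.
        if len(fields) < 5 or fields[0] == "name":
            continue
        name, state = fields[0], fields[3]
        info = " ".join(fields[5:])  # empty string when there is no info
        states[name] = (state, info)
    return states


def diff_states(baseline, current):
    """Return (name, old, new) for every protocol that changed.

    A protocol missing on one side shows up with None on that side.
    """
    changes = []
    for name in sorted(set(baseline) | set(current)):
        old, new = baseline.get(name), current.get(name)
        if old != new:
            changes.append((name, old, new))
    return changes


if __name__ == "__main__":
    for name, old, new in diff_states(parse_protocols(BASELINE),
                                      parse_protocols(CURRENT)):
        print(f"ALERT {name}: {old} -> {new}")
```

In a real check the baseline would be stored on disk at inventory time
and the current text would come from the running daemon; both are
hard-coded here to keep the sketch self-contained.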

I could share the check_mk plugin that does all this, if anyone is
interested.

We also monitor the prefix count directly in the Linux kernel (via
/proc/net/fib_triestat and /proc/net/rt6_stats) and use it to score our
keepalived processes higher or lower and possibly trigger a failover of
the access interface IPs if one router seems to receive significantly
fewer prefixes than the other for some reason.
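
A minimal Python sketch of the IPv4 half of this, under stated
assumptions: it reads the "Prefixes:" line of the "Main:" section from
text laid out like /proc/net/fib_triestat (the exact layout varies
between kernel versions, and the sample below is illustrative), then
compares two counts. How the peer's count is obtained is left open,
and the 0.8 ratio and both function names are hypothetical.

```python
# Illustrative sample, shaped like /proc/net/fib_triestat output.
SAMPLE = """\
Basic info: size of leaf: 48 bytes, size of tnode: 40 bytes.
Local:
    Aver depth:     2.00
    Max depth:      2
    Leaves:         4
    Prefixes:       5
Main:
    Aver depth:     2.33
    Max depth:      3
    Leaves:         10
    Prefixes:       11
"""


def main_table_prefixes(triestat_text):
    """Return the 'Prefixes:' count of the 'Main:' section, or None."""
    in_main = False
    for line in triestat_text.splitlines():
        stripped = line.strip()
        if not line[:1].isspace() and stripped.endswith(":"):
            # An unindented "Name:" line starts a new table section.
            in_main = stripped == "Main:"
        elif in_main and stripped.startswith("Prefixes:"):
            return int(stripped.split(":", 1)[1])
    return None


def significantly_fewer(local, peer, ratio=0.8):
    """True if local is below ratio * peer (the ratio is an assumption)."""
    return local < ratio * peer


if __name__ == "__main__":
    count = main_table_prefixes(SAMPLE)
    print("main table prefixes:", count)
    print("fail over?", significantly_fewer(count, peer=20))
```

On a live system the text would be read from /proc/net/fib_triestat
instead of SAMPLE, and the boolean could drive whatever mechanism
adjusts the keepalived priority.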

Cheers,
Apollon