Some small OSPF changes
Hello list!
There are several improvements that can be applied to OSPF:
1) At the moment TTL is not explicitly set in per-interface sockets, so multicast messages are sent with the default OS TTL (which is 255 on *BSD). This is clearly wrong: RFC 2328 states in A.1:
   To ensure that these packets will not travel multiple hops, their IP TTL must be set to 1.
The same (but less strict) holds for RFC 5340 (A.1):
   As such, the multicast addresses have been chosen with link-local scope and packets sent to these addresses should have their IPv6 Hop Limit set to 1.
There is also RFC 3171, which states:
   3. Local Network Control Block (224.0.0/24)
   Addresses in the Local Network Control block are used for protocol control traffic that is not forwarded off link.
so another possibility is to implement such behavior in sk_setup_multicast().
2) At the moment BIRD totally ignores the neighbor MTU. We should at least print a warning (or error) if the neighbor MTU differs. This greatly simplifies debugging procedures.
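A minimal C sketch of the scope rule quoted above (not BIRD code; mcast_ttl_for_v4() and mcast_hops_for_v6() are hypothetical names): a destination in the Local Network Control Block (224.0.0.0/24) or with IPv6 link-local multicast scope gets a TTL / Hop Limit of 1, anything else keeps a caller-supplied default. IN6_IS_ADDR_MC_LINKLOCAL() is the standard POSIX macro from <netinet/in.h>.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper: TTL for an IPv4 multicast destination.
 * 224.0.0.0/24 (RFC 3171 Local Network Control Block) is never
 * forwarded off link, so TTL 1 is enough; otherwise use the default. */
static int mcast_ttl_for_v4(struct in_addr dst, int default_ttl)
{
  uint32_t a = ntohl(dst.s_addr);       /* host byte order for masking */

  if ((a & 0xFFFFFF00u) == 0xE0000000u) /* 224.0.0.0/24 */
    return 1;
  return default_ttl;
}

/* Hypothetical helper: Hop Limit for an IPv6 multicast destination.
 * Link-local scope (scope nibble 2, e.g. ff02::5) gets Hop Limit 1. */
static int mcast_hops_for_v6(const struct in6_addr *dst, int default_hops)
{
  if (IN6_IS_ADDR_MC_LINKLOCAL(dst))
    return 1;
  return default_hops;
}
```

For example, OSPF's AllSPFRouters groups 224.0.0.5 and ff02::5 both map to 1, while an administratively scoped group such as 239.1.2.3 keeps the default.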
On Fri, Feb 24, 2012 at 11:30:19PM +0400, Alexander V. Chernikov wrote:
Hello list!
There are several improvements that can be applied to OSPF:
1) At the moment TTL is not explicitly set in per-interface sockets, so multicast messages are sent with the default OS TTL (which is 255 on *BSD).
Hmm, you are right, the default TTL for multicast seems to be 1 on Linux, so I missed that.
  sk->flags = SKF_LADDR_RX;
+ sk->ttl = 1;
This will break vlinks (they use the same sockets). As the TTL for unicast can differ from the TTL for multicast, the simplest fix would be to keep sk->ttl for unicast and add a TTL argument to sk_setup_multicast(). This will leave a TTL of 255 for unicast packets on NBMA ifaces, which is probably not a big issue. Another possibility is to have a common independent socket for vlinks, but I am not sure whether vlinks need an iface-specific socket; the requirements for vlink sockets have to be checked. This would probably be better for implementing TTL security later.
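At the socket level, the split suggested above maps onto two independent IPv4 options: IP_TTL (unicast) and IP_MULTICAST_TTL (multicast). A sketch with a hypothetical set_ttls() helper, not BIRD's sk_* API; it passes the multicast TTL as an unsigned char because some BSDs have historically accepted only a single byte for that option, while Linux accepts either an int or a byte.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set the unicast TTL (IP_TTL) and the multicast
 * TTL (IP_MULTICAST_TTL) independently on one IPv4 socket, so multicast
 * Hellos leave with TTL 1 while unicast packets (e.g. over vlinks or to
 * NBMA neighbors) keep a larger TTL.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_ttls(int fd, int unicast_ttl, int mcast_ttl)
{
  int uttl = unicast_ttl;
  unsigned char mttl = (unsigned char) mcast_ttl; /* single byte: most portable */

  if (setsockopt(fd, IPPROTO_IP, IP_TTL, &uttl, sizeof(uttl)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &mttl, sizeof(mttl)) < 0)
    return -1;
  return 0;
}
```

Calling set_ttls(fd, 255, 1) would match the behavior discussed here: TTL 255 for unicast, TTL 1 for multicast.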
2) At the moment BIRD totally ignores the neighbor MTU. We should at least print a warning (or error) if the neighbor MTU differs. This greatly simplifies debugging procedures.
OK
+ if ((ps->iface_mtu != ifa->iface->mtu) && ((ifa->type != OSPF_IT_VLINK) ||
+     ((ps->iface_mtu != 0) && (ifa->iface->mtu != 0))))
Isn't ps->iface_mtu still in network order here?
3) Another problem with OSPF sockets is that they depend on device routes in routing tables and depend on the kernel to choose a proper source address (if there are more IPv4 prefixes on one iface). In Linux, the multicast source address is specified in IP_MULTICAST_IF or IP_ADD_MEMBERSHIP; the unicast source address is chosen based on the destination address. I am not sure, but if I remember correctly, this does not work on (at least some) BSDs. Depending on device routes is not a big problem, but it makes things a bit more fragile (and it is also a reason why we do not support _real_ unnumbered PtP ifaces).
Some time ago I tried to use IP_PKTINFO to fix this; it works on Linux, but it does not work on some BSDs for raw sockets (which are used for OSPF). I guess that using IP_HDRINCL would work, but it is ugly, IPv4-only and not cleanly encapsulated (reserved buffer space just on IPv4). Any BSD hints for that?
-- 
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
On 27.02.2012 10:20, Ondrej Zajicek wrote:
On Fri, Feb 24, 2012 at 11:30:19PM +0400, Alexander V. Chernikov wrote:
Hello list!
There are several improvements that can be applied to OSPF:
1) At the moment TTL is not explicitly set in per-interface sockets, so multicast messages are sent with the default OS TTL (which is 255 on *BSD).
Hmm, you are right, the default TTL for multicast seems to be 1 on Linux, so I missed that.
  sk->flags = SKF_LADDR_RX;
+ sk->ttl = 1;
This will break vlinks (they use the same sockets). As the TTL for unicast can differ from the TTL for multicast, the simplest fix would be to keep sk->ttl for unicast and add a TTL argument to sk_setup_multicast(). This will leave a TTL of 255 for unicast packets on NBMA ifaces, which is probably not a big issue.
Another possibility is to have a common independent socket for vlinks, but I am not sure whether vlinks need an iface-specific socket; the requirements for vlink sockets have to be checked. This would probably be better for implementing TTL security later.
Vlink sockets are not bound to an interface, since ospf_iface_new() is called with a NULL interface address (ospf.c). The only place where ospf_sk_open() is called is ospf_iface_add(). The only place where ospf_iface_add() is called is the lock acquire hook at the end of ospf_iface_new(), and before that there is this code:

  if (ifa->type == OSPF_IT_VLINK)
  {
    ifa->voa = ospf_find_area(oa->po, ip->voa);
    ifa->vid = ip->vid;
    return;  /* Don't lock, don't add sockets */
  }

Are you sure ospf_sk_open() is ever called by vlink code?
sk_setup_multicast() is used by:
- RIPv2/3 (with 224.0.0.9 as the multicast destination, which leads us to RFC 3171 and TTL 1, and the same link-local scope for v6)
- radv (the same for v6)
- OSPF (already discussed)
Maybe we can simply add something like:

+  if (s->ttl == -1)
+    s->ttl = 1;
+
   if (err = sysio_setup_multicast(s))
   {
     log(L_ERR "sk_setup_multicast: %s: %m", err);

?
2) At the moment BIRD totally ignores the neighbor MTU. We should at least print a warning (or error) if the neighbor MTU differs. This greatly simplifies debugging procedures.
OK
+ if ((ps->iface_mtu != ifa->iface->mtu) && ((ifa->type != OSPF_IT_VLINK) ||
+     ((ps->iface_mtu != 0) && (ifa->iface->mtu != 0))))
Isn't ps->iface_mtu still in network order here?
Oops. I've tested it with Huawei equipment, which sets the MTU to zero on interfaces where jumbo MTU is turned on.
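A minimal sketch of the corrected check (hypothetical mtu_mismatch() helper, not BIRD code): the Interface MTU field of a received Database Description packet is a 16-bit value in network byte order, so it is converted with ntohs() before the comparison. The vlink exception mirrors the patch above (RFC 2328 sends Interface MTU 0 over virtual links).

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a warning should be logged.
 * wire_mtu is the Interface MTU field exactly as received, i.e. still
 * in network byte order; local_mtu is in host order. */
static bool mtu_mismatch(uint16_t wire_mtu, unsigned local_mtu, bool is_vlink)
{
  unsigned peer_mtu = ntohs(wire_mtu);  /* convert before comparing */

  if (peer_mtu == local_mtu)
    return false;
  /* On virtual links, ignore the case where either side reports 0. */
  if (is_vlink && (peer_mtu == 0 || local_mtu == 0))
    return false;
  return true;
}
```

With this condition, a neighbor advertising MTU 0 on an ordinary (non-vlink) iface, as the Huawei equipment above does with jumbo frames, would still be reported as a mismatch.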
3) Another problem with OSPF sockets is that they depend on device routes in routing tables and depend on the kernel to choose a proper source address (if there are more IPv4 prefixes on one iface).
In Linux, the multicast source address is specified in IP_MULTICAST_IF or IP_ADD_MEMBERSHIP; the unicast source address is chosen based on the destination address.
I am not sure, but if I remember correctly, this does not work on (at least some) BSDs.
Depending on device routes is not a big problem, but it makes things a bit more fragile (and it is also a reason why we do not support _real_ unnumbered PtP ifaces).
Sorry, I can't get the idea (I'm mostly unfamiliar with the multicast API). What are we trying to accomplish?
Some time ago I tried to use IP_PKTINFO to fix this; it works on Linux, but it does not work on some BSDs for raw sockets (which are used for OSPF). I guess that using IP_HDRINCL would work, but it is ugly, IPv4-only and not cleanly encapsulated (reserved buffer space just on IPv4). Any BSD hints for that?
I'll try to look into the multicast docs/code later this week. There are several improvements (like SSM support in 8.0+ and source filtering as in RFC 3678), but I'm not sure if this is what is needed.
On Tue, Feb 28, 2012 at 02:51:49AM +0000, Alexander V. Chernikov wrote:
On 27.02.2012 10:20, Ondrej Zajicek wrote:
On Fri, Feb 24, 2012 at 11:30:19PM +0400, Alexander V. Chernikov wrote:
Hello list!
There are several improvements that can be applied to OSPF:
1) At the moment TTL is not explicitly set in per-interface sockets, so multicast messages are sent with the default OS TTL (which is 255 on *BSD).
Hmm, you are right, the default TTL for multicast seems to be 1 on Linux, so I missed that.
  sk->flags = SKF_LADDR_RX;
+ sk->ttl = 1;
This will break vlinks (they use the same sockets). As the TTL for unicast can differ from the TTL for multicast, the simplest fix would be to keep sk->ttl for unicast and add a TTL argument to sk_setup_multicast(). This will leave a TTL of 255 for unicast packets on NBMA ifaces, which is probably not a big issue.
Another possibility is to have a common independent socket for vlinks, but I am not sure whether vlinks need an iface-specific socket; the requirements for vlink sockets have to be checked. This would probably be better for implementing TTL security later.
Vlink sockets are not bound to an interface, since ospf_iface_new() is called with a NULL interface address (ospf.c). ... Are you sure ospf_sk_open() is ever called by vlink code?
It is more complicated: vlink ifaces do not have their own sockets, but when a vlink iface is activated, it shares the socket of the 'proper' OSPF iface through which the vlink goes. See ospf_check_vlinks().
Maybe we can simply add some kind of:
+  if (s->ttl == -1)
+    s->ttl = 1;
You are right, this is probably enough. I will fix this.
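A sketch of the agreed default, using a hypothetical mock_sock structure and mock_setup_multicast() function (BIRD's real sock structure and sysio_setup_multicast() are not reproduced here): an unset TTL of -1 becomes 1 before the multicast TTL is applied, so every sk_setup_multicast() user listed above gets link-local behavior unless it asks otherwise. IPv4 only, for brevity.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a BIRD socket; only the fields needed here.
 * ttl == -1 means "not set by the protocol". */
struct mock_sock {
  int fd;
  int ttl;
};

/* Apply the proposed default, then set the IPv4 multicast TTL.
 * Returns the setsockopt() result: 0 on success, -1 on error. */
static int mock_setup_multicast(struct mock_sock *s)
{
  if (s->ttl == -1)
    s->ttl = 1;                       /* link-local scope by default */

  unsigned char ttl = (unsigned char) s->ttl;
  return setsockopt(s->fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
}
```

A protocol that really needs a larger multicast TTL can still set s->ttl explicitly before the call; only the unset case changes.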
3) Another problem with OSPF sockets is that they depend on device routes in routing tables and depend on the kernel to choose a proper source address (if there are more IPv4 prefixes on one iface).
In Linux, the multicast source address is specified in IP_MULTICAST_IF or IP_ADD_MEMBERSHIP; the unicast source address is chosen based on the destination address.
I am not sure, but if I remember correctly, this does not work on (at least some) BSDs.
Depending on device routes is not a big problem, but it makes things a bit more fragile (and it is also a reason why we do not support _real_ unnumbered PtP ifaces).
Sorry, I can't get the idea (I'm mostly unfamiliar with the multicast API). What are we trying to accomplish?
The problem is not specific to multicast. There are two related but different issues:
1) send a packet through a specific iface, regardless of the state of the kernel routing table (device routes in the routing table, in the case of packets to neighbors);
2) choose a proper source address for that packet.
I guess that both could be solved with the SO_DONTROUTE, SO_BINDTODEVICE (Linux-specific) and/or IP_PKTINFO socket options, but IP_PKTINFO does not work with raw sockets on some BSDs. For multicast sockets, the source address is specified in IP_ADD_MEMBERSHIP on Linux, but on BSD the address used in that option is AFAIK only used to specify an iface and does not affect the source address of sent packets. Unfortunately, this part of the socket interface is not well documented and not really consistent between Linux and the various BSDs.
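To illustrate both issues on Linux, a hypothetical send_via_iface() helper (not BIRD code) attaches an IP_PKTINFO control message to sendmsg(): ipi_ifindex selects the outgoing interface and ipi_spec_dst the requested source address (semantics as described in Linux ip(7)). It uses a UDP socket; the raw-socket case is exactly what the thread reports as unreliable on some BSDs, so treat this as a Linux-only sketch.

```c
#define _GNU_SOURCE                /* exposes struct in_pktinfo on glibc */
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send one datagram to dst through interface ifname
 * with source address src, via an IP_PKTINFO control message instead of
 * relying on the routing table. Returns bytes sent, or -1 on error. */
static ssize_t send_via_iface(int fd, const struct sockaddr_in *dst,
                              const char *ifname, struct in_addr src,
                              const void *buf, size_t len)
{
  unsigned int ifindex = if_nametoindex(ifname);
  if (ifindex == 0)
    return -1;                     /* unknown interface */

  struct iovec iov = { .iov_base = (void *) buf, .iov_len = len };
  union {                          /* union keeps the buffer cmsghdr-aligned */
    char raw[CMSG_SPACE(sizeof(struct in_pktinfo))];
    struct cmsghdr align;
  } ctrl;
  memset(&ctrl, 0, sizeof(ctrl));

  struct msghdr msg;
  memset(&msg, 0, sizeof(msg));
  msg.msg_name = (void *) dst;
  msg.msg_namelen = sizeof(*dst);
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl.raw;
  msg.msg_controllen = sizeof(ctrl.raw);

  struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
  cm->cmsg_level = IPPROTO_IP;
  cm->cmsg_type = IP_PKTINFO;
  cm->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));

  struct in_pktinfo pi;
  memset(&pi, 0, sizeof(pi));
  pi.ipi_ifindex = (int) ifindex;  /* outgoing interface */
  pi.ipi_spec_dst = src;           /* requested source address */
  memcpy(CMSG_DATA(cm), &pi, sizeof(pi));

  return sendmsg(fd, &msg, 0);
}
```

For OSPF itself the socket would be raw rather than UDP, and on the BSDs mentioned above a different mechanism (or IP_HDRINCL) would still be needed.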
participants (2):
- Alexander V. Chernikov
- Ondrej Zajicek