Bug report: OSPF adjacency creation
Dear BIRD developers and users,

I'm having a problem with OSPF adjacency creation. Under a specific condition, this problem adds a delay of 30 minutes (LSRefreshTime) to the adjacency creation process. I guess BIRD's developers are to some extent aware of it, because it is marked at "line:675, file:lsupd.c, bird:1.3.4", where the developer has commented "/* FIXME pg145 (6) */". Therefore, besides explaining in the rest of this email how the problem occurs, I would like to ask: what is the reason that the developers did not fix it?

How Does The Problem Occur?

Consider the simple point-to-point network depicted in Figure 1. In this network two nodes with router IDs 0.0.0.1 and 0.0.0.2 are connected via interfaces "Ia" and "Ib". If we suppose that the neighbor adjacency is already formed, then each node has two LSA instances in its link-state database. The link-state database content of the two nodes is the same (differing at most in the age field). The LSA headers can therefore be summarized by the "LS-type", "LS-ID", "Advertising Router", "LS sequence number" and "LS checksum" fields as shown below the figure.

         +-----+Ia       Ib+-----+
    LSA1 | RT1 |-----------| RT2 | LSA1
    LSA2 +-----+           +-----+ LSA2

    LSA1-Header=(type:1, id:0.0.0.1, rt:0.0.0.1, sqNo:x, checksum:ch1)
    LSA2-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y, checksum:ch2)

    Fig.1

The data included in LSA1 indicate that router 0.0.0.1 is connected to router 0.0.0.2 via interface "Ia", and similarly the data included in LSA2 indicate that router 0.0.0.2 is connected to router 0.0.0.1 via interface "Ib".

Now, if the link between these two nodes is removed in both directions, as shown in Figure 2, and the RouterDeadInterval timer has expired, each router has to delete the former adjacency. This means that each router generates a new LSA; I call these LSA1* and LSA2*.
According to the method implemented by the developer, the headers of these two new LSAs (except for the "age" and "checksum" fields) are the same as those of the LSAs they replace. That is, LSA1*-Header=(type:1, id:0.0.0.1, rt:0.0.0.1, sqNo:x, checksum:ch1*) and LSA2*-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y, checksum:ch2*). However, the data content of the LSAs is different, since the previous adjacency has been removed on each side.

          +-----+Ia       Ib+-----+
    LSA1* | RT1 |           | RT2 | LSA1
    LSA2  +-----+           +-----+ LSA2*

    RT1 database:
    LSA1*-Header=(type:1, id:0.0.0.1, rt:0.0.0.1, sqNo:x, checksum:ch1*)
    LSA2-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y, checksum:ch2)

    RT2 database:
    LSA1-Header=(type:1, id:0.0.0.1, rt:0.0.0.1, sqNo:x, checksum:ch1)
    LSA2*-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y, checksum:ch2*)

    Fig.2

Attention: as I explain in detail below, the key condition under which the problem occurs is that LSA1* and LSA2* are generated such that "ch1>ch1* and ch2*>ch2". Why this condition arises can be analysed separately. The point is that it DOES occur, at random, in the repeated tests I have made.

Now suppose the link between the two nodes is restored, as in Figure 3, and let us follow the adjacency creation process. At the beginning each node sends a summary of its link-state database in database description packets. Router 0.0.0.1 therefore sends {LSA1*,LSA2} and router 0.0.0.2 sends {LSA1,LSA2*} to the other side.

When a database description packet is received, each LSA listed in it is tested for being new, based on the condition checked at "line:232, function:ospf_dbdes_reqladd, file:dbdes.c, bird:1.3.4". If it is new, the LSA is added to the link-state request list; otherwise it is discarded. An LSA counts as new either when the link-state database holds no instance of it or when the received LSA is more recent than the instance held in the link-state database.
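The check described above can be sketched as follows. This is only an illustration in Python, not BIRD's code: the names `LsaHeader`, `compare` and `build_request_list` are hypothetical, "more recent" is assumed to follow the ordering of RFC 2328 section 13.1, and the sequence numbers and checksums are made-up values chosen so that ch1>ch1* and ch2*>ch2.

```python
from dataclasses import dataclass

MAX_AGE = 3600       # seconds (RFC 2328 MaxAge)
MAX_AGE_DIFF = 900   # seconds (RFC 2328 MaxAgeDiff)


@dataclass(frozen=True)
class LsaHeader:
    """Hypothetical, simplified LSA header (not BIRD's structure)."""
    ls_type: int
    ls_id: str
    adv_router: str
    seq: int        # treated as a plain signed integer here
    checksum: int
    age: int = 0

    def key(self):
        # Type, LS-ID and advertising router identify the LSA.
        return (self.ls_type, self.ls_id, self.adv_router)


def compare(a, b):
    """Return 1 if a is more recent, -1 if b is, 0 if same instance."""
    if a.seq != b.seq:
        return 1 if a.seq > b.seq else -1
    if a.checksum != b.checksum:
        return 1 if a.checksum > b.checksum else -1
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return 1 if a.age == MAX_AGE else -1
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return 1 if a.age < b.age else -1
    return 0


def build_request_list(lsdb, received):
    """Keep the headers that are missing from lsdb or more recent than it."""
    requests = []
    for hdr in received:
        mine = lsdb.get(hdr.key())
        if mine is None or compare(hdr, mine) > 0:
            requests.append(hdr)
    return requests


# Illustrative values only: x, y and the checksums are assumptions.
x, y = 5, 7
lsa1 = LsaHeader(1, "0.0.0.1", "0.0.0.1", x, 0x9000)        # ch1
lsa1_star = LsaHeader(1, "0.0.0.1", "0.0.0.1", x, 0x3000)   # ch1* < ch1
lsa2 = LsaHeader(1, "0.0.0.2", "0.0.0.2", y, 0x2000)        # ch2
lsa2_star = LsaHeader(1, "0.0.0.2", "0.0.0.2", y, 0x8000)   # ch2* > ch2

rt1_db = {h.key(): h for h in (lsa1_star, lsa2)}
rt2_db = {h.key(): h for h in (lsa1, lsa2_star)}

rt1_requests = build_request_list(rt1_db, [lsa1, lsa2_star])
rt2_requests = build_request_list(rt2_db, [lsa1_star, lsa2])
print(len(rt1_requests), len(rt2_requests))
```

With these assumed values, router 0.0.0.1 requests both LSAs while router 0.0.0.2 requests none, because equal sequence numbers leave the checksum as the deciding field.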
The term "more recent" is used here in the sense of RFC 2328 section 13.1: for two LSAs with the same type, ID and advertising router, the comparison uses first the sequence number, second the checksum and third the age.

Receiving the database description packets, router 0.0.0.1 checks whether each listed LSA should be added to its link-state request list. LSA1 and LSA2* have the same sequence numbers as LSA1* and LSA2 respectively. Therefore, under the supposed condition (ch1>ch1* and ch2*>ch2), LSA1 and LSA2* are considered more recent than LSA1* and LSA2 respectively. As a result, router 0.0.0.1 adds LSA1 and LSA2* to its link-state request list. By the same reasoning, router 0.0.0.2 discards LSA1* and LSA2, and right after finishing the database exchange process it establishes the new adjacency by generating a new LSA, named LSA2**, which has the same sequence number as LSA2* but a different age, content and checksum.

          +-----+Ia       Ib+-----+
    LSA1* | RT1 |-----------| RT2 | LSA1
    LSA2  +-----+           +-----+ LSA2**

    RT1 database:
    LSA1*-Header=(type:1, id:0.0.0.1, rt:0.0.0.1, sqNo:x, checksum:ch1*)
    LSA2-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y, checksum:ch2)

    RT2 database:
    LSA1-Header=(type:1, id:0.0.0.1, rt:0.0.0.1, sqNo:x, checksum:ch1)
    LSA2**-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y, checksum:ch2**)

    Fig.3

Back to the scenario: router 0.0.0.1 asks for LSA1 and LSA2*, since they are on its link-state request list. In response to this request, router 0.0.0.2 prepares a link-state update packet consisting of LSA2** and LSA1 (note that completing the adjacency immediately on router 0.0.0.2 has replaced LSA2* with LSA2**). When router 0.0.0.1 receives LSA1, according to RFC 2328 section 13.4 it recognizes this LSA as self-originated, so it responds by increasing the sequence number and flooding the LSA back to router 0.0.0.2. However, the case of receiving LSA2** is different.
It may not be obvious at first look, but LSA2** is exactly the same as LSA2: it has the same "LS-type", "LS-ID", "Advertising Router", "LS sequence number" and "LS checksum" (they both indicate that router 0.0.0.2 has an adjacency with router 0.0.0.1 via the same interface). Thus, if the difference between the age fields is less than MaxAgeDiff, they must be interpreted as the same instance.

Upon receiving LSA2**, and according to the code at "line:678, function:ospf_lsupd_receive, file:lsupd.c, bird:1.3.4", router 0.0.0.1 simply answers this LSA with a link-state acknowledgment packet, without checking whether it is present on the link-state request list. Consequently, router 0.0.0.1 repeatedly asks for an update of LSA2*, and router 0.0.0.2 repeatedly answers with LSA2**, which is the same as the LSA2 held in the database of router 0.0.0.1. This loop continues until the LSRefreshTime timer for LSA2** expires and router 0.0.0.2 originates LSA2*** with a new sequence number: LSA2***-Header=(type:1, id:0.0.0.2, rt:0.0.0.2, sqNo:y+1, checksum:ch2***). Router 0.0.0.1 then accepts LSA2*** as the more recent LSA, and in a happy ending it establishes the new adjacency.

Now here is my question: rule number 6 on page 145 of RFC 2328 can be considered a solution for this condition. Under that rule, when a node receives an LSA that is not more recent than its database copy while an instance of that LSA is on its link-state request list, it generates the BadLSReq event. As a result, the database exchange process restarts, and in the new round the previous problem no longer occurs. So why is this rule not applied?

I'd be glad to have your comments and suggestions on this issue.

Regards
Amin Shoaie
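As a rough sketch of the rule referred to above, the following Python models how one LSA from a link-state update could be classified with and without the request-list check. It is an assumption-laden illustration, not BIRD's code: `classify_received_lsa` and its return labels are hypothetical names, an LSA instance is reduced to a `(sequence number, checksum)` tuple, and age is ignored (ages are assumed to differ by less than MaxAgeDiff).

```python
def classify_received_lsa(db_copy, received, on_request_list, apply_rule_6):
    """Classify one received LSA against the local database copy.

    db_copy and received are (sequence number, checksum) tuples; Python
    compares tuples field by field, which matches the order of the
    sequence-number and checksum tests. Age is deliberately left out.
    """
    if db_copy is None or received > db_copy:
        return "install_and_flood"   # received LSA is more recent
    if apply_rule_6 and on_request_list:
        return "BadLSReq"            # restart the database exchange
    if received == db_copy:
        return "ack"                 # same instance: acknowledge only
    return "db_copy_newer"           # database copy is more recent


# Illustrative values only: y and ch2 are assumptions.
y, ch2 = 7, 0x2000
lsa2 = (y, ch2)          # copy held by router 0.0.0.1
lsa2_star2 = (y, ch2)    # LSA2** as sent by router 0.0.0.2

# Router 0.0.0.1 still has a request pending for this LSA.
without_rule = classify_received_lsa(lsa2, lsa2_star2, True, apply_rule_6=False)
with_rule = classify_received_lsa(lsa2, lsa2_star2, True, apply_rule_6=True)
print(without_rule, with_rule)
```

In this model, skipping the request-list check yields only an acknowledgment, so the pending request is never satisfied, while applying the check yields BadLSReq and a restarted exchange.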
On Mon, Oct 24, 2011 at 01:36:15PM +0330, Mohammad Amin Shoaie wrote:
> Dear BIRD developers and users,
>
> I'm having a problem with OSPF adjacency creation. Under a specific condition, this problem adds a delay of 30 minutes (LSRefreshTime) to the adjacency creation process. I guess BIRD's developers are to some extent aware of it, because it is marked at "line:675, file:lsupd.c, bird:1.3.4", where the developer has commented "/* FIXME pg145 (6) */". Therefore, besides explaining in the rest of this email how the problem occurs, I would like to ask: what is the reason that the developers did not fix it?
Thanks for the bug report. I was aware of some FIXMEs in the code, but I was not aware of any potential problems.

BTW, from your description I think that the main source of the problem is that LSA1* has the same seqnum as the older LSA1 (which I am not aware of). I will check that and fix it.

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santiago@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
participants (2)
- Mohammad Amin Shoaie
- Ondrej Zajicek