Sunday 7 August 2011

External BGP Next Hop

Here's an interesting BGP discovery (well, at least it's interesting to me...)

A common belief that is not quite true is that EBGP peers always set the advertised next hop to themselves. Have a look at RFC 1771 section 5.1.3 (NEXT_HOP):

A BGP speaker can advertise any internal border router as the next hop provided that the interface associated with the IP address of this border router (as specified in the NEXT_HOP path attribute) shares a common subnet with both the local and remote BGP speakers. A BGP speaker can advertise any external border router as the next hop, provided that the IP address of this border router was learned from one of the BGP speaker's peers, and the interface associated with the IP address of this border router (as specified in the NEXT_HOP path attribute) shares a common subnet with the local and remote BGP speakers.

Take this example to verify it:

We have 3 routers, each in a separate BGP AS. R1 is peered with R2, and R3 is peered with R2. R1 and R3 have no BGP session with each other.

R1 and R3 are connected to a switch belonging to R2, and Fa0/0 on R1, R2 and R3 are all in the same VLAN.

Below are the relevant configs:

R1
hostname R1
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.123.1 255.255.255.0
!
router bgp 1
 no synchronization
 network 1.1.1.1 mask 255.255.255.255
 neighbor 10.1.123.2 remote-as 2
 no auto-summary

R2
hostname R2
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.123.2 255.255.255.0
!
router bgp 2
 no synchronization
 network 2.2.2.2 mask 255.255.255.255
 neighbor 10.1.123.1 remote-as 1
 neighbor 10.1.123.3 remote-as 3
 no auto-summary

R3
hostname R3
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
!
interface FastEthernet0/0
 ip address 10.1.123.3 255.255.255.0
!
router bgp 3
 no synchronization
 bgp log-neighbor-changes
 network 3.3.3.3 mask 255.255.255.255
 neighbor 10.1.123.2 remote-as 2
 no auto-summary

Let's make sure everyone has BGP connectivity and has learnt prefixes:

R1#sh ip bgp summ | b Neighbor
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.123.2      4     2      26      20       28    0    0 00:13:46        2

R2#sh ip bgp summ | b Neighbor
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.123.1      4     1      20      26       28    0    0 00:13:48        1
10.1.123.3      4     3      20      27       28    0    0 00:12:44        1

R3#sh ip bgp summ | b Neighbor
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.123.2      4     2      27      20       28    0    0 00:12:31        2


Looks reasonable. Let's check which routes we have learnt from BGP:


R1#sh ip route bgp
     2.0.0.0/32 is subnetted, 1 subnets
B       2.2.2.2 [20/0] via 10.1.123.2, 00:11:57
     3.0.0.0/32 is subnetted, 1 subnets
B       3.3.3.3 [20/0] via 10.1.123.3, 00:10:56


R2#sh ip route bgp
     1.0.0.0/32 is subnetted, 1 subnets
B       1.1.1.1 [20/0] via 10.1.123.1, 00:11:45
     3.0.0.0/32 is subnetted, 1 subnets
B       3.3.3.3 [20/0] via 10.1.123.3, 00:11:45


R3#sh ip route bgp
     1.0.0.0/32 is subnetted, 1 subnets
B       1.1.1.1 [20/0] via 10.1.123.1, 00:12:05
     2.0.0.0/32 is subnetted, 1 subnets
B       2.2.2.2 [20/0] via 10.1.123.2, 00:13:06


The interesting thing here is that R1 sees 3.3.3.3/32 as reachable via R3's Fa0/0, and R3 sees 1.1.1.1/32 as reachable via R1's Fa0/0, even though these two routers have no direct BGP peering with each other.

It appears that R2 is smart enough to realise that, since R1 and R3 are on the same subnet, there is no point in putting itself in the way. It leaves the advertised next-hop IP address unchanged rather than setting it to itself, even though its sessions with R1 and R3 are both EBGP.
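The decision R2 seems to be making can be sketched in a few lines of Python. This is only an illustration of the shared-subnet condition from the RFC text quoted earlier, not how IOS implements it; the function name advertised_next_hop and its parameters are made up for this example, and the second call uses an arbitrary off-subnet address (192.0.2.9) that is not part of the lab.

```python
import ipaddress


def advertised_next_hop(local_if, peer_ip, learned_next_hop):
    """Pick the NEXT_HOP to advertise to an EBGP peer (simplified sketch).

    local_if:         our interface towards the peer, e.g. "10.1.123.2/24"
    peer_ip:          address of the peer that will receive the update
    learned_next_hop: NEXT_HOP the route carried when we learned it
    """
    iface = ipaddress.ip_interface(local_if)
    subnet = iface.network
    peer = ipaddress.ip_address(peer_ip)
    candidate = ipaddress.ip_address(learned_next_hop)

    # Keep the original next hop only if it sits on the subnet we share
    # with the receiving peer; otherwise fall back to our own address.
    if peer in subnet and candidate in subnet:
        return str(candidate)
    return str(iface.ip)


# R2 advertising R3's loopback to R1: next hop stays on R3.
print(advertised_next_hop("10.1.123.2/24", "10.1.123.1", "10.1.123.3"))  # 10.1.123.3

# Hypothetical route learned with an off-subnet next hop: R2 uses itself.
print(advertised_next_hop("10.1.123.2/24", "10.1.123.1", "192.0.2.9"))   # 10.1.123.2
```

The first call mirrors the lab: R1 (10.1.123.1) and R3 (10.1.123.3) both sit on R2's Fa0/0 subnet, so the next hop for 3.3.3.3/32 is left pointing at R3.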

R1#sh ip bgp
BGP table version is 28, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.1/32       0.0.0.0                  0         32768 i
*> 2.2.2.2/32       10.1.123.2               0             0 2 i
*> 3.3.3.3/32       10.1.123.3                             0 2 3 i

R2#sh ip bgp
BGP table version is 28, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.1/32       10.1.123.1               0             0 1 i
*> 2.2.2.2/32       0.0.0.0                  0         32768 i
*> 3.3.3.3/32       10.1.123.3               0             0 3 i

R3#sh ip bgp
BGP table version is 28, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.1/32       10.1.123.1                             0 2 1 i
*> 2.2.2.2/32       10.1.123.2               0             0 2 i
*> 3.3.3.3/32       0.0.0.0                  0         32768 i

We can see that R2's AS is still included in the AS_PATH of the routing information, even though R2 is not in the traffic forwarding path.
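To make the split between the two attributes concrete, here is a toy Python model of what R2 does to the 3.3.3.3/32 route before passing it to R1. The dictionary layout and the readvertise helper are assumptions made for illustration only; real BGP updates carry more attributes than this.

```python
def readvertise(route, local_as):
    """Return the route as sent on to an EBGP peer on the same subnet."""
    return {
        "prefix": route["prefix"],
        # NEXT_HOP is left as learned (third-party next hop).
        "next_hop": route["next_hop"],
        # The local AS is always prepended when advertising over EBGP.
        "as_path": [local_as] + route["as_path"],
    }


# What R2 learned from R3, taken from R2's "sh ip bgp" output.
learned_from_r3 = {"prefix": "3.3.3.3/32", "next_hop": "10.1.123.3", "as_path": [3]}

sent_to_r1 = readvertise(learned_from_r3, local_as=2)
print(sent_to_r1["next_hop"], sent_to_r1["as_path"])  # 10.1.123.3 [2, 3]
```

The result lines up with R1's BGP table: next hop 10.1.123.3 with path "2 3".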
