Sunday 10 June 2012

BGP with Unequal Cost paths

This post covers how we can handle load sharing over unequal cost paths between two BGP Autonomous Systems.

This six-router topology has two BGP Autonomous Systems (AS 123 and AS 456) with two exit points between them: one is a 128 kbps path (R2-R4) and the other is a 192 kbps path (R3-R5).

Below are our initial configurations:

R1
hostname R1
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
 ip ospf 1 area 123
!
interface FastEthernet0/0
 ip address 10.1.111.1 255.255.255.0
 ip ospf 1 area 123
 no shutdown
!
router bgp 123
 no bgp default ipv4-unicast
 neighbor 2.2.2.2 remote-as 123
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 123
 neighbor 3.3.3.3 update-source Loopback0
 !
 address-family ipv4
  neighbor 2.2.2.2 activate
  neighbor 3.3.3.3 activate
  no auto-summary
  no synchronization
  network 1.1.1.1 mask 255.255.255.255
 exit-address-family
!

R2
hostname R2
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip ospf 1 area 123
!
interface FastEthernet0/0
 ip address 10.1.111.2 255.255.255.0
 ip ospf 1 area 123
 no shutdown
!
interface Serial0/0
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
 no shutdown
!
interface Serial0/0.24 point-to-point
 bandwidth 128
 ip address 10.1.24.2 255.255.255.0
 snmp trap link-status
 frame-relay interface-dlci 204
!
router bgp 123
 no bgp default ipv4-unicast
 neighbor 1.1.1.1 remote-as 123
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 3.3.3.3 remote-as 123
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 10.1.24.4 remote-as 456
 !
 address-family ipv4
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 next-hop-self
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 next-hop-self
  neighbor 10.1.24.4 activate
  no auto-summary
  no synchronization
  network 2.2.2.2 mask 255.255.255.255
 exit-address-family
!

R3
hostname R3
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip ospf 1 area 123
!
interface FastEthernet0/0
 ip address 10.1.111.3 255.255.255.0
 ip ospf 1 area 123
 no shutdown
!
interface Serial0/0
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
 no shutdown
!
interface Serial0/0.35 point-to-point
 bandwidth 192
 ip address 10.1.35.3 255.255.255.0
 snmp trap link-status
 frame-relay interface-dlci 305
!
router bgp 123
 no bgp default ipv4-unicast
 neighbor 1.1.1.1 remote-as 123
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 2.2.2.2 remote-as 123
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 10.1.35.5 remote-as 456
 !
 address-family ipv4
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 next-hop-self
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 next-hop-self
  neighbor 10.1.35.5 activate
  no auto-summary
  no synchronization
  network 3.3.3.3 mask 255.255.255.255
 exit-address-family
!

R4
hostname R4
interface Loopback0
 ip address 4.4.4.4 255.255.255.255
 ip ospf 1 area 456
!
interface FastEthernet0/0
 ip address 10.1.222.4 255.255.255.0
 ip ospf 1 area 456
 no shutdown
!
interface Serial0/0
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
 no shutdown
!
interface Serial0/0.42 point-to-point
 bandwidth 128
 ip address 10.1.24.4 255.255.255.0
 snmp trap link-status
 frame-relay interface-dlci 402
!
router bgp 456
 no bgp default ipv4-unicast
 neighbor 5.5.5.5 remote-as 456
 neighbor 5.5.5.5 update-source Loopback0
 neighbor 6.6.6.6 remote-as 456
 neighbor 6.6.6.6 update-source Loopback0
 neighbor 10.1.24.2 remote-as 123
 !
 address-family ipv4
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 next-hop-self
  neighbor 6.6.6.6 activate
  neighbor 6.6.6.6 next-hop-self
  neighbor 10.1.24.2 activate
  no auto-summary
  no synchronization
  network 4.4.4.4 mask 255.255.255.255
 exit-address-family
!

R5
hostname R5
interface Loopback0
 ip address 5.5.5.5 255.255.255.255
 ip ospf 1 area 456
!
interface FastEthernet0/0
 ip address 10.1.222.5 255.255.255.0
 ip ospf 1 area 456
 no shutdown
!
interface Serial0/0
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
 no shutdown
!
interface Serial0/0.53 point-to-point
 bandwidth 192
 ip address 10.1.35.5 255.255.255.0
 snmp trap link-status
 frame-relay interface-dlci 503
!
router bgp 456
 no bgp default ipv4-unicast
 neighbor 4.4.4.4 remote-as 456
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 6.6.6.6 remote-as 456
 neighbor 6.6.6.6 update-source Loopback0
 neighbor 10.1.35.3 remote-as 123
 !
 address-family ipv4
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 next-hop-self
  neighbor 6.6.6.6 activate
  neighbor 6.6.6.6 next-hop-self
  neighbor 10.1.35.3 activate
  no auto-summary
  no synchronization
  network 5.5.5.5 mask 255.255.255.255
 exit-address-family
!



R6
hostname R6
interface Loopback0
 ip address 6.6.6.6 255.255.255.255
 ip ospf 1 area 456
!
interface FastEthernet0/0
 ip address 10.1.222.6 255.255.255.0
 ip ospf 1 area 456
 no shutdown
!
router bgp 456
 no bgp default ipv4-unicast
 neighbor 4.4.4.4 remote-as 456
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 5.5.5.5 remote-as 456
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family ipv4
  neighbor 4.4.4.4 activate
  neighbor 5.5.5.5 activate
  no auto-summary
  no synchronization
  network 6.6.6.6 mask 255.255.255.255
 exit-address-family
!

Let's check the BGP table and the routing table on R1 and R6:

R1#sh ip bgp | b Network
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.1/32       0.0.0.0                  0         32768 i
r>i2.2.2.2/32       2.2.2.2                  0    100      0 i
r>i3.3.3.3/32       3.3.3.3                  0    100      0 i
* i4.4.4.4/32       3.3.3.3                  0    100      0 456 i
*>i                 2.2.2.2                  0    100      0 456 i
*>i5.5.5.5/32       2.2.2.2                  0    100      0 456 i
* i                 3.3.3.3                  0    100      0 456 i
*>i6.6.6.6/32       2.2.2.2                  0    100      0 456 i
* i                 3.3.3.3                  0    100      0 456 i
R1#sh ip route bgp
     4.0.0.0/32 is subnetted, 1 subnets
B       4.4.4.4 [200/0] via 2.2.2.2, 00:03:02
     5.0.0.0/32 is subnetted, 1 subnets
B       5.5.5.5 [200/0] via 2.2.2.2, 00:03:02
     6.0.0.0/32 is subnetted, 1 subnets
B       6.6.6.6 [200/0] via 2.2.2.2, 00:01:03

R6#sh ip bgp | b Network
   Network          Next Hop            Metric LocPrf Weight Path
*>i1.1.1.1/32       4.4.4.4                  0    100      0 123 i
* i                 5.5.5.5                  0    100      0 123 i
*>i2.2.2.2/32       4.4.4.4                  0    100      0 123 i
* i                 5.5.5.5                  0    100      0 123 i
*>i3.3.3.3/32       4.4.4.4                  0    100      0 123 i
* i                 5.5.5.5                  0    100      0 123 i
r>i4.4.4.4/32       4.4.4.4                  0    100      0 i
r>i5.5.5.5/32       5.5.5.5                  0    100      0 i
*> 6.6.6.6/32       0.0.0.0                  0         32768 i
R6#sh ip route bgp
     1.0.0.0/32 is subnetted, 1 subnets
B       1.1.1.1 [200/0] via 4.4.4.4, 00:01:28
     2.0.0.0/32 is subnetted, 1 subnets
B       2.2.2.2 [200/0] via 4.4.4.4, 00:01:28
     3.0.0.0/32 is subnetted, 1 subnets
B       3.3.3.3 [200/0] via 4.4.4.4, 00:01:28

At this point only one BGP route per prefix is installed in the routing table, because that is the default behaviour. We want R1 and R6 to be able to use both exit points from their autonomous systems, so let's enable BGP multipath. Since R1 and R6 only have iBGP peers, we need to enable it for iBGP:

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#router bgp 123
R1(config-router)#address-family ipv4
R1(config-router-af)#maximum-paths ibgp 2
R1(config-router-af)#end

R1#sh ip route bgp
     4.0.0.0/32 is subnetted, 1 subnets
B       4.4.4.4 [200/0] via 3.3.3.3, 00:00:30
                [200/0] via 2.2.2.2, 00:04:55
     5.0.0.0/32 is subnetted, 1 subnets
B       5.5.5.5 [200/0] via 3.3.3.3, 00:00:30
                [200/0] via 2.2.2.2, 00:04:55
     6.0.0.0/32 is subnetted, 1 subnets
B       6.6.6.6 [200/0] via 3.3.3.3, 00:00:30
                [200/0] via 2.2.2.2, 00:02:57

R6#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R6(config)#router bgp 456
R6(config-router)#address-family ipv4
R6(config-router-af)#maximum-paths ibgp 2
R6(config-router-af)#end
R6#
R6#sh ip route bgp
     1.0.0.0/32 is subnetted, 1 subnets
B       1.1.1.1 [200/0] via 5.5.5.5, 00:00:06
                [200/0] via 4.4.4.4, 00:03:55
     2.0.0.0/32 is subnetted, 1 subnets
B       2.2.2.2 [200/0] via 5.5.5.5, 00:00:06
                [200/0] via 4.4.4.4, 00:03:55
     3.0.0.0/32 is subnetted, 1 subnets
B       3.3.3.3 [200/0] via 5.5.5.5, 00:00:06
                [200/0] via 4.4.4.4, 00:03:55

Okay, now R1 and R6 are able to use both exit points. However, to get the most out of these links, the load should be shared across them in proportion to the configured link bandwidths rather than equally.

We do this by:
  1. Activating the bgp dmzlink-bw function
  2. Sending extended BGP communities to our iBGP peers
  3. Associating dmzlink-bw with our eBGP neighbor
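
The three steps above can be sketched as a small Python helper that prints the commands for one border router. This is only an illustration: the function name dmzlink_bw_commands and its parameters are made up for this post, and the commands it emits are the same ones entered by hand on R2 and R3 below.

```python
def dmzlink_bw_commands(asn, ibgp_peers, ebgp_peer):
    """Return IOS config lines for one border router (hypothetical helper)."""
    lines = [
        f"router bgp {asn}",
        " address-family ipv4 unicast",
        "  bgp dmzlink-bw",  # step 1: activate the dmzlink-bw function
    ]
    for peer in ibgp_peers:
        # step 2: send extended communities to each iBGP peer
        lines.append(f"  neighbor {peer} send-community extended")
    # step 3: associate dmzlink-bw with the eBGP neighbor
    lines.append(f"  neighbor {ebgp_peer} dmzlink-bw")
    lines.append(" exit-address-family")
    return lines


# R2 in this lab: iBGP peers R1 and R3, eBGP peer R4 across the 128 kbps link
r2_config = dmzlink_bw_commands(123, ["1.1.1.1", "3.3.3.3"], "10.1.24.4")
print("\n".join(r2_config))
```

Calling it with (123, ["1.1.1.1", "2.2.2.2"], "10.1.35.5") gives the R3 equivalent.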

R2#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R2(config)#router bgp 123
R2(config-router)#address-family ipv4 unicast
R2(config-router-af)#bgp dmzlink-bw
R2(config-router-af)#neighbor 1.1.1.1 send-community extended
R2(config-router-af)#neighbor 3.3.3.3 send-community extended
R2(config-router-af)#neighbor 10.1.24.4 dmzlink-bw
R2(config-router-af)#end

R3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R3(config)#router bgp 123
R3(config-router)#address-family ipv4 unicast
R3(config-router-af)#bgp dmzlink-bw
R3(config-router-af)#neighbor 1.1.1.1 send-community extended
R3(config-router-af)#neighbor 2.2.2.2 send-community extended
R3(config-router-af)#neighbor 10.1.35.5 dmzlink-bw
R3(config-router-af)#end

R1#sh ip bgp 6.6.6.6
BGP routing table entry for 6.6.6.6/32, version 16
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: iBGP
Flag: 0x8820
  Not advertised to any peer
  456
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath
      DMZ-Link Bw 24 kbytes
  456
    2.2.2.2 (metric 11) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
      DMZ-Link Bw 16 kbytes
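
The DMZ-Link Bw values line up with the bandwidth statements on the serial subinterfaces once kilobits are converted to kilobytes. A quick Python check of that arithmetic follows; the helper name kbps_to_kbytes is just for illustration, and dividing by 8 is an assumption that happens to match the values shown in this output.

```python
def kbps_to_kbytes(bandwidth_kbps):
    """Convert an interface bandwidth in kbit/s to kbyte/s (8 bits per byte)."""
    return bandwidth_kbps // 8


# bandwidth values configured on Serial0/0.24 (R2) and Serial0/0.35 (R3)
links = {"R2-R4": 128, "R3-R5": 192}
dmz_bw = {name: kbps_to_kbytes(kbps) for name, kbps in links.items()}
print(dmz_bw)  # {'R2-R4': 16, 'R3-R5': 24}
```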


R1#sh ip route 6.6.6.6
Routing entry for 6.6.6.6/32
  Known via "bgp 123", distance 200, metric 0
  Tag 456, type internal
  Last update from 2.2.2.2 00:00:05 ago
  Routing Descriptor Blocks:
  * 3.3.3.3, from 3.3.3.3, 00:00:05 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 456
    2.2.2.2, from 2.2.2.2, 00:00:05 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 456

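We can see that R1 is receiving the DMZ link bandwidth extended community, but it isn't doing anything with it yet: both paths still have a traffic share count of 1. R1 needs bgp dmzlink-bw enabled as well: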

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#router bgp 123
R1(config-router)#address-family ipv4 unicast
R1(config-router-af)#bgp dmzlink-bw
R1(config-router-af)#end

Now to bounce R1's BGP peerings:

R1#clear ip bgp *
R1#
*Mar  1 00:33:47.999: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Down User reset
*Mar  1 00:33:47.999: %BGP-5-ADJCHANGE: neighbor 3.3.3.3 Down User reset
*Mar  1 00:34:18.059: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up
*Mar  1 00:34:18.063: %BGP-5-ADJCHANGE: neighbor 3.3.3.3 Up
R1#sh ip route 6.6.6.6
Routing entry for 6.6.6.6/32
  Known via "bgp 123", distance 200, metric 0
  Tag 456, type internal
  Last update from 2.2.2.2 00:00:05 ago
  Routing Descriptor Blocks:
  * 3.3.3.3, from 3.3.3.3, 00:00:05 ago
      Route metric is 0, traffic share count is 2
      AS Hops 1
      Route tag 456
    2.2.2.2, from 2.2.2.2, 00:00:05 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 456

That's more like it: the path via R3 now has a traffic share count of 2, against 1 for the path via R2.
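
It is worth noting that the share counts are whole numbers, so they don't have to match the bandwidth ratio exactly. The short Python sketch below compares the two using only figures from the outputs above (24 and 16 kbytes from sh ip bgp 6.6.6.6, share counts 2 and 1 from sh ip route 6.6.6.6). How IOS derives the integers is not shown in this lab, so the sketch only compares them; the helper name fraction_via is made up for illustration.

```python
from fractions import Fraction

# values copied from the R1 outputs in this post
dmz_bw_kbytes = {"3.3.3.3": 24, "2.2.2.2": 16}
share_count = {"3.3.3.3": 2, "2.2.2.2": 1}


def fraction_via(weights, next_hop):
    """Fraction of traffic a next hop would get if load followed these weights."""
    return Fraction(weights[next_hop], sum(weights.values()))


bw_share = fraction_via(dmz_bw_kbytes, "3.3.3.3")  # 24/40 = 3/5
rib_share = fraction_via(share_count, "3.3.3.3")   # 2/3
print(f"via R3 by bandwidth: {float(bw_share):.0%}, by share count: {float(rib_share):.0%}")
```

So a pure bandwidth split would send 60% of the traffic via R3, while a 2:1 share count corresponds to about 67%.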

Let's do the equivalent for our routers in AS 456:

R4#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R4(config)#router bgp 456
R4(config-router)#address-family ipv4 unicast
R4(config-router-af)#bgp dmzlink-bw
R4(config-router-af)#neighbor 5.5.5.5 send-community extended
R4(config-router-af)#neighbor 6.6.6.6 send-community extended
R4(config-router-af)#neighbor 10.1.24.2 dmzlink-bw
R4(config-router-af)#end


R5#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R5(config)#router bgp 456
R5(config-router)#address-family ipv4 unicast
R5(config-router-af)#bgp dmzlink-bw
R5(config-router-af)#neighbor 4.4.4.4 send-community extended
R5(config-router-af)#neighbor 6.6.6.6 send-community extended
R5(config-router-af)#neighbor 10.1.35.3 dmzlink-bw
R5(config-router-af)#end


R6#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R6(config)#router bgp 456
R6(config-router)#address-family ipv4 unicast
R6(config-router-af)#bgp dmzlink-bw
R6(config-router-af)#end

R6#clear ip bgp *
R6#
*Mar  1 00:49:11.923: %BGP-5-ADJCHANGE: neighbor 4.4.4.4 Down User reset
*Mar  1 00:49:11.927: %BGP-5-ADJCHANGE: neighbor 5.5.5.5 Down User reset
*Mar  1 00:49:25.943: %BGP-5-ADJCHANGE: neighbor 4.4.4.4 Up
*Mar  1 00:49:25.947: %BGP-5-ADJCHANGE: neighbor 5.5.5.5 Up
R6#sh ip bgp 1.1.1.1
BGP routing table entry for 1.1.1.1/32, version 10
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: iBGP
Flag: 0x8820
  Not advertised to any peer
  123
    5.5.5.5 (metric 11) from 5.5.5.5 (5.5.5.5)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath
      DMZ-Link Bw 24 kbytes
  123
    4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
      DMZ-Link Bw 16 kbytes

R6#sh ip route 1.1.1.1
Routing entry for 1.1.1.1/32
  Known via "bgp 456", distance 200, metric 0
  Tag 123, type internal
  Last update from 4.4.4.4 00:00:43 ago
  Routing Descriptor Blocks:
  * 5.5.5.5, from 5.5.5.5, 00:00:43 ago
      Route metric is 0, traffic share count is 2
      AS Hops 1
      Route tag 123
    4.4.4.4, from 4.4.4.4, 00:00:43 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 123

Let's set up some basic metering using CBQoS and test whether the load balancing works. The policy map takes no action; it is only there for its packet counters:

R2#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R2(config)#policy-map TST
R2(config-pmap)#class class-default
R2(config-pmap-c)#int s0/0.24
R2(config-subif)#service-policy input TST
R2(config-subif)#service-policy output TST
R2(config-subif)#int s0/0
R2(config-if)#load-interval 30
R2(config-if)#end

R3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R3(config)#policy-map TST
R3(config-pmap)#class class-default
R3(config-pmap-c)#int s0/0.35
R3(config-subif)#service-policy input TST
R3(config-subif)#service-policy output TST
R3(config-subif)#int s0/0
R3(config-if)#load-interval 30
R3(config-if)#end

Now we will do an extended ping between R1 Lo0 and R6 Lo0. We'll use the record route option so we can see that the path doesn't stay the same for all of the pings.

R1#ping
Protocol [ip]:
Target IP address: 6.6.6.6
Repeat count [5]: 1000
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 1.1.1.1
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: r
Number of hops [ 9 ]: 8
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 6.6.6.6, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet has IP options:  Total option bytes= 39, padded length=40
 Record route: <*>
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)

Reply to request 0 (20 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.35.3)
   (10.1.222.5)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.35.5)
   (10.1.111.3)
   (1.1.1.1)
   <*>
 End of list

Reply to request 1 (8 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.35.3)
   (10.1.222.5)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.35.5)
   (10.1.111.3)
   (1.1.1.1)
   <*>
 End of list

Reply to request 2 (12 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.24.2)
   (10.1.222.4)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.24.4)
   (10.1.111.2)
   (1.1.1.1)
   <*>
 End of list

Reply to request 3 (8 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.35.3)
   (10.1.222.5)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.35.5)
   (10.1.111.3)
   (1.1.1.1)
   <*>
 End of list

Reply to request 4 (16 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.35.3)
   (10.1.222.5)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.35.5)
   (10.1.111.3)
   (1.1.1.1)
   <*>
 End of list

..............
 
Reply to request 997 (8 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.35.3)
   (10.1.222.5)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.35.5)
   (10.1.111.3)
   (1.1.1.1)
   <*>
 End of list

Reply to request 998 (8 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.24.2)
   (10.1.222.4)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.24.4)
   (10.1.111.2)
   (1.1.1.1)
   <*>
 End of list

Reply to request 999 (12 ms).  Received packet has options
 Total option bytes= 36, padded length=36
 Record route:
   (10.1.111.1)
   (10.1.35.3)
   (10.1.222.5)
   (6.6.6.6)
   (10.1.222.6)
   (10.1.35.5)
   (10.1.111.3)
   (1.1.1.1)
   <*>
 End of list

Success rate is 100 percent (1000/1000), round-trip min/avg/max = 1/10/20 ms




We can see that the path alternates in a pattern that looks close to 2:1, but let's check it by looking at the traffic counters:

R2#sh policy-map int s0/0.24

 Serial0/0.24

  Service-policy input: TST

    Class-map: class-default (match-any)
      335 packets, 34739 bytes
      30 second offered rate 8000 bps, drop rate 0 bps
      Match: any

  Service-policy output: TST

    Class-map: class-default (match-any)
      335 packets, 38351 bytes
      30 second offered rate 9000 bps, drop rate 0 bps
      Match: any

R3#sh policy-map int s0/0.35

 Serial0/0.35

  Service-policy input: TST

    Class-map: class-default (match-any)
      668 packets, 69431 bytes
      30 second offered rate 12000 bps, drop rate 0 bps
      Match: any

  Service-policy output: TST

    Class-map: class-default (match-any)
      670 packets, 76471 bytes
      30 second offered rate 13000 bps, drop rate 0 bps
      Match: any

The packet load lines up with the traffic share count: R3 carried 668 packets in and 670 out, against 335 each way on R2, which is very close to 2:1.
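
As a quick sanity check, the ratio can be worked out from the class-default counters above. This Python snippet uses only the packet counts from the two sh policy-map outputs; the dictionary layout and the ratio helper are just for illustration. Keep in mind the counters cover everything crossing the subinterfaces, not only the pings.

```python
# packet counts copied from the sh policy-map int outputs on R2 and R3
packets = {
    "R2 s0/0.24": {"input": 335, "output": 335},
    "R3 s0/0.35": {"input": 668, "output": 670},
}


def ratio(direction):
    """Packets on the R3 link divided by packets on the R2 link."""
    return packets["R3 s0/0.35"][direction] / packets["R2 s0/0.24"][direction]


for direction in ("input", "output"):
    print(f"{direction}: R3/R2 = {ratio(direction):.2f}")
# input: R3/R2 = 1.99
# output: R3/R2 = 2.00
```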
