Wednesday, 1 June 2011

EIGRP Unequal Cost Paths

This post is going to be about how to perform unequal cost load balancing using EIGRP.  There's going to be a little bit of mathematics here but nothing beyond some basic algebra.


The topology for this post is shown below.  We have 4 routers with multiple paths between them; each has a loopback interface, and all interfaces are in EIGRP AS 100.



R1--FR-256kbps--R2--FR-256kbps--R4
 \                              /
  \-FR-512kbps-R3-FastEthernet-/



The initial configs are shown below:
R1
hostname R1
interface Loopback0
 ip address 1.1.1.1 255.0.0.0
!
interface Serial0/0
 description to Frame Switch
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
!
interface Serial0/0.12 point-to-point
 description R1-R2
 bandwidth 256
 ip address 10.1.12.1 255.255.255.0
 frame-relay interface-dlci 102
!
interface Serial0/0.13 point-to-point
 description R1-R3
 bandwidth 512
 ip address 10.1.13.1 255.255.255.0
 frame-relay interface-dlci 103
!
router eigrp 100
 network 1.1.1.1 0.0.0.0
 network 10.1.12.1 0.0.0.0
 network 10.1.13.1 0.0.0.0
 no auto-summary
!

R2
hostname R2
interface Loopback0
 ip address 2.2.2.2 255.0.0.0
!
interface Serial0/0
 description to Frame Switch
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
!
interface Serial0/0.21 point-to-point
 bandwidth 256
 description R2-R1
 ip address 10.1.12.2 255.255.255.0
 frame-relay interface-dlci 201
!
interface Serial0/0.24 point-to-point
 bandwidth 256
 description R2-R4
 ip address 10.1.24.2 255.255.255.0
 frame-relay interface-dlci 204
!
router eigrp 100
 network 2.2.2.2 0.0.0.0
 network 10.1.12.2 0.0.0.0
 network 10.1.24.2 0.0.0.0
 no auto-summary
!

R3
hostname R3
interface Loopback0
 ip address 3.3.3.3 255.0.0.0
!
interface FastEthernet0/0
 description to R3-R4
 ip address 10.1.34.3 255.255.255.0
 speed 100
 full-duplex
!
interface Serial0/0
 description to Frame Switch
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
!
interface Serial0/0.31 point-to-point
 description R3-R1
 ip address 10.1.13.3 255.255.255.0
 frame-relay interface-dlci 301
!
router eigrp 100
 network 3.3.3.3 0.0.0.0
 network 10.1.13.3 0.0.0.0
 network 10.1.34.3 0.0.0.0
 no auto-summary
!

R4
hostname R4
interface Loopback0
 ip address 4.4.4.4 255.0.0.0
!
interface FastEthernet0/0
 description to R4-R3
 ip address 10.1.34.4 255.255.255.0
 speed 100
 full-duplex
!
interface Serial0/0
 description to Frame Switch
 no ip address
 encapsulation frame-relay
 no frame-relay inverse-arp
!
interface Serial0/0.42 point-to-point
 bandwidth 256
 description R4-R2
 ip address 10.1.24.4 255.255.255.0
 frame-relay interface-dlci 402
!
router eigrp 100
 network 4.4.4.4 0.0.0.0
 network 10.1.24.4 0.0.0.0
 network 10.1.34.4 0.0.0.0
 no auto-summary
!


As you can see we are using the default metric weights here: since we haven't changed anything, K1 (Bandwidth) and K3 (Delay) are set to 1, with K2, K4 and K5 left at 0.

Therefore EIGRP metrics are calculated by 256 * ( 10^7 / WorstPathBandwidth + (TotalPathDelay/10) ), where WorstPathBandwidth is in kbps and TotalPathDelay is in usecs.
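
For anyone who prefers to check the arithmetic in code, here is a minimal Python sketch of that formula.  The function name eigrp_metric is my own, and it assumes the default K values and that both divisions truncate to whole numbers (which is what makes the hand calculations in this post line up).

```python
def eigrp_metric(min_bw_kbps, total_delay_usec):
    """Classic EIGRP composite metric with default K values (K1 = K3 = 1)."""
    bw_term = 10**7 // min_bw_kbps        # scaled inverse bandwidth, truncated
    delay_term = total_delay_usec // 10   # delay expressed in tens of usec
    return 256 * (bw_term + delay_term)


# A single 256 kbps hop with 20000 usec of delay
single_hop = eigrp_metric(256, 20000)
print(single_hop)  # 10511872
```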

From R1 if we were to work out the EIGRP Metric to 4.0.0.0 (Loopback 0 on R4) we can see two paths based on the topology - one via R2 and one via R3

Path 1 Metric calculation For R1 to 4.0.0.0 (R4 Lo0) via R2

Using the default metric weights, we can follow the same process EIGRP uses to calculate metrics by following the paths hop by hop and obtaining the interface bandwidth (BW) and delay (DLY) values from show interface | inc DLY

Before we directly see what metric R1 has to 4.0.0.0, we need to calculate the metric R2 has to reach 4.0.0.0 as this is used by EIGRP to determine the feasibility condition on R1.  I'll get into the importance of this a little further down but right now it should be enough to know that our router of interest (R1) needs to know the advertised metric from our EIGRP upstream neighbor as well as computing the metric locally.

R2's Metric to 4.0.0.0 (R4 Lo0)

From R2 S0/0.24 to R4 S0/0.42 (Bandwidth 256 kbps and 20000 usec Delay)
From R4 to R4 Lo0 (Bandwidth 8000000 kbps and 5000 usec Delay)

B = round (10 ^ 7 / WorstPathBandwidth) = 39062 (39062.5, which EIGRP rounds down)
D = TotalPathDelay / 10 = 2500

Metric = 256 * (B + D)

  = 10639872

This metric (which R1 will use as part of the feasibility calculations), as well as the worst bandwidth and total delay, will be advertised to R1 in order for it to compute the end metric to 4.0.0.0.


R1's Metric to 4.0.0.0 (R4 Lo0) via R2



R1 will look at its connection to R2 (R1 S0/0.12 to R2 S0/0.21), identifying that the link has a bandwidth of 256 kbps and 20000 usec of delay, and factor this into its metric calculation, along with the bandwidth and delay values received from R2:

WorstPathBandwidth = 256 kbps and TotalPathDelay = 45000 usec

R1's computed Metric to 4.0.0.0 via R2 is therefore


Metric = 256 * (B + D)
Where:
    B = round (10 ^ 7 / WorstPathBandwidth) = 39062
    D = TotalPathDelay / 10 = 4500

Which results in 11151872


Path 2 Metric calculation For R1 to 4.0.0.0 (R4 Lo0) via R3

This is a higher bandwidth path so we expect to see a lower metric here.

R3's Metric to 4.0.0.0 (R4 Lo0)

From R3 Fa0/0 to R4 Fa0/0 (Bandwidth 100000 kbps and 100 usec Delay)
To R4 Lo (Bandwidth 8000000 kbps and 5000 usec Delay)

Metric = 256 * (B + D)
Where:
    B = round (10 ^ 7 / WorstPathBandwidth) = 100
    D = TotalPathDelay / 10 = 510

Which results in 156160

R1's Metric to 4.0.0.0 (R4 Lo0) via R3



R1 will look at its connection to R3 (R1 S0/0.13 to R3 S0/0.31), identifying that the link has a bandwidth of 512 kbps and 20000 usec of delay, and factor this into its metric calculation, along with the bandwidth and delay values advertised from R3:

WorstPathBandwidth = 512 kbps and TotalPathDelay = 25100 usec

Metric = 256 * (B + D)
Where:
    B = round (10 ^ 7 / WorstPathBandwidth) = 19531
    D = TotalPathDelay / 10 = 2510

Which results in 5642496
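
Both hand calculations follow the same recipe: take the lowest bandwidth along the path, add up the delays, then apply the formula.  A rough Python sketch of that is below; path_metric and the hop lists are names I made up for illustration, and the bandwidth and delay figures are the ones quoted above.

```python
def path_metric(hops):
    """hops: list of (bandwidth_kbps, delay_usec), one per outgoing interface."""
    min_bw = min(bw for bw, _ in hops)           # worst bandwidth on the path
    total_delay = sum(dly for _, dly in hops)    # delays are cumulative
    return 256 * (10**7 // min_bw + total_delay // 10)


R4_LO0 = (8_000_000, 5000)
via_r2 = [(256, 20000), (256, 20000), R4_LO0]    # R1 -> R2 -> R4 -> Lo0
via_r3 = [(512, 20000), (100_000, 100), R4_LO0]  # R1 -> R3 -> R4 -> Lo0

# Dropping the first hop gives the metric the neighbour advertises to R1
print(path_metric(via_r2[1:]), path_metric(via_r2))  # 10639872 11151872
print(path_metric(via_r3[1:]), path_metric(via_r3))  # 156160 5642496
```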

Okay, so we can work out metric calculations by hand; let's see if R1 agrees with our findings:

R1#sh ip eigrp topology 4.0.0.0
IP-EIGRP (AS 100): Topology entry for 4.0.0.0/8
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 5642496
  Routing Descriptor Blocks:
  10.1.13.3 (Serial0/0.13), from 10.1.13.3, Send flag is 0x0
      Composite metric is (5642496/156160), Route is Internal
      Vector metric:
        Minimum bandwidth is 512 Kbit
        Total delay is 25100 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1500
        Hop count is 2
  10.1.12.2 (Serial0/0.12), from 10.1.12.2, Send flag is 0x0
      Composite metric is (11151872/10639872), Route is Internal
      Vector metric:
        Minimum bandwidth is 256 Kbit
        Total delay is 45000 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1500
        Hop count is 2

We can see that the minimum bandwidth aligns with our WorstPathBandwidth and that the total delay is also in agreement with our manual calculations.

When we look at the composite metrics we can see two values (A/B) where A represents the computed metric to 4.0.0.0 (R4 Lo0) as far as R1 is concerned for a particular next-hop and B represents the metric that the next-hop has to reach 4.0.0.0

Task

Make R1 load balance traffic destined to R4 Lo0 with a traffic-share of 2:1 (matching with our PVC bandwidths)

Obviously this means that we use the EIGRP "variance" command; however, on its own this will not be enough to achieve the goal.

Let's look at the metrics associated with both paths by examining the EIGRP topology database:

R1#sh ip eigrp topology 4.0.0.0 | inc from|Composite|bandwidth|delay
  10.1.13.3 (Serial0/0.13), from 10.1.13.3, Send flag is 0x0
      Composite metric is (5642496/156160), Route is Internal
        Minimum bandwidth is 512 Kbit
        Total delay is 25100 microseconds
  10.1.12.2 (Serial0/0.12), from 10.1.12.2, Send flag is 0x0
      Composite metric is (11151872/10639872), Route is Internal
        Minimum bandwidth is 256 Kbit
        Total delay is 45000 microseconds

The ratio 11151872 / 5642496 is roughly 1.98 rather than exactly 2, but more importantly, for EIGRP to install another route into the routing table for the same destination, the feasibility condition has to be met.  In this case it means that for the path via 10.1.12.2 to be considered, the advertised metric from R2 (10639872) must be less than R1's computed metric via R3 (5642496), which at this point is not the case.
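
To make the two checks explicit, here is a small Python sketch that applies the feasibility condition and the variance test to the values from the topology table.  The function name usable_next_hops is hypothetical, it simplifies things by treating the best computed metric as the feasible distance, and the exact comparison IOS uses for variance is an assumption on my part.

```python
def usable_next_hops(paths, variance=1):
    """paths: {next_hop: (computed_metric, advertised_metric)}."""
    # Simplification: use the best computed metric as the feasible distance (FD)
    fd = min(computed for computed, _ in paths.values())
    usable = []
    for next_hop, (computed, advertised) in paths.items():
        meets_fc = advertised < fd                    # feasibility condition
        within_variance = computed <= variance * fd   # assumed comparison
        if meets_fc and within_variance:
            usable.append(next_hop)
    return usable


topology = {
    "10.1.13.3": (5642496, 156160),
    "10.1.12.2": (11151872, 10639872),
}
print(usable_next_hops(topology, variance=2))  # ['10.1.13.3']
```

Even with a variance of 2, only the path via R3 survives, because R2's advertised metric is not below the feasible distance.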

With the default K values, we only have bandwidth and delay to play with and usually you do not want to change bandwidth values because that might break a number of other things such as QoS.

If we compare the two paths, we can see the link via R3 has a delay of 25100 usec and via R2 a delay of 45000 usec.  The bandwidth values are already two to one, so if we equalise the delay, maybe that will make things work?

On R1 on the link facing R3, let's increase the delay by 45000 - 25100 = 19900 usec.

First, let's confirm the current delay:

R1#sh int s0/0.13 | inc DLY
  MTU 1500 bytes, BW 1544 Kbit/sec, DLY 20000 usec,

So the new delay should be 39900 usec, but remember that the delay command takes its value in tens of usec:

R1(config)#int s0/0.13
R1(config-subif)# delay 3990

Did this work? Let's set the variance and see what appears in the routing table:

R1(config)#router eigrp 100
R1(config-router)#variance 2
R1(config-router)#do sh ip route 4.0.0.0
Routing entry for 4.0.0.0/8
  Known via "eigrp 100", distance 90, metric 6151936, type internal
  Redistributing via eigrp 100
  Last update from 10.1.13.3 on Serial0/0.13, 00:00:05 ago
  Routing Descriptor Blocks:
  * 10.1.13.3, from 10.1.13.3, 00:00:05 ago, via Serial0/0.13
      Route metric is 6151936, traffic share count is 1
      Total delay is 45000 microseconds, minimum bandwidth is 512 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 2

No, it doesn't appear to have worked - what do we see in the topology database?

R1(config-router)#do sh ip eigrp topology 4.0.0.0 | inc from|Composite|bandwidth|delay
  10.1.13.3 (Serial0/0.13), from 10.1.13.3, Send flag is 0x0
      Composite metric is (6151936/156160), Route is Internal
        Minimum bandwidth is 512 Kbit
        Total delay is 45000 microseconds
  10.1.12.2 (Serial0/0.12), from 10.1.12.2, Send flag is 0x0
      Composite metric is (11151872/10639872), Route is Internal
        Minimum bandwidth is 256 Kbit
        Total delay is 45000 microseconds


Well, the delay values now match and the bandwidths are in a 2:1 ratio, but the feasibility condition is still not being met (10639872 is not less than 6151936), so only the route via R3 can be used.

One way to satisfy the FC is to make R1's computed metric via R3 exceed 10639872 (the advertised metric from R2):

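Metric = 256 * (B + D)
Where:
    B = round (10 ^ 7 / WorstPathBandwidth)
    D = TotalPathDelay / 10

10639872 = 256 * (10^7/512 + TotalPathDelay/10)

10639872 / 256 - 10^7/512 = TotalPathDelay/10

41562 - 19531 = 22031 = TotalPathDelay/10

TotalPathDelay = 220310 usec, but 5100 usec of that delay already exists beyond R1's own interface (R3 Fa0/0 plus R4 Lo0), so the delay on S0/0.13 has to be greater than 215210 usec in order for the FC to pass...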
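
The same algebra can be double-checked with a few lines of Python.  This is only a sketch: min_interface_delay is a name I invented, and it returns the value in tens of usec so that it can be fed straight to the delay command.

```python
def min_interface_delay(target_metric, min_bw_kbps, downstream_delay_usec):
    """Smallest delay (tens of usec) that lifts the path metric above target_metric."""
    bw_term = 10**7 // min_bw_kbps
    # Smallest total delay term d such that 256 * (bw_term + d) > target_metric
    total_delay_term = target_metric // 256 - bw_term + 1
    # Subtract the delay that already exists beyond our own interface
    return total_delay_term - downstream_delay_usec // 10


# Beat R2's advertised metric over the 512 kbps path; 5100 usec lies beyond R1
print(min_interface_delay(10639872, 512, 5100))  # 21522
```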

R1(config-subif)#int s0/0.13
R1(config-subif)#delay 21522

So will EIGRP now install this route?

R1(config-subif)#do sh ip route 4.0.0.0
Routing entry for 4.0.0.0/8
  Known via "eigrp 100", distance 90, metric 10640128, type internal
  Redistributing via eigrp 100
  Last update from 10.1.12.2 on Serial0/0.12, 00:00:17 ago
  Routing Descriptor Blocks:
  * 10.1.13.3, from 10.1.13.3, 00:00:17 ago, via Serial0/0.13
      Route metric is 10640128, traffic share count is 240
      Total delay is 220320 microseconds, minimum bandwidth is 512 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 2
    10.1.12.2, from 10.1.12.2, 00:00:17 ago, via Serial0/0.12
      Route metric is 11151872, traffic share count is 229
      Total delay is 45000 microseconds, minimum bandwidth is 256 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 2

Yes, we now have two routes to 4.0.0.0, but the traffic share count is wrong (it should be 2 to 1, not 240 to 229).

For this to work we need to play with delay again, but this time on the R2-facing link, so that the computed metric of the path using R2 is double that of the path using R3.

Metric = 256 * (B + D)
Where:
    B = round (10 ^ 7 / WorstPathBandwidth)
    D = TotalPathDelay / 10

2 * 10640128 = 256 * (10^7/256 + TotalPathDelay/10)

83126 = 39062 + TotalPathDelay/10

TotalPathDelay = 440640 usec (the current delay is 45000 usec, requiring an extra 395640 usec of added delay, which takes S0/0.12 from 20000 usec to 415640 usec)
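
Again, a quick Python sketch to confirm the figure before typing it into the router.  The function name delay_for_ratio is made up for this post; the 25000 usec is the delay that exists beyond R1 on this path (R2 S0/0.24 plus R4 Lo0), and the result is in tens of usec for the delay command.

```python
def delay_for_ratio(best_metric, ratio, min_bw_kbps, downstream_delay_usec):
    """Interface delay (tens of usec) that makes this path's metric ratio * best_metric."""
    target_metric = ratio * best_metric
    total_delay_term = target_metric // 256 - 10**7 // min_bw_kbps
    # Remove the delay contributed by the hops beyond our own interface
    return total_delay_term - downstream_delay_usec // 10


print(delay_for_ratio(10640128, 2, 256, 25000))  # 41564
```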

R1(config-subif)#do sh int s0/0.12 | inc DLY
  MTU 1500 bytes, BW 256 Kbit/sec, DLY 20000 usec,
R1(config-subif)#int s0/0.12
R1(config-subif)#delay 41564

So does this sort it out?

R1(config-subif)#do sh ip route 4.0.0.0
Routing entry for 4.0.0.0/8
  Known via "eigrp 100", distance 90, metric 10640128, type internal
  Redistributing via eigrp 100
  Last update from 10.1.12.2 on Serial0/0.12, 00:00:03 ago
  Routing Descriptor Blocks:
  * 10.1.13.3, from 10.1.13.3, 00:00:03 ago, via Serial0/0.13
      Route metric is 10640128, traffic share count is 2
      Total delay is 220320 microseconds, minimum bandwidth is 512 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 2
    10.1.12.2, from 10.1.12.2, 00:00:03 ago, via Serial0/0.12
      Route metric is 21280256, traffic share count is 1
      Total delay is 440640 microseconds, minimum bandwidth is 256 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 2


Okay, from a routing point of view this looks good.  Let's make sure that CEF is active and enable per-packet load balancing:

R1(config-subif)#ip cef
R1(config)#int s0/0.12
R1(config-subif)#ip load-sharing per-packet
R1(config-subif)#int s0/0.13
R1(config-subif)#ip load-sharing per-packet

To see that things are going to plan, we shall set up a service policy to match ICMP traffic and attach it to both exit interfaces:

R1(config-subif)#int s0/0
R1(config-if)#load-interval 30
R1(config-if)#access-list 100 permit icmp any any
R1(config)#class-map ICMP
R1(config-cmap)#match access-group 100
R1(config-cmap)#policy-map OUT
R1(config-pmap)#class ICMP
R1(config-pmap-c)#int s0/0.12
R1(config-subif)#service-policy output OUT
R1(config-subif)#int s0/0.13
R1(config-subif)#service-policy output OUT
R1(config-subif)#do show policy-map int | s Serial|ICMP
 Serial0/0.12
    Class-map: ICMP (match-all)
      0 packets, 0 bytes
      30 second offered rate 0 bps
      Match: access-group 100

 Serial0/0.13
    Class-map: ICMP (match-all)
      0 packets, 0 bytes
      30 second offered rate 0 bps
      Match: access-group 100

R1(config-subif)#do ping 4.4.4.4 source 1.1.1.1 repeat 100

Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 4.4.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 12/50/100 ms

Since the load is meant to be 2:1, we should see roughly 33 packets on S0/0.12 and 67 on S0/0.13


R1(config-subif)#do show policy-map int | s Serial|ICMP
 Serial0/0.12
    Class-map: ICMP (match-all)
      35 packets, 3640 bytes
      30 second offered rate 2000 bps
      Match: access-group 100

 Serial0/0.13
    Class-map: ICMP (match-all)
      65 packets, 6760 bytes
      30 second offered rate 2000 bps
      Match: access-group 100


Seems close enough.  How is CEF handling the load sharing?

R1(config-subif)#do sh ip cef 4.0.0.0 internal
4.0.0.0/8, version 691, epoch 0, per-packet sharing
0 packets, 0 bytes
  via 10.1.13.3, Serial0/0.13, 0 dependencies
    traffic share 2
    next hop 10.1.13.3, Serial0/0.13
    valid adjacency
  via 10.1.12.2, Serial0/0.12, 0 dependencies
    traffic share 1, current path
    next hop 10.1.12.2, Serial0/0.12
    valid adjacency

  0 packets, 0 bytes switched through the prefix
  tmstats: external 0 packets, 0 bytes
           internal 0 packets, 0 bytes
  Load distribution: 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 (refcount 1)

  Hash  OK  Interface                 Address         Packets
  1     Y   Serial0/0.13              point2point           0
  2     Y   Serial0/0.12              point2point           0
  3     Y   Serial0/0.13              point2point           0
  4     Y   Serial0/0.12              point2point           0
  5     Y   Serial0/0.13              point2point           0
  6     Y   Serial0/0.12              point2point           0
  7     Y   Serial0/0.13              point2point           0
  8     Y   Serial0/0.12              point2point           0
  9     Y   Serial0/0.13              point2point           0
  10    Y   Serial0/0.12              point2point           0
  11    Y   Serial0/0.13              point2point           0
  12    Y   Serial0/0.13              point2point           0
  13    Y   Serial0/0.13              point2point           0
  14    Y   Serial0/0.13              point2point           0
  15    Y   Serial0/0.13              point2point           0
  refcount 6


Interestingly, CEF alternates between both links for the first ten entries of the hash table and then uses only S0/0.13 for the last five.  That gives 10 entries for S0/0.13 and 5 for S0/0.12, which is the 2:1 split we were after.
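
The split can also be read straight off the "Load distribution" line in the CEF output above.  Here is a tiny Python sketch that tallies it up, assuming (as the hash table suggests) that 0 and 1 index the paths in the order CEF lists them.

```python
from collections import Counter

# Copied from the "Load distribution" line of show ip cef 4.0.0.0 internal
load_distribution = "0 1 0 1 0 1 0 1 0 1 0 0 0 0 0"
# Assumption: indexes follow the order of the "via" lines in the output
paths = ["Serial0/0.13", "Serial0/0.12"]

counts = Counter(paths[int(i)] for i in load_distribution.split())
print(counts["Serial0/0.13"], counts["Serial0/0.12"])  # 10 5
```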