802.1ag (CFM)
CFM (Connectivity Fault Management) is an OAM tool used in Carrier Ethernet networks to monitor and troubleshoot Ethernet services.
In CFM, domains are defined so that a customer and provider can each isolate their own “portion” of the layer 2 service to detect layer 2 faults. Each domain is assigned a level from 0 to 7. Level 0 is the smallest, most confined scope, and level 7 is the largest, broadest scope. Typically the customer uses level 7 because that encompasses the entire end-to-end service. The service provider infrastructure as a whole uses a high level below 7, and individual portions of the provider network, for example individual operators, use even lower numbers.
In the upcoming lab the customer uses level 7, the entire provider infrastructure uses level 4, and each individual service provider operator uses level 2:
The beauty of this setup is that a service provider can “sign off” that the service is functional within their own network, even though they have no control over another provider’s network and therefore the full end-to-end service. This is very useful when troubleshooting and isolating faults. CFM goes a step further than just acknowledging that the xconnect is up, because CFM verifies that actual layer 2 traffic is passing. (The xconnect being up verifies the control plane only, while CFM verifies that the data plane is functioning correctly.)
CFM uses five types of messages:
Continuity check message
These are ongoing probes sent at regular intervals to verify that the endpoint at the remote end is still alive, somewhat like an IP SLA probe. Each MEP (we will cover this term soon) generates these frames.
Loopback message
This is just like a layer 3 ping, but operates at layer 2. This can be used to test layer 2 reachability from one MEP to another.
Loopback reply
This is a reply to the layer 2 “ping”.
Link trace message
This is just like a layer 3 traceroute, but operates at layer 2. This can be used to validate the path between two MEPs.
Link trace reply
This is a reply to the layer 2 “traceroute”.
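On IOS these loopback and link trace messages map to exec commands. As a minimal sketch, assuming a remote MEP ID of 222 in a level 7 CUSTOMER domain with service number 100 (the values used later in this lab; the exact keywords can vary by release):

  ping ethernet mpid 222 domain CUSTOMER service 100
  traceroute ethernet mpid 222 domain CUSTOMER service 100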
Domains will pass continuity check messages that have a higher level than what is configured locally. For example, the operator and provider domains will pass the customer continuity checks because those frames are level 7, which is higher than their own levels of 2 and 4. However, continuity checks with levels lower than what is locally configured are dropped. If the customer used a level of 3, the messages would be dropped at the operator boundary.
In CFM there are a few terms we need to understand which define the role that a device plays in the domain.
A device is a maintenance endpoint (MEP) if it is at the boundary of a domain. CE1 and CE2 are MEPs for the level 7 Customer domain. Likewise, PE1 and PE3 are MEPs for Operator 1’s level 2 domain (and PE4 and PE6 for Operator 2’s). However, only PE1 and PE6 are MEPs for the level 4 domain. A device is only a MEP for a domain if it defines the boundary of that domain.
A device is a maintenance intermediate point (MIP) if it is in the middle of a domain. PE3 and PE4 are MIPs for the level 4 domain because they are part of the domain but do not define its boundary. A MIP does not generate continuity check messages, but it does respond to link trace messages, so you will see the MIPs in the output of a link trace.
Each MEP has a direction, which can be a little confusing the first time you see it. The direction describes the relationship between the interface on which CFM is enabled and the interface used to reach the remote MEP. The CE interfaces face directly “inwards” towards the PE interfaces, towards the service, and towards the remote MEP (the other CE). The CEs are considered MEP “down” because they send CFM messages directly out the interface on which CFM is enabled, onto the wire.
The PE interfaces face “outwards” towards the CEs, not “inwards” towards the service. CFM messages on the PEs must be bridged internally and sent out the upstream interface, not sent directly out the wire of the interface on which CFM is configured. The PE interfaces are MEP “up” because they have to internally bridge the CFM frame and then send it upstream to the remote MEPs for level 2 and level 4.
Now that we understand the purpose of CFM and the terms involved with this protocol, let’s explore it in the lab. CFM does not appear to be functional on XRv or XR9000v, so we will only configure CFM on CSR1000v.
This lab uses an E-Access configuration with two service providers. VLAN 1834 is used on the NNI. I will not include the full configurations, as the IGP/LDP setup is very basic. Instead I will include the four UNI and NNI interface configurations. If you need help understanding E-Access, visit the blog post on that topic.
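As a rough sketch, those UNI and NNI service instances look something like the following. The interface numbers, EVC name, loopback addresses, and VC ID are placeholders, and PE6 and PE4 mirror PE1 and PE3 respectively:

  ! Global named EVC, referenced later by the CFM maintenance associations
  ethernet evc EVC100
  !
  ! PE1 UNI facing CE1 (port-based, untagged)
  interface GigabitEthernet2
   service instance 100 ethernet EVC100
    encapsulation default
    xconnect 10.0.0.3 100 encapsulation mpls
  !
  ! PE3 NNI facing Operator 2, S-VLAN 1834
  interface GigabitEthernet3
   service instance 100 ethernet EVC100
    encapsulation dot1q 1834
    rewrite ingress tag pop 1 symmetric
    xconnect 10.0.0.1 100 encapsulation mpls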
First we’ll configure CFM on the customer routers, CE1 and CE2.
The commands ethernet cfm ieee and ethernet cfm global enable CFM globally using the IEEE standard. Next we define the CFM domain “CUSTOMER” at level 7. The domain name is case sensitive and needs to match on all maintenance end points (MEPs) in the domain. To identify this particular service within the domain, we can use a number, a VLAN ID, or any string; in this case I chose a service number of 100. We then enable CFM on the physical interface for the CUSTOMER domain and give each device its own unique maintenance point ID: CE1 is MEP ID 111 and CE2 is MEP ID 222.
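Roughly, the CE1 configuration looks like this (the interface number is a placeholder, and the exact port-MEP keywords vary a bit by platform and release; CE2 is identical except for its MPID):

  ! Enable CFM globally using the IEEE standard
  ethernet cfm ieee
  ethernet cfm global
  !
  ! Level 7 customer domain with a port-based service, number 100
  ethernet cfm domain CUSTOMER level 7
   service number 100 port
    continuity-check
  !
  ! Down MEP on the physical interface; CE1 uses MPID 111, CE2 uses 222
  interface GigabitEthernet2
   ethernet cfm mep domain CUSTOMER mpid 111 service number 100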
Both CEs should automatically discover each other. You should see CE2 as a remote MEP on CE1, and vice versa on CE2.
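We can verify this with:

  show ethernet cfm maintenance-points remote

CE1 should list remote MEP ID 222, and CE2 should list 111.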
By default the continuity check frames are sent once every 10 seconds. We could increase this to as frequently as once every 3.3 ms.
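The interval is set per service under the domain. For example, a sketch that sends one CCM per second (the interval keyword accepts values down to 3.3ms):

  ethernet cfm domain CUSTOMER level 7
   service number 100 port
    ! Enable CCMs and tighten the default 10 second interval
    continuity-check
    continuity-check interval 1s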
Here is a pcap of the Continuity Check Message sent from CE1:
Notice that the domain name string “CUSTOMER” is seen, as well as the service number. The service number 100 is the “Short MA Name” in hex. MA stands for maintenance association; an MA is the particular service inside the maintenance domain. If these values don’t match, two devices will not see each other as part of the same domain/MA.
We’ll now configure the Provider and Operator domains. PE1 and PE6 will be MEPs for both domains: they each terminate the Provider domain, and also terminate their own Operator domain. Although both Operator domains have the same name and level (2), they are completely independent because each terminates within its own operator’s network.
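As a sketch, PE1’s CFM configuration would look roughly like this, reusing the named EVC from the interface configurations above. PE6 mirrors it with its own MPIDs; the MPIDs here (101/102 on PE1 and, say, 601/602 on PE6) are illustrative:

  ! Provider domain, level 4
  ethernet cfm domain PROVIDER level 4
   service number 101 evc EVC100
    continuity-check
  !
  ! Operator domain, level 2 (same name/level in both operator networks)
  ethernet cfm domain OPERATOR level 2
   service number 102 evc EVC100
    continuity-check
  !
  ! MEPs on the UNI service instance; EFP MEPs face “up” by default here
  interface GigabitEthernet2
   service instance 100 ethernet EVC100
    ! (encapsulation and xconnect as shown earlier)
    cfm mep domain PROVIDER mpid 101
    cfm mep domain OPERATOR mpid 102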
Note that we use a named EVC above because we are enabling CFM on a service instance as opposed to a port, as we did on the CEs.
You should see the remote MEP come up on each router:
Interestingly, PE1 sees PE6 as a remote MEP for both levels, including the Operator domain.
This is because we have not configured PE3 yet. Let’s do that now. PE3 and PE4 will each be a MEP for level 2 but a MIP for level 4. A MIP responds to CFM messages at the configured level but is not an endpoint for that level, so it will not generate continuity check messages nor be part of the maintenance association. (Only MEPs are in the maintenance association.) We only need to define the Operator domain on PE3 and PE4:
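A sketch for PE3 (PE4 is analogous, with its own MPID, say 104): the level 2 MEP and the level 4 MIP sit on the same NNI service instance:

  ethernet cfm domain OPERATOR level 2
   service number 102 evc EVC100
    continuity-check
  !
  interface GigabitEthernet3
   service instance 100 ethernet EVC100
    ! (encapsulation and xconnect as shown earlier)
    cfm mep domain OPERATOR mpid 103
    cfm mip level 4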
PE1 now sees PE3 as the remote MEP for the Operator domain. This is because the domain is now being terminated at PE3 before it can “reach” all the way down to PE6.
We can now use a CFM link trace message to see the MEPs and MIPs in the Provider domain. Because PE3 and PE4 are MIPs for level 4, they respond to the link trace.
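Using the illustrative MPIDs from the sketches above (601 being PE6’s provider MEP), the trace from PE1 would look something like:

  traceroute ethernet mpid 601 domain PROVIDER service 101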
When CE1 traces to CE2 we see nothing; the trace times out. PE1 does not forward the trace message, I believe because it would need to be a MIP for level 7. PE1 currently forwards continuity checks and loopback messages, because they are at a higher level (7) than its interface, but it does not forward traces, apparently because it is not a MIP for level 7.
If we make PE1 a MIP for level 7, the trace now shows a single hop, which is PE1. Now PE3 is not forwarding the trace along.
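The level 7 MIP on PE1 is added with something like the following, on the same service instance as before:

  interface GigabitEthernet2
   service instance 100 ethernet EVC100
    cfm mip level 7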
This brings up the “brain” MAC. The MAC 001e.bd28.12bf is a brain MAC on PE1, which we can confirm from the output of the ethernet cfm statistics:
A MIP responds to trace messages with its BRAIN MAC instead of a physical interface MAC. I believe the idea behind this is that the BRAIN MAC is tied to a bridging function.
Let’s remove the level 7 MIP on PE1 and assume that the operators do not want the customer to be able to probe their devices. Let’s also disable MPLS on P5 to cause a fault in the network that we can diagnose.
On CE1 we see some logging messages indicating that the continuity check messages are no longer being received:
CE1 cannot trace to isolate the problem, so they call Operator 1 to report the issue. Operator 1 can quickly rule out their network as the problem, as they can “ping” and trace end-to-end within their own level 2 domain.
This is the power of CFM. You can very easily and quickly perform layer 2 troubleshooting and verify that traffic forwarding is working in your own isolated domain.
Operator 2 tries to run an ethernet ping from PE4 to PE6, but it times out. Operator 2 can now troubleshoot their network to remediate the issue.
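With the illustrative MPIDs from the earlier sketches (602 being PE6’s operator MEP), that test from PE4 would be something like:

  ping ethernet mpid 602 domain OPERATOR service 102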
Resources

https://www.youtube.com/watch?v=BpXrDl68ArI&ab_channel=SPAGCisco
https://youtu.be/vdFc_5WNsxg?t=75
After about 9 mins it becomes specific to Extreme Networks
https://www.youtube.com/watch?v=D6yG8KzGsYw&ab_channel=NANOG
Very readable whitepaper from Cisco on CFM for IOS
Short intro to CFM and terms
https://sites.google.com/site/amitsciscozone/network-management/ethernet-cfm
Good short explanation of CFM and terms but uses Alcatel configuration examples
https://www.fujitsu.com/downloads/TEL/fnc/whitepapers/EthernetService-OAM.pdf
General technical whitepaper on OAM for Carrier Ethernet