802.1ag (CFM)
CFM (Connectivity Fault Management) is an OAM tool used in Carrier Ethernet networks to monitor and troubleshoot Ethernet services.
In CFM, domains are defined so that a customer and provider can each isolate their own “portion” of the layer 2 service to detect layer 2 faults. Each domain is assigned a level from 0 to 7. Level 0 is the smallest, most confined scope, and level 7 is the largest, broadest scope. Typically the customer uses level 7 because that encompasses the entire end-to-end service. The service provider infrastructure as a whole uses a high level below 7, and individual portions of the provider network, for example individual operators, use even lower numbers.
In the upcoming lab the customer uses level 7, the entire provider infrastructure uses level 4, and each individual service provider operator uses level 2:
The beauty of this setup is that a service provider can “sign off” that the service is functional within their own network, even though they have no control over another provider’s network and therefore the full end-to-end service. This is very useful when troubleshooting and isolating faults. CFM goes a step further than just acknowledging that the xconnect is up, because CFM verifies that actual layer 2 traffic is passing. (The xconnect being up verifies the control plane only, while CFM verifies that the data plane is functioning correctly.)
CFM uses five types of messages:
Continuity check message
These are ongoing probes sent at regular intervals to verify that the endpoint at the remote end is still alive, somewhat like an IP SLA probe. Each MEP (we will cover this term soon) generates these frames.
Loopback message
This is just like a layer 3 ping, but operates at layer 2. This can be used to test layer 2 reachability from one MEP to another.
Loopback reply
This is a reply to the layer 2 “ping”.
Link trace message
This is just like a layer 3 traceroute, but operates at layer 2. This can be used to validate the path between two MEPs.
Link trace reply
This is a reply to the layer 2 “traceroute”.
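On IOS these loopback and link trace messages map to exec commands. As a minimal sketch, assuming a remote MEP ID of 222 in a level 7 CUSTOMER domain with service number 100 (the values used later in this lab; the exact keywords can vary by release):

  ping ethernet mpid 222 domain CUSTOMER service 100
  traceroute ethernet mpid 222 domain CUSTOMER service 100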
Domains will pass continuity check messages that have a higher level than what is configured locally. For example, the operator and provider domains will pass the customer continuity checks because those frames are level 7, which is higher than their own levels of 2 and 4. However, continuity checks with levels lower than what is locally configured are dropped. If the customer used a level of 3, the messages would be dropped at the operator boundary.
In CFM there are a few terms we need to understand which define the role that a device plays in the domain.
A device is a maintenance endpoint (MEP) if it is at the boundary of a domain. CE1 and CE2 are MEPs for the level 7 Customer domain. Likewise, PE1 and PE3 are MEPs for Operator 1’s level 2 domain (and PE4 and PE6 for Operator 2’s). However, only PE1 and PE6 are MEPs for the level 4 domain. A device is only a MEP for a domain if it defines the boundary of that domain.
A device is a maintenance intermediate point (MIP) if it is in the middle of a domain. PE3 and PE4 are MIPs for the level 4 domain because they are part of the domain but do not define its boundary. A MIP does not generate continuity check messages, but it does respond to link trace messages, so you will see the MIPs in the output of a link trace.
Each MEP has a direction, which can be a little confusing the first time you see it. The direction describes the relationship between the interface on which CFM is enabled and the interface used to reach the remote MEP. The CE interfaces face directly “inwards” towards the PE interfaces, towards the service, and towards the remote MEP (the other CE). The CEs are considered MEP “down” because they send CFM messages directly out the interface on which CFM is enabled, onto the wire.
The PE interfaces face “outwards” towards the CEs, not “inwards” towards the service. CFM messages on the PEs must be bridged internally and sent out the upstream interface, not sent directly out the wire of the interface on which CFM is configured. The PE interfaces are MEP “up” because they have to internally bridge the CFM frame and then send it upstream to the remote MEPs for level 2 and level 4.
Now that we understand the purpose of CFM and the terms involved with this protocol, let’s explore it in the lab. CFM does not appear to be functional on XRv or XR9000v, so we will only configure CFM on CSR1000v.
This lab uses an E-Access configuration with two service providers. VLAN 1834 is used on the NNI. I will not include the full configurations, as the IGP/LDP setup is very basic. Instead I will include the four UNI and NNI interface configurations. If you need help understanding E-Access, visit the blog post on that topic.
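As a rough sketch, those UNI and NNI service instances look something like the following. The interface numbers, EVC name, loopback addresses, and VC ID are placeholders, and PE6 and PE4 mirror PE1 and PE3 respectively:

  ! Global named EVC, referenced later by the CFM maintenance associations
  ethernet evc EVC100
  !
  ! PE1 UNI facing CE1 (port-based, untagged)
  interface GigabitEthernet2
   service instance 100 ethernet EVC100
    encapsulation default
    xconnect 10.0.0.3 100 encapsulation mpls
  !
  ! PE3 NNI facing Operator 2, S-VLAN 1834
  interface GigabitEthernet3
   service instance 100 ethernet EVC100
    encapsulation dot1q 1834
    rewrite ingress tag pop 1 symmetric
    xconnect 10.0.0.1 100 encapsulation mpls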
First we’ll configure CFM on the customer routers, CE1 and CE2.
The commands ethernet cfm ieee and ethernet cfm global enable CFM globally using the IEEE standard. Next we define the CFM domain “CUSTOMER” at level 7. The domain name is case sensitive and needs to match on all maintenance end points (MEPs) in the domain. To identify this particular service within the domain, we can use a number, a VLAN ID, or any string; in this case I chose a service number of 100. We then enable CFM on the physical interface for the CUSTOMER domain and give each device its own unique maintenance point ID: CE1 is MEP ID 111 and CE2 is MEP ID 222.
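Roughly, the CE1 configuration looks like this (the interface number is a placeholder, and the exact port-MEP keywords vary a bit by platform and release; CE2 is identical except for its MPID):

  ! Enable CFM globally using the IEEE standard
  ethernet cfm ieee
  ethernet cfm global
  !
  ! Level 7 customer domain with a port-based service, number 100
  ethernet cfm domain CUSTOMER level 7
   service number 100 port
    continuity-check
  !
  ! Down MEP on the physical interface; CE1 uses MPID 111, CE2 uses 222
  interface GigabitEthernet2
   ethernet cfm mep domain CUSTOMER mpid 111 service number 100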
Both CEs should automatically discover each other. You should see CE2 as a remote MEP on CE1, and vice versa on CE2.
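We can verify this with:

  show ethernet cfm maintenance-points remote

CE1 should list remote MEP ID 222, and CE2 should list 111.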
By default the continuity check frames are sent once every 10 seconds. We could increase this to as frequently as once every 3.3 ms.
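The interval is set per service under the domain. For example, a sketch that sends one CCM per second (the interval keyword accepts values down to 3.3ms):

  ethernet cfm domain CUSTOMER level 7
   service number 100 port
    ! Enable CCMs and tighten the default 10 second interval
    continuity-check
    continuity-check interval 1s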
Here is a pcap of the Continuity Check Message sent from CE1:
Notice that the domain name string “CUSTOMER” is seen, as well as the service number. The service number 100 is the “Short MA Name” in hex. MA stands for maintenance association; an MA is the particular service inside the maintenance domain. If these values don’t match, two devices will not see each other as part of the same domain/MA.
We’ll now configure the Provider and Operator domains. PE1 and PE6 will be MEPs for both domains: they each terminate the Provider domain, and also terminate their own Operator domain. Although both Operator domains have the same name and level (2), they are completely independent because each terminates within its own operator’s network.
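As a sketch, PE1’s CFM configuration would look roughly like this, reusing the named EVC from the interface configurations above. PE6 mirrors it with its own MPIDs; the MPIDs here (101/102 on PE1 and, say, 601/602 on PE6) are illustrative:

  ! Provider domain, level 4
  ethernet cfm domain PROVIDER level 4
   service number 101 evc EVC100
    continuity-check
  !
  ! Operator domain, level 2 (same name/level in both operator networks)
  ethernet cfm domain OPERATOR level 2
   service number 102 evc EVC100
    continuity-check
  !
  ! MEPs on the UNI service instance; EFP MEPs face “up” by default here
  interface GigabitEthernet2
   service instance 100 ethernet EVC100
    ! (encapsulation and xconnect as shown earlier)
    cfm mep domain PROVIDER mpid 101
    cfm mep domain OPERATOR mpid 102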
Note that we use a named EVC above because we are enabling CFM on a service instance as opposed to a port, as we did on the CEs.
You should see the remote MEP come up on each router:
Interestingly, PE1 sees PE6 as a remote MEP for both levels, including the Operator domain.
This is because we have not configured PE3 yet. Let’s do that now. PE3 and PE4 will each be a MEP for level 2 but a MIP for level 4. A MIP responds to CFM messages at the configured level but is not an endpoint for that level, so it will not generate continuity check messages nor be part of the maintenance association. (Only MEPs are in the maintenance association.) We only need to define the Operator domain on PE3 and PE4:
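A sketch for PE3 (PE4 is analogous, with its own MPID, say 104): the level 2 MEP and the level 4 MIP sit on the same NNI service instance:

  ethernet cfm domain OPERATOR level 2
   service number 102 evc EVC100
    continuity-check
  !
  interface GigabitEthernet3
   service instance 100 ethernet EVC100
    ! (encapsulation and xconnect as shown earlier)
    cfm mep domain OPERATOR mpid 103
    cfm mip level 4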
PE1 now sees PE3 as the remote MEP for the Operator domain. This is because the domain is now being terminated at PE3 before it can “reach” all the way down to PE6.
We can now use a CFM link trace message to see the MEPs and MIPs in the Provider domain. Because PE3 and PE4 are MIPs for level 4, they respond to the link trace.
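Using the illustrative MPIDs from the sketches above (601 being PE6’s provider MEP), the trace from PE1 would look something like:

  traceroute ethernet mpid 601 domain PROVIDER service 101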
When CE1 traces to CE2 we see nothing; the trace times out. PE1 does not forward the trace message, I believe because it would need to be a MIP for level 7. PE1 currently forwards continuity checks and loopback messages, because they are at a higher level (7) than its interface, but it does not forward traces, apparently because it is not a MIP for level 7.
If we make PE1 a MIP for level 7, the trace now shows a single hop, which is PE1. Now PE3 is not forwarding the trace along.
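The level 7 MIP on PE1 is added with something like the following, on the same service instance as before:

  interface GigabitEthernet2
   service instance 100 ethernet EVC100
    cfm mip level 7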
This brings up the “brain” MAC. The MAC 001e.bd28.12bf is a brain MAC on PE1, which we can confirm from the output of the ethernet cfm statistics:
A MIP responds to trace messages with its BRAIN MAC instead of a physical interface MAC. I believe the idea behind this is that the BRAIN MAC is tied to a bridging function.
Let’s remove the level 7 MIP on PE1 and assume that the operators do not want the customer to be able to probe their devices. Let’s also disable MPLS on P5 to cause a fault in the network that we can diagnose.
On CE1 we see some logging messages indicating that the continuity check messages are no longer being received:
CE1 cannot trace to isolate the problem, so they call Operator 1 to report the issue. Operator 1 can quickly rule out their network as the problem, as they can “ping” and trace end-to-end within their own level 2 domain.
This is the power of CFM. You can very easily and quickly perform layer 2 troubleshooting and verify that traffic forwarding is working in your own isolated domain.
Operator 2 tries to run an ethernet ping from PE4 to PE6, but it times out. Operator 2 can now troubleshoot their network to remediate the issue.
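With the illustrative MPIDs from the earlier sketches (602 being PE6’s operator MEP), that test from PE4 would be something like:

  ping ethernet mpid 602 domain OPERATOR service 102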
Resources

https://www.youtube.com/watch?v=BpXrDl68ArI&ab_channel=SPAGCisco
https://youtu.be/vdFc_5WNsxg?t=75
After about 9 mins it becomes specific to Extreme Networks
https://www.youtube.com/watch?v=D6yG8KzGsYw&ab_channel=NANOG
Very readable whitepaper from Cisco on CFM for IOS
Short intro to CFM and terms
https://sites.google.com/site/amitsciscozone/network-management/ethernet-cfm
Good short explanation of CFM and terms but uses Alcatel configuration examples
https://www.fujitsu.com/downloads/TEL/fnc/whitepapers/EthernetService-OAM.pdf
General technical whitepaper on OAM for Carrier Ethernet