QoS Introduction (Part 1)

In this article we will go over the basics of QoS, why we need it, and how it works. In future articles in this series we will explore QoS tools and configuration using MQC (Modular QoS CLI) in detail.

QoS (Quality of Service) is used to treat classes of traffic differently based on their requirements. QoS is frequently used to prioritize delay-sensitive traffic, such as voice and video, over delay-insensitive traffic such as web browsing. When there is congestion on an interface, we might want voice packets to get through first, even at the expense of delaying web browsing traffic.

Therefore you will often hear QoS referred to as “managed unfairness.” In order to give some packets preferential treatment, other packets must be de-prioritized in some way.

Without QoS, all packets are sent out an interface in the order they were received, which is called FIFO (first in first out). This is the default queueing method that you are familiar with in your everyday life. When you wait in line at the DMV, nobody has preferential treatment over anyone else. Each person is seen in the order that they arrived in line.

When a priority queue exists alongside the default queue, people in the priority line wait less time than people in the default line. For example, at amusement parks such as Disneyland, a ride might have two lines: a fast-pass lane and a standby lane. A person entering the fast-pass lane will get on the ride sooner than a person entering the regular standby lane at the same time. The people in the standby lane are actually slowed down by the presence of the fast-pass lane. If there were no fast-pass lane, a person waiting in standby would get on the ride faster overall, because they wouldn’t have fast-pass lane people constantly “cutting” in front of them. In queueing in general, in order to give preference to one group, you must negatively affect another group.

Why do we need QoS?

We need QoS in times of congestion, which is when traffic is queued at an interface, waiting to be transmitted. You could argue that queuing alone doesn’t mean you need QoS, because plain FIFO may be perfectly acceptable. More precisely, then, QoS is needed when traffic is queued and the queuing negatively impacts the end users on the network.

Congestion can happen even if bandwidth is not hitting the line rate of the interface when you look at a 30-second load interval. A burst of traffic in a short time period (milliseconds) can exceed the line rate of the interface. This is called a microburst. An interface can only send at line speed, which means that a 1Gbps interface must always transmit packets at the line speed of one gigabit per second. If more traffic arrives at the interface than can be transmitted at line speed, the traffic must be either queued or dropped. QoS manages this situation so that traffic is treated intelligently and user experience on the network is not negatively impacted. It’s important to note that QoS does not prevent congestion; instead, it manages the congestion so that some traffic is treated “better” than other traffic.
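
To make this concrete with some illustrative numbers: suppose a 5 msec burst arrives at a 1Gbps interface at an aggregate rate of 2Gbps (for example, from two other interfaces at once). During those 5 msec:

2Gbps * 0.005 sec = 10Mb arriving
1Gbps * 0.005 sec = 5Mb transmitted

The remaining 5Mb must be queued or dropped, even though a 30-second load interval might show the interface far below line rate.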

Characteristics of traffic, and delay measurements

To understand how QoS works, we need a set of terms to describe characteristics of traffic that QoS affects.

Bandwidth

While it might seem like common sense, it is useful to take a minute to think about bandwidth from a very precise viewpoint. On a given link with no policers, bandwidth is equal to the line rate of the interfaces on either end of the link. If I connect two routers on gigabit interfaces back-to-back, the bandwidth on the link is equal to the line rate, which is 1Gbps.

However, bandwidth can also be defined by a contract. If I receive an L2VPN E-Line service from a service provider with an Ethernet handoff, and connect it to Gi1 on my router, it is possible that the service I am paying for is only a 500Mbps service. Even though the line rate is 1Gbps, the bandwidth is 500Mbps. The service provider can shape and police the traffic so that I can never exceed 500Mbps, even if I don’t configure QoS on my device. The service provider can simply drop any traffic that exceeds the contracted rate, leaving me with a rate less than the line rate of my device. The contracted amount, 500Mbps in this example, is called the CIR (Committed Information Rate).

Let’s say I want to slow my router down so that the service provider doesn’t drop my excess traffic above 500Mbps. The problem is that the router cannot simply transmit at 500Mbps; the line rate is 1Gbps, and the router must always send at line rate. In order to effectively send at 500Mbps, you apply a shaper, which sends at line rate “half the time.” What this means is that for 1/16th of a second it sends at line rate, and for the next 1/16th of a second it doesn’t send at all. When you add this up, it sent at 1Gbps for 8/16 of a second, which results in an effective rate of 500Mbps.

  • An effective rate of 500Mbps is achieved by transmitting at line rate half the time.
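
As a preview of the MQC configuration we will cover later in this series, a shaper like this might look something like the following sketch (the interface name is illustrative, and shape average takes a rate in bits per second):

  ! shape all traffic leaving the interface to an effective 500Mbps
  policy-map SHAPE-500M
   class class-default
    shape average 500000000
  !
  interface GigabitEthernet1
   service-policy output SHAPE-500M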

It is also useful to note in this discussion that the bandwidth command under an interface doesn’t actually change anything to do with QoS or the speed at which data is transmitted. The bandwidth command is just used as a reference by different protocols such as OSPF and EIGRP when calculating link metrics. It can also be used by QoS if we configure a policy that uses a percent of the bandwidth on the interface. However, by default, just configuring the bandwidth command doesn’t really do anything. It is more of an informational command than an action command.
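
For example (the interface and value are illustrative), setting the bandwidth to match the 500Mbps CIR from earlier changes what OSPF/EIGRP metrics and percentage-based QoS policies see, but does not slow the interface down at all (the value is in kilobits per second):

  interface GigabitEthernet1
   ! informational only: 500,000 kbps = 500Mbps
   bandwidth 500000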

When using QoS we can reserve bandwidth for different classes of traffic. For example, we might want to reserve 5Mbps of bandwidth for voice calls, and 20Mbps for video traffic. QoS accomplishes this by queuing these classes of traffic in separate queues, and transmitting packets from these queues in such a way that bandwidth is always available for these classes. You can think of this as the fast-pass lane at a Disney ride guaranteeing a “bandwidth” of 10 riders per minute. The ride operators can simply pull people out of the fast-pass lane at a fast enough rate to meet this “bandwidth,” cutting in front of the standby line. Throughout this series we will take a much deeper look at queuing and shaping.
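
A sketch of what those reservations might look like in MQC (the class names are illustrative; priority and bandwidth take rates in kilobits per second, and we will cover the difference between them later in the series):

  policy-map WAN-OUT
   class VOICE
    ! low-latency queue guaranteed 5Mbps
    priority 5000
   class VIDEO
    ! guaranteed 20Mbps during congestion
    bandwidth 20000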

Serialization Delay

Serialization delay is the time it takes to place an entire payload onto the wire. Serialization delay is affected by the size of the payload and the speed of the link (the link rate). You can picture serialization delay as the time it takes a train to cross a point, such as a train station. The timer would start when the first car of the train passes the point, and the timer would stop when the last car of the train passes that same point. The length of the train (size of the payload) and the speed of the train (the line rate) affect the serialization delay.

Serialization delay is calculated using the following formula:

<bits sent> / <link speed in bps>

A 1500 byte packet sent on a 1Gbps link has a serialization delay of only 0.012ms. The formula is:

1500 bytes * 8 bits/byte / 1,000,000,000 bps = 0.000012 sec = 0.012 msec

These days serialization delay is usually negligible. However, back when T1 circuits (1.544Mbps) were common, the same 1500 byte packet would take nearly 8 msec to serialize!

1500 bytes * 8 bits/byte / 1,544,000 bps = 0.00777 sec = 7.77 msec

Serialization delay is fixed. There is no setting we can configure on the router to change this delay.

Propagation Delay

This is the time it takes the electrical or optical signal in a wire to travel from the source to the destination. The signal travels at a large fraction of the speed of light (in fiber, roughly two-thirds of 3 * 10^8 m/sec), so the formula generally just finds the time it would take light itself to travel the distance between the two endpoints.

<distance in meters> / 3*10^8

It takes approximately 16msec for light to travel 3000 miles. Therefore, if you have two offices 3000 miles apart, your one-way latency will always be at least 16msec.
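
Working through the formula, 3000 miles is roughly 4,828,000 meters:

4,828,000 m / 3*10^8 m/sec = 0.0161 sec ≈ 16 msec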

Like serialization, propagation delay is fixed and not “configurable.”

Queuing Delay

This is the amount of time that a packet sits waiting in a queue. This delay is variable because it depends on how many packets are already built up in the queue, which constantly changes. Each packet ahead in the queue takes at least the serialization delay to transmit. Therefore the minimum queuing delay is:

<number of packets in front in queue> * <serialization delay>

In reality it will likely be more than this, because the formula does not account for higher priority queues whose packets are placed onto the wire while the current queue is being emptied.

We can affect queuing delay on a router by changing the length of a queue. If we give a queue a length of 40 packets, the last packet in the queue will incur greater queuing delay than if the queue length is only 10 packets. The shorter the queue length is, the less queuing delay is experienced, but more packet drop is experienced. (To put it another way, a long queue results in less packet loss at the expense of more queuing delay. A short queue results in less queuing delay at the expense of more packet loss. Neither is necessarily better than the other.)
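
For example, using the 1500 byte packets on a 1Gbps link from earlier (0.012 msec of serialization delay per packet), the last packet in a full queue waits at least:

40 packets * 0.012 msec = 0.48 msec
10 packets * 0.012 msec = 0.12 msec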

Forwarding Delay

This is the amount of time it takes a network device to process the packet. It is measured as the amount of time it takes a frame to be placed into an output queue once it has been received. In modern equipment, this delay is so low that it is usually negligible.

Shaping Delay

This is the amount of time that a shaper artificially delays packets in a queue in order to slow traffic down to a CIR. In our earlier example, a shaper transmits at the 1Gbps line rate only half the time in order to effectively send at 500Mbps. The shaping delay is the additional time a packet sits in the queue compared to if it could have been sent at line rate immediately.
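
Using the illustrative 1/16th second intervals from the shaping example earlier, a packet that arrives just as a “quiet” interval begins has to wait out that entire interval before transmission resumes:

1/16 sec = 62.5 msec of worst-case added shaping delay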

Latency

This term is used to describe the overall end-to-end delay between two endpoints. It can be measured as the time it takes a packet to be sent by a source and received by the destination. This is affected by all the delays mentioned previously - a packet has to be queued, serialized, propagated along the medium, and processed.

RTT (round trip time) is the time it takes a packet to be sent from a point, looped back at the destination, and received at that same point. RTT is the latency in both directions added together, plus any processing time at the destination. A ping result reports RTT, not one-way latency.

Jitter

Jitter is the variation in delay between consecutive packets. If one packet arrives with 20msec latency, and the next packet arrives with 30msec latency, you have experienced 10msec of jitter. Jitter negatively affects real time traffic.

How QoS Works

Now that we understand the basic components that measure traffic characteristics, we can explore how QoS actually works.

QoS works by giving certain traffic preferential treatment over other traffic. This might involve transmitting some traffic sooner than other traffic, or dropping some traffic more aggressively than other traffic in times of congestion. But how does this actually work?

A QoS policy defines the overall actions you want to take on classes of traffic. A QoS policy is what dictates actions such as “service the voice queue more quickly than the data queue” or “guarantee voice traffic 5Mbps of bandwidth.” A QoS policy is applied to an interface, usually in the outbound direction and typically on a WAN link.

But how do we define certain traffic as voice traffic, data traffic, high priority traffic, etc.? This process is called classification. Typically we use markings on the packets or frames themselves to identify the type of traffic. For example, voice traffic is often marked by the phone itself as EF (Expedited Forwarding). Alternatively, we can classify traffic by inspecting the destination port, or by doing deeper application recognition such as NBAR on Cisco routers.

By understanding this concept you will have a much easier time applying QoS to Cisco routers. On Cisco routers, this is a three-step process (a sketch follows the list):

  1. Classify traffic using class-maps. You can classify traffic based on an ACL (matching layer 3 or layer 4 information), matching markings of the packet (such as EF), or using NBAR and matching the actual application.

  2. Define policy-maps which define the QoS policy for each class of traffic. A policy-map references multiple class-maps and defines the queuing, shaping, bandwidth requirements, drop preference, etc. for each class.

  3. Apply the policy-map to the interface.
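
To make the three steps concrete, here is a minimal hypothetical example that builds on the earlier sketches (the class name, the EF marking, and the interface are illustrative, not a recommended policy):

  ! Step 1: classify traffic marked EF as voice
  class-map match-any VOICE
   match dscp ef
  !
  ! Step 2: define the policy for each class
  policy-map WAN-OUT
   class VOICE
    priority 5000
   class class-default
    fair-queue
  !
  ! Step 3: apply the policy outbound on the interface
  interface GigabitEthernet1
   service-policy output WAN-OUT

Here the class-map classifies anything marked EF as voice, the policy-map gives that class priority treatment with 5Mbps of guaranteed bandwidth, and the service-policy command applies the policy outbound on the interface.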

We will go into much more detail about applying and configuring QoS in upcoming articles, but I think it is helpful to at least get a general idea of how it works.

In the next article we will look more closely at QoS tools and QoS models, namely IntServ and DiffServ.

Further Reading/Watching

Cisco QoS Exam Certification Guide, Wendell Odom and Michael Cavanaugh, Ch. 1

https://www.youtube.com/watch?v=Nm_1cRH5dGc&ab_channel=NetworkDirection
