Installing and Using tcng under Debian GNU/Linux

Jason Boxman

Revision History
Revision 0.1 20040410
Initial draft
Revision 0.5 20040422
Added examples, explanations
Revision 0.9 20040425
Spelling corrections, conclusions
Revision 0.91 20040512
New version of tcng released, 9m

Abstract

Introduction to building, configuring, and using tcng for basic traffic shaping with the HTB and SFQ queuing disciplines.


Table of Contents

Preamble
Configuring Linux with QoS support
Obtaining, Configuring, and Compiling tcng
Writing Shaping Configurations With tcng
An Actual Real World Example
Monitoring
Aftermath
Links and Resources

Preamble

Traffic shaping on Linux has been possible for ages, but the tools and syntax have always been arcane and the machinery behind them magic. In 2001, Werner Almesberger began work on tcng, a replacement for the arcane and poorly documented tc command that comes bundled with the iproute2 package used for traffic control in conjunction with the Linux kernel. What follows is an attempt to demystify traffic control with tcng, including installation, configuration, and monitoring.

Configuring Linux with QoS support

Before you can use tcng to do any shaping and policing, you need to ensure that the kernel on your target system, usually a router or firewall, has the appropriate support. For a 2.4 series kernel, enable the following options under Networking options -> QoS and/or fair queueing:

[*] QoS and/or fair queueing
<M>   CBQ packet scheduler (NEW)
<M>   HTB packet scheduler (NEW)
<M>   CSZ packet scheduler (NEW)
<M>   The simplest PRIO pseudoscheduler (NEW)
<M>   RED queue (NEW)
<M>   SFQ queue (NEW)
<M>   TEQL queue (NEW)
<M>   TBF queue (NEW)
<M>   GRED queue (NEW)
<M>   Diffserv field marker (NEW)
[*]   QoS support (NEW)   
[*]     Rate estimator (NEW)
[*]   Packet classifier API (NEW)
<M>     TC index classifier (NEW)
<M>     Routing table based classifier (NEW)
<M>     Firewall based classifier (NEW)
<M>     U32 classifier (NEW)
<M>     Special RSVP classifier (NEW)
<M>     Special RSVP classifier for IPv6 (NEW)
[*]     Traffic policing (needed for in/egress) (NEW)

The selections for a 2.6 series kernel are identical to those listed above, under Device Drivers -> Networking support -> QoS and/or fair queueing.
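
On a system whose kernel is already built, you can check for these options without opening the configuration menu. The sketch below assumes the kernel configuration is available as a text file; Debian's kernel packages place it at /boot/config-<version>. The function name check_qos_options and the particular list of options (the schedulers and classifiers that appear later in this article) are choices made for this illustration, not something tcng provides.

```shell
#!/bin/sh
# Report whether each QoS option used by the examples in this article
# is built in (=y) or modular (=m) in the given kernel config file.
# Returns 1 if at least one option is missing, 0 otherwise.
check_qos_options() {
    config_file=$1
    missing=0
    for opt in NET_SCHED NET_SCH_HTB NET_SCH_SFQ NET_SCH_PRIO \
               NET_SCH_DSMARK NET_CLS_U32 NET_CLS_TCINDEX; do
        if grep -q "^CONFIG_${opt}=[ym]" "$config_file"; then
            echo "ok       CONFIG_${opt}"
        else
            echo "missing  CONFIG_${opt}"
            missing=1
        fi
    done
    return $missing
}
```

To check the running kernel, call check_qos_options "/boot/config-$(uname -r)".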

Obtaining, Configuring, and Compiling tcng

You can fetch the latest version of tcng, version 9l at the time of this writing, from its Web site.

The build process will require a few additional packages. You will need a tarball of a recent Linux kernel version from the 2.4.x series. You will also need a copy of the iproute2 package. You can fetch iproute2-2.4.7-now-ss010824 from an archive. tcng uses some of the files from both packages to build a functional emulation of Linux's traffic control system, utilized by its tcsim simulator.

With those files at your disposal, unpack your tcng tarball in your favourite location. Then, execute configure with the paths to your kernel sources (need not be your running kernel) and the iproute2 package.

jasonb@faith:~/src/tcng$ ./configure -m -k /usr/src/linux-2.4.22.tar.bz2 \
  -i ~/src/iproute2-2.4.7-now-ss010824.tar.gz
Extracting files from /usr/src/linux-2.4.22.tar.bz2 ...
(may yield a few "Not found in archive" messages)
Extracted to ./linux-2.4.22/
Extracting files from /home/jasonb/src/iproute2-2.4.7-now-ss010824.tar.gz ...
(may yield a few "Not found in archive" messages)
Extracted to ./iproute2/
building tcsim:      yes
Kernel source:       /home/jasonb/src/tcng/./linux-2.4.22/
Kernel version:      2.4.22
iproute2 source:     /home/jasonb/src/tcng/./iproute2/
iproute2 version:    010824
Host byte order:     little endian
tcc command:         /home/jasonb/src/tcng/bin/tcc
YACC is:             yacc
$ is not identifier: -std=c99
building the manual: yes
install directory:   /usr/local

Once that process is complete, run make. With version 9l on Debian GNU/Linux, and possibly other platforms, the build dies with the error shown below. Version 9m, released since this was first written, should compile without any modification.

In file included from /usr/include/bits/sigcontext.h:28,
                 from /usr/include/signal.h:326,
                 from tcsim.c:15:
/usr/include/asm/sigcontext.h:79: error: parse error before '*' token
/usr/include/asm/sigcontext.h:82: error: parse error before '}' token
make[2]: *** [tcsim.o] Error 1

Nuutti Kotivuori posted a solution on the LARTC mailing list: add #define __user and #define __kernel to compiler.h, as shown below. The file exists only after make has run once and failed.

faith:~$ cat ~/tcng/tcsim/klib/include/linux/compiler.h
#ifndef __LINUX_COMPILER_H
#define __LINUX_COMPILER_H

#define __user
#define __kernel
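
If you would rather script that edit than make it by hand, something like the following should do. It is only a sketch: the function name patch_compiler_h is made up for this illustration, and it assumes the header starts with the include guard shown above. It inserts the two empty defines after the guard and does nothing if they are already present.

```shell
# Insert "#define __user" and "#define __kernel" right after the
# include guard of the given compiler.h, unless already patched.
patch_compiler_h() {
    header=$1
    if grep -q '^#define __user' "$header"; then
        return 0    # already patched, nothing to do
    fi
    tmp=$(mktemp) || return 1
    awk '
        { print }
        /^#define __LINUX_COMPILER_H/ {
            print ""
            print "#define __user"
            print "#define __kernel"
        }
    ' "$header" > "$tmp" && cat "$tmp" > "$header"
    status=$?
    rm -f "$tmp"
    return $status
}
```

From the top of the tcng tree, after the first failed make, run patch_compiler_h tcsim/klib/include/linux/compiler.h.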

Now, run make a final time. Everything should build correctly.

Next, run make test to verify that everything is working correctly. It takes five or ten minutes for all the tests to run. All tests should pass.

jasonb@faith:~/src/tcng$ make test
...
Passed all 1534 tests (24 conditional tests skipped)

You can run the tcc binary directly from the tcc/ directory or run make install.

Writing Shaping Configurations With tcng

Using tcng is rather easy. You express rules in a C-style syntax. You define classes of traffic, then you define the traffic shaping rules for the classes you defined. You compile your configuration into standard tc syntax using tcc, the compiler bundled with tcng. It writes the resulting commands to standard output by default, and you can pipe them directly to your shell of choice to execute the rules.

What follows is a simple example using the fifo queuing discipline, or qdisc for short.

dev "eth0" {
  egress {
    fifo;
  }
}

First, the device to operate on is specified with the dev keyword. You can use any valid interface name. Next, we specify what we want to do, which is to shape egress traffic, that is, traffic leaving through the given interface. Finally, we specify our qdisc. fifo is classless, so there's nothing else to do. We're done.

So what's all that look like as actual input for the tc command? Let's run tcc against it and take a look at the output.

$ tcc article2.tc
...
tc qdisc add dev eth0 handle 1:0 root dsmark indices 1 default_index 0
tc qdisc add dev eth0 handle 2:0 parent 1:0 pfifo

The result, as you can see from the output above, is simply regular tc commands in the appropriate order. Ordinarily you'd redirect this to a file and have your shell process each command.
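
One way to do that is sketched below. The function name apply_tcng is made up for this illustration; the only tcng behaviour it relies on is that tcc writes tc commands to standard output, as shown above. Writing to a temporary file first means nothing is applied if tcc reports an error.

```shell
# Compile a tcng configuration with tcc and run the resulting tc
# commands, but only if the compile step succeeded.
apply_tcng() {
    config=$1
    rules=$(mktemp) || return 1
    if tcc "$config" > "$rules"; then
        status=0
        sh "$rules" || status=$?
    else
        echo "tcc failed; no rules were applied" >&2
        status=1
    fi
    rm -f "$rules"
    return $status
}
```

Run apply_tcng article2.tc as root. tc refuses to add a root qdisc to an interface that already has one, so remove an earlier configuration first with tc qdisc del dev eth0 root.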

Now, let us look at another example using a classful qdisc.

/* Include some common files */

#include <ports.tc>
#include <fields.tc>

dev "eth0" {
  egress {

    /* Define a class for IMAP traffic */
    class( <$mail> )
      if tcp_sport == 143
    ;

    /* Define a class for Web traffic */
    class( <$web> )
      if tcp_sport == 80
      if tcp_sport == PORT_HTTPS
    ;

    /* Configure our classful qdisc */
    prio {
      $mail = class {
        fifo( limit 100pcks );
      }
      $web = class {
        fifo( limit 20kB );
      }
    }
  }
} 

This example showcases quite a few tcng language constructs, so things start to get interesting. We start off with some C-style comments, which can span multiple lines if desired.

Next, you can include external files using the usual #include syntax. The first included file contains the standard IANA port assignments. The second included file defines commonly matched fields in the header of IP packets and the TCP, UDP, ICMP, and IGMP packets they contain for both IPv4 and IPv6. Both these files are, however, included by default. You need not ordinarily include them in your own configurations.

Next, we define two classes of traffic, using TCP header matches from the fields.tc we included earlier. We match on the TCP source port, since we're interested in shaping the traffic leaving from services we offer. From the configuration above, presumably we're offering IMAP, HTTP, and HTTPS services to the world or a busy corporate LAN.

From these classes, it should be clear that you can create arbitrarily complex expressions using the usual C operators, && || !, in their usual order of precedence. You can also use bitwise operators, which come in handy when dealing with packet header matching. Detailed examples can be found in the fields4.tc that came with your tcng tarball.

Each class definition references a forward-declared variable, such as $mail, which is assigned later inside the qdisc definition. Generally, the matching rules will be of the form if field == value. The entry on the left side of the operator should match some portion of a packet, and the entry on the right is usually either a decimal value or some bitmask or bitwise operation. Most often it will simply be a port number or an IP address. These rules are analogous to using Netfilter to mark packets. Egress traffic will be matched against each of your classes and their rules in the order in which you define them. Ordering counts. The actual shaping definitions come next.

Last, we specify our queuing discipline. This time, it is more complex as we're using the classful prio qdisc. Now, the actual variable assignments are made. Each variable used in our two classes of traffic above must now be defined. Each definition is simply the variable in need of a definition followed by its assignment to the class keyword. Above, we simply attached the fifo qdisc to each class, respectively. The limit parameter was used, followed by an argument of either packets or bytes, to actually perform some basic, albeit probably useless, shaping for this example configuration.

In tcng speak, the above mechanism is called a class selection path. It simplifies traffic shaping by letting you first group the kinds of traffic you want to match into classes, then define the shaping rules to be applied to those classes.

An Actual Real World Example

Now let's examine an actual configuration file I use in production.

The scenario is a common one for which many a traffic shaping script has been written. I have an ADSL connection rated at 1.5Mbps/256Kbps. As egress becomes saturated, the link becomes increasingly sluggish because important packets, such as TCP ACKs and those belonging to interactive applications like SSH, are unnecessarily delayed. Fortunately, tcng provides a clean, intuitive way of defining which kinds of traffic are important and giving them the priority needed to keep connectivity snappy, even when the link is heavily congested.

Now we will look at the configuration a piece at a time.

dev "eth0" {
  egress {

    class( <$ack> )
      if ip_len < 64 &&
      ip_hl == 0x5 &&
      (raw[33].b >> 4) & 0xff
      if icmp_type == 0 || icmp_type == 8
    ; 

Let's start with the $ack class, as it shows the full power of packet matching in tcng. The u32 classifier for tc is used underneath, so nearly anything is possible. The field names used are all defined in the file fields4.tc, which is automatically included when you run tcc. The specific fields available are spelled out in detail in the tcng manual available online. The first rule matches IP packets no larger than 63 bytes with no IP options set and the TCP ACK flag set. While it would be easier to simply specify if tcp_ACK, I found that not to work, so I used (raw[33].b >> 4) & 0xff instead. Briefly, it takes byte 33 of the packet, which is the TCP flags byte when the IP header carries no options, shifts it right by 4 bits, then performs a bitwise AND with 0xff on the result. The last rule matches ICMP echo and echo-reply packets.
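
To see what that expression evaluates to for a given flags byte, the arithmetic can be tried in a shell. The sketch below is only an illustration and is not part of the configuration; the function names flags_expr and ack_bit are made up for this example. It assumes the standard TCP flag layout, in which ACK is 0x10 within the flags byte. Note that the URG, ECE, and CWR bits also survive the shift, so the expression is nonzero whenever any of those four bits is set; masking with 0x10 would test ACK alone.

```shell
# Evaluate the expression from the $ack class for one TCP flags byte.
flags_expr() {
    echo $(( ($1 >> 4) & 0xff ))
}

# Alternative for comparison: print 1 only if the ACK bit (0x10) is set.
ack_bit() {
    echo $(( ($1 & 0x10) != 0 ))
}
```

For example, flags_expr 0x10 (a bare ACK) prints 1 and flags_expr 0x02 (a bare SYN) prints 0, while flags_expr 0x20 (URG without ACK) prints 2, which is also nonzero. ack_bit 0x20 prints 0.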

    /* I gave outbound SSH sessions a max payload */
    /* of 255 bytes.  Good enough. */

    class( <$interactive> )
      if tcp_sport == 143 || tcp_sport == 993
      if tcp_sport == 22 && ip_tos_delay == 1
      if ip_len < 256 && tcp_dport == 22
      if tcp_dport == 53 || udp_dport == 53
      if ip_len < 512 && tcp_dport == 80
      if tcp_dport == 6667 || tcp_dport == 7000
      if tcp_dport == 5190
    ; 

$interactive, as you might suspect, specifies many types of traffic which might be considered interactive. You might recognize IMAP, IMAPS, SSH, DNS, HTTP, IRC, and AIM. Rules that match on the TCP destination port are for traffic originating within my internal network that I do not want delayed, like HTTP requests and DNS requests. Care is taken to prevent bulk egress traffic from being included in this class. Due to some strangeness with tcc, I found that matches on the length of IP packets must come first in a rule; otherwise the rule either fails to match anything or matches everything. This problem may have been fixed in version 9m and above, but I have not verified this personally.

    /*
    466[26] mlnet
    6882 bittorrent
    6346 gnutella
    */

    class( <$p2p> )
      if tcp_sport == 4662
      if udp_sport == 4666
      if tcp_sport == 6882
      if tcp_sport == 6346
    ;

    class( <$def> )
      if 1
    ; 

$p2p is for peer-to-peer traffic, which likes to absorb as much bandwidth as it can find and thus tends to negatively affect other egress traffic by drowning it out.

Finally, $def is a generic rule to match any traffic not yet matched. This class is last because each class and its associated rules will be looked at by Linux's QoS code in order from first to last defined. Order counts in class selection.

Now, let us look at the actual shaping configuration.

    htb() {
      class ( rate 192kbps, ceil 192kbps ) {

        /* I think that's the max for ACK packets */
        /* Assuming I saturate my 1.5Mbps link */

        $ack = class( rate 64kbps, ceil 192kbps ) {
          sfq;
        }
        $interactive = class( rate 112kbps, ceil 192kbps ) {
          sfq;
        }
        $p2p = class( rate 8kbps, ceil 192kbps ) {
          sfq;
        }
        $def = class( rate 8kbps, ceil 192kbps ) {
          sfq;
        }
      }
    }
  }
} 

I chose the htb classful qdisc as my queuing discipline. It's a simple but powerful qdisc that's perfectly suited to the task at hand. Each htb class is given a rate and a ceiling in bits per second, which define how much bandwidth is available to it. If not specified, the ceiling defaults to whatever you specified the rate to be. The outer class above serves as a container for its children and should have a rate equal to the practical amount of bandwidth available.

Next, I defined a class for each class of traffic defined earlier. If you choose a ceiling greater than the rate, the class can borrow bandwidth up to its ceiling. It's important to note that, unlike borrowing something in meatspace, bandwidth borrowed in traffic shaping is not returned to its original owner. Because rate is the bandwidth guaranteed to the class, it cannot exceed the rate specified for the outermost class. In the configuration above, the four child rates add up to exactly 192kbps. If a particular class wants more bandwidth than its rate, it borrows in proportion to its rate relative to its parent, the outermost class definition. So, htb classes with a higher rate than their siblings can borrow more. The careful reader will note that, if your connection is fully saturated, there will be no bandwidth available to borrow from other classes, so choose a reasonable rate for each class.
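
The arithmetic behind these choices can be checked with a few lines of shell. This is a rough sketch, not a model of the htb implementation: it assumes that when several classes are saturated at once, the parent's bandwidth is divided among them in proportion to their guaranteed rates, as described above, and it ignores ceilings. The function names sum_rates and borrow_share are made up for this illustration, and all values are in kbps.

```shell
# Print the sum of the guaranteed rates given as arguments.
sum_rates() {
    total=0
    for r in "$@"; do
        total=$(( total + r ))
    done
    echo "$total"
}

# Print the approximate bandwidth (rounded down) a saturated class gets
# when the parent's rate is split in proportion to guaranteed rates.
# usage: borrow_share PARENT_RATE OWN_RATE RATE_OF_EACH_SATURATED_CLASS...
borrow_share() {
    parent=$1
    own=$2
    shift 2
    active=$(sum_rates "$@")
    echo $(( parent * own / active ))
}
```

sum_rates 64 112 8 8 prints 192, matching the parent's rate. With only $ack and $p2p saturated, borrow_share 192 64 64 8 prints 170 and borrow_share 192 8 64 8 prints 21, so under this assumption the ACK class ends up with roughly eight times the bandwidth of the peer-to-peer class.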

Finally, I attached an sfq qdisc to each htb class as its leaf qdisc. In brief, sfq assigns traffic to various internal fifo queues by applying a hashing algorithm to each packet, then releases packets from those queues in round-robin fashion. This behavior makes it a popular leaf qdisc in shaping configurations.

Monitoring

At some point I'm going to write a Munin plugin to handle this.
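
In the meantime, the tc command from iproute2 can print the counters the kernel keeps for each qdisc and class. A minimal sketch, assuming the shaped interface is eth0 as in the examples above; the function name show_shaping_stats is made up for this illustration.

```shell
# Print the kernel's statistics for every qdisc and class on an
# interface (defaults to eth0 when no argument is given).
show_shaping_stats() {
    dev=${1:-eth0}
    tc -s qdisc show dev "$dev"
    tc -s class show dev "$dev"
}
```

Running show_shaping_stats eth0 lists counters such as bytes and packets sent and packets dropped for each entry. The classes appear under the numeric handles that tcc assigned, not under the $ack or $p2p names from the configuration.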

Aftermath

I have been using tcng for three weeks now and have played with varying configurations. I have found that trying to match packets by ip_len often fails, which makes matching packets of a particular size unreliable. This problem may have been corrected in the recently released version 9m. I have also found that peer-to-peer traffic is pervasive, and simple, static port classifications often do not work. To continue using tcng I will probably need to give up on trying to match TCP ACK packets and let all file sharing traffic get dumped into my default class, as it often defies matching based on port alone. I find such a solution inadequate, so I will likely be exploring the Netfilter classify and ipp2p extensions instead of continuing to use tcng. It also seems L7 filter might be a nice alternative to ipp2p for 2.6 series kernels.

Links and Resources