Fourth Progress Report on the HICID Project,

September 1, 1998 - November 30, 1998.

Panos Gevros, Fulvio Risso, Peter T. Kirstein and Jon Crowcroft

December 2, 1998

 

  1. Introduction

LEARNET is now available, and we have made some progress in installing a QoS testbed on it. The current configuration is shown and discussed in Section 2. We still need certain changes before the configuration is really suitable for the algorithmic testing we would like to perform. We have continued to work with our laboratory testbed, and have largely completed some simple measurements of algorithms. In Section 3 we describe our local QoS testbed and the measurements we have been doing. Our progress with IPv6 is considered in Section 4. Our future plans are discussed in Section 5.

  2. The Current LEARNET QoS Testbed

We have installed a CAIRN PC router at Essex U, and would be ready to do QoS activities if only the configuration were suitable. The current configuration is shown in Fig. 1:

Here the darker shapes indicate the current installation; the lighter circles indicate additional routers, which are planned for the future.

There are two problems with the current Fig. 1 installation:

Each of these is considered in turn.

The UCL-CS (UCL-CS-P) and the Essex U (Ess-P) CAIRN routers are connected directly to ATM switches. The ATM switches at UCL-EE (UCL-EE-S), UCL-CS (UCL-CS-S) and BT (BT-S) are also connected together directly. Finally, the three CISCO routers (UCL-CS-C, BT-C and Ess-C) are all connected to their local switch. Unfortunately, BT-S is connected to Ess-C rather than to Ess-S. This gives the requisite connectivity; however, it also requires that any QoS traffic path from sources attached to Ess-P destined for UCL-CS-P must pass through a CISCO (Ess-C). This means that any QoS algorithms supported by the CAIRN routers but not the CISCO ones cannot really be tested. It would be far better if a single-mode ATM port could be provided on Ess-S. If this were done, the first improvement would be:

  1. Re-terminate LEARNET on Ess-S.

For many purposes, we would also like to investigate multiple hops and multicast. Several extensions to Fig. 1 would enrich the topology:

  2. Connect a CAIRN PC router (UCL-EE-P) to the ATM switch UCL-EE-S;
  3. Add a CAIRN router (BT-P) at BT.

 

If all the above were done, it would be possible to establish ATM VCs between any sets of CISCOs or of CAIRN routers. Thus the QoS experiments could be done entirely in the CISCO or CAIRN PC domains, merely by establishing VPNs.

Until some of the above are done (particularly (1)), we can use only QoS algorithms common to both CISCO and CAIRN routers.

  3. The Laboratory Testbed

    3.1 Introduction

We have extended our tests on our small-scale laboratory testbed to experiment with QoS. In our experiments we use research prototypes (mainly, but not only, those under the altq framework) and measure how effective the traffic management mechanisms are. The problems with the altq ATM driver and its capabilities that were highlighted in the previous report have now been resolved; we are using the latest FreeBSD releases (2.2.6/7) and the altq-1.1.1 distribution.

Within CAIRN, our colleagues are also starting to use another research prototype from Carnegie Mellon University, the Hierarchical Fair Service Curve (HFSC) discipline, used for link sharing (like CBQ). There are some advantages in this system - not least that other colleagues are exploring its use, and CMU has offered to configure it on our routers (it is not as well finished a product as altq from the viewpoint of easy configuration). We intend to include HFSC in future tests. The main part of this report concerns our experience with CBQ; this is given much more fully in [1], but the salient points are reported below.

    3.2 Test environment

In the test environment, Ammon and Kirki are PCs running FreeBSD, Truciolo is a PC running Microsoft Windows 95, and Thud is a Sun SPARC 5 running Solaris.

       

The main test environment consists of two PCs (Ammon and Kirki) running BSD Unix connected by a 155 Mbps ATM link: one runs the CBQ daemon and the other acts as the network capture machine. The TTCP package is used as traffic generator: this program runs on Truciolo and, in certain cases, directly on Ammon, the machine running the CBQ daemon. This double location is necessary because the switched Ethernet between the Win95 machine and the CBQ router can affect the results in high-bandwidth tests. The TTT package is used as traffic monitor, in certain cases running directly on the second BSD machine, in others running the TTTProbe on the BSD machine and the TTTView on the Solaris machine. Since the TTT graphical interface uses a lot of CPU resources, the second option is used to avoid a CPU overload when running some high-bandwidth tests.

       

Kirki
  Machine type: AMD K6/200, 64 MB RAM
  Network: 1 Ethernet 10 Mbps, 1 ATM 155 Mbps (PVC set at various speeds, generally between 1 and 10 Mbps)
  OS: UNIX BSD 2.2.7
  Other packages installed: ALTQ 1.1.1, TTCP (recv), TTT, TTTProbe
  Task: network capture; in certain cases also network monitor

Ammon
  Machine type: Intel P166, 32 MB RAM
  Network: 1 Ethernet 10 Mbps, 1 ATM 155 Mbps (PVC set at various speeds, generally between 1 and 10 Mbps)
  OS: UNIX BSD 2.2.7
  Other packages installed: ALTQ 1.1.2, TTCP (send)
  Task: CBQ daemon; in certain cases also traffic generator

Thud
  Machine type: Sun SPARC 5
  Network: Ethernet 10 Mbps
  OS: Solaris 2.5.1
  Other packages installed: TTTview
  Task: network monitor

Truciolo
  Machine type: Intel PII-266
  Network: Ethernet 10 Mbps
  OS: Windows 95
  Other packages installed: TTCP (send)
  Task: traffic generator

    3.3 Static tests

Static tests help to discover the static behaviour of the CBQ package. The main goals are:

      3.3.1 Bandwidth class allocation tests

These tests show the granularity of the classifier. Results show that:

 

Finally, the CBQ root bandwidth set in the configuration file and reported by the CBQSTAT program is:

 

Further details are given in [1], but a summary of the test characteristics is:

 

      3.3.2 Classifier and bandwidth share tests

This test suite has a very simple configuration: a 2 Mbps root class has two leaf classes with an 80% and a 20% share, respectively. The classes are isolated: neither is allowed to borrow from the root class.

 

The bandwidth share test uses a very simple classifier with mixed TCP and UDP traffic. In some versions the filter is based only on the protocol type; in others the destination port field is also included in the classifier filter. These tests have been repeated with different classifier settings.
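To make this configuration concrete, the following is a minimal Python sketch of the set-up as we understand it; the class names, the example port numbers and the dictionary-based filter are our own illustration, not the altq configuration syntax.

    # Illustrative sketch only: models the test configuration described above,
    # not the actual ALTQ/CBQ implementation or its configuration file syntax.

    ROOT_BW = 2_000_000            # root class: 2 Mbps
    CLASSES = {
        "fast": 0.80 * ROOT_BW,    # 80% share, isolated (no borrowing)
        "slow": 0.20 * ROOT_BW,    # 20% share, isolated (no borrowing)
    }

    def classify(proto, dst_port, use_port_filter=False):
        """Map a packet to a CBQ class.

        First variant: filter on the protocol type only (TCP -> fast, UDP -> slow).
        Second variant: the destination port is also part of the filter.
        Both mappings, and the port number 5001, are assumptions made for illustration.
        """
        if use_port_filter:
            return "fast" if dst_port == 5001 else "slow"
        return "fast" if proto == "tcp" else "slow"

    print(classify("tcp", 5002))                          # fast (protocol-only filter)
    print(classify("tcp", 5002, use_port_filter=True))    # slow (port-based filter)
    print({name: bw / 1e6 for name, bw in CLASSES.items()})   # {'fast': 1.6, 'slow': 0.4}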

 

With these tests we have the following results:

      3.3.3 Class allocation with different packet size (p_size)

The CBQ mechanism uses the "mean packet length" concept to determine how many packets must be forwarded in each class. The goal of another set of tests is to show the CBQ class allocation when flows with different payload sizes (sometimes much smaller than the MTU) are sent to the router.

 

The results confirm that CBQ can be sensitive to the packet size, because it uses a "mean packet size" to compute one sensitive parameter. Unless told otherwise, CBQ sets its packetsize parameter to the link-layer MTU, assuming that applications tend to generate MTU-sized packets. However, this is not true for all applications: it fails in particular for multimedia audio packets, but also for some TCP flows (depending on the MSS). The effect can be limited by setting an appropriate packetsize parameter for the class in the CBQ configuration file; however, this means that we must know in advance the mean packet size of the data carried in that class.
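A back-of-the-envelope model of this effect is sketched below (our own simplification, assuming the estimator allots the class one configured-size packet per pacing interval; this is not the real CBQ estimator code). A flow whose packets are smaller than the configured packetsize then achieves only a proportional fraction of its allocation.

    # Simplified model of the packet-size effect described above (an assumption,
    # not the real CBQ estimator): the class is paced at one packet of the
    # configured "packetsize" per allotted interval, so smaller real packets
    # leave part of every interval unused and the class goes overlimit early.

    def effective_rate(allocated_bps, configured_packetsize, actual_packetsize):
        interval = configured_packetsize * 8 / allocated_bps   # seconds per allotted packet
        return actual_packetsize * 8 / interval                # bit/s actually achievable

    alloc = 1_600_000               # e.g. 80% of a 2 Mbps root class
    mtu = 1500                      # default packetsize = link-layer MTU
    for size in (1500, 512, 160):   # 160 B stands in for a small audio packet (assumption)
        rate = effective_rate(alloc, mtu, size)
        print(f"{size:4d} B packets -> {rate / 1e6:.2f} Mbps of {alloc / 1e6:.1f} Mbps allocated")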

 

A typical experimental result is shown below:

 

 

 

This graph reports the results of several different UDP/TCP flows, clearly showing the deterioration of throughput with decreasing packet size. In this test UDP is more convenient than TCP because it is simpler to generate fixed-size packets. With TCP flows there can be some window management problems; in certain cases the TCP throughput is considerably smaller than the UDP one, in others much bigger. Results equivalent to the UDP performance can be seen with a TCP flow if, after starting the CBQ daemon, the MTU size is chosen carefully. Other tests show that this bad behaviour can be avoided by setting an appropriate packet-size parameter. Conversely, increasing the max-burst parameter has no effect, because that parameter works on the weighted round-robin mechanism. In other words, with a larger max-burst the scheduler is able to send more packets only if the estimator permits it, i.e. only if the class is not overlimit. But since, with the wrong packet-size parameter, the class becomes overlimit even after transmitting only a few bytes, increasing max-burst clearly cannot be effective.

    3.4 Dynamic tests

Dynamic tests help to discover the instantaneous throughput of each connection in the CBQ router. Basically, the main goal is to test the borrow mechanism, which is very important in CBQ. Since CBQ is not a work-conserving discipline, it can happen that some connections have a backlog even when the output link is idle. This can be avoided by activating the "borrow" flag in some classes, which configures CBQ to use the excess bandwidth. In this way a class is allowed to borrow from its parent if the parent has unused bandwidth. Clearly the behaviour of this mechanism must respect the bandwidth shares imposed in the configuration files.
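As a rough illustration of the borrow decision (a deliberate simplification written for this report, not the ALTQ code; the class names and rates are invented), a class that has exhausted its own share may still send only if its borrow flag is set and its parent has unused bandwidth:

    # Rough illustration of the borrow decision described above; a simplification,
    # not the ALTQ implementation.

    class CbqClass:
        def __init__(self, name, allocated_bps, parent=None, borrow=False):
            self.name = name
            self.allocated_bps = allocated_bps
            self.used_bps = 0
            self.parent = parent
            self.borrow = borrow

        def may_send(self, extra_bps):
            """True if the class may send extra_bps more, possibly by borrowing."""
            if self.used_bps + extra_bps <= self.allocated_bps:
                return True                              # within its own share (underlimit)
            if self.borrow and self.parent is not None:
                return self.parent.may_send(extra_bps)   # try the parent's spare bandwidth
            return False                                 # overlimit and not allowed to borrow

    root = CbqClass("root", 2_000_000)
    fast = CbqClass("fast", 1_600_000, parent=root, borrow=True)
    slow = CbqClass("slow",   400_000, parent=root, borrow=False)

    slow.used_bps = 400_000          # the slow class has exhausted its own share
    print(slow.may_send(100_000))    # False: isolated, so it must leave the link idle
    slow.borrow = True
    print(slow.may_send(100_000))    # True: the parent still has unused bandwidth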

 

We tested a number of different CBQ configurations, as shown in Fig. 6:

 

 

 

 

Fig. 6 (a), (b), (c), (d)

In (a) the traces show that, when both flows are active, the bandwidth share is correctly reserved. However, the slow flow is not able to use all the parent bandwidth when the fast flow is off. Even though the fast class (when alone) is able to consume much more bandwidth than the slow class, it does not use the total parent bandwidth. When both classes are transmitting concurrently, the total traffic is a little bigger than the fast-class traffic alone. This result is independent of whether the traffic in each class is TCP or UDP.

In (b), TCP and UDP perform differently. TCP flows are not able to make good use of the borrow flag. Due to TCP's "fair" behaviour and to the CBQ characteristic of sending packets in bursts, if more than one TCP flow shares a link using the borrow flag, the flows adapt to each other. The result is that if three TCP flows are present, they share the bandwidth equally. Even if all three flows start at the same instant with different throughputs, they quickly adapt to share the bandwidth equally. Conversely, the UDP flows are able to share the bandwidth unequally; the shares may not be exactly those imposed (e.g. 10-40-50% in one experiment), but the flows are able to transmit at different rates.

In (c) the TCP session always receives better service than the UDP one. The better performance of the TCP flows is due to the fact that the borrow mechanism does not work well if classes have different packet sizes.

In (d) one set of tests was performed with only two TCP flows, one in the agency1 slow class and the other in the agency2 one. Results show that the two TCP flows share the total bandwidth (4 Mbps) equally, despite their different agency shares. When one TCP flow is replaced by a UDP one, the results are different. When no other flows are on, the TCP flow, belonging to the bigger agency, is able to use the total link bandwidth (differently from the previous test). Conversely, the UDP flow, belonging to the smaller agency, is not able to do this. When both flows are on, the UDP flow's bandwidth remains unchanged, and it uses more bandwidth than the TCP flow.

    3.5 CBQ with RSVP

CBQ can co-exist with RSVP: when the RSVP daemon accepts a new connection, CBQ dynamically creates a new class in its queuing hierarchy. The CBQ daemon starts automatically when the RSVPD is activated, loading the standard configuration file. Only two classes are defined, one for best-effort and the other for reserved traffic. The latter is characterised by the keyword "cntlload", and is the parent class of all the reserved sessions. The suggested configuration allows only the best-effort traffic to borrow from the root class (the same feature is disabled for the cntlload class). Reserved session classes can belong only to the Controlled Load service: Guaranteed Service is not supported, and all sessions requiring Guaranteed Service are refused. When the RSVP daemon accepts a new reservation, the CBQ mechanism creates a new leaf class, reserving for it the bandwidth indicated in the reservation message by the token rate (r) parameter. In the Controlled Load class the receiver does not specify the peak rate. These leaf classes are allowed to borrow from their parent (cntlload) class.
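The hierarchy this produces can be sketched as follows (an illustrative Python model based on our reading of the suggested configuration; the accept_reservation helper and the session naming are invented for the example):

    # Illustrative model of the CBQ hierarchy built by the RSVP/CBQ coupling,
    # as described above; our own sketch, not rsvpd or ALTQ code.

    hierarchy = {
        "root":        {"parent": None,   "bandwidth": "link rate",     "borrow": False},
        "best_effort": {"parent": "root", "bandwidth": "remainder",     "borrow": True},
        "cntlload":    {"parent": "root", "bandwidth": "reserved pool", "borrow": False},
    }

    def accept_reservation(session_id, token_rate_r, service="controlled-load"):
        """Mimic (in outline) what happens when the RSVP daemon accepts a reservation."""
        if service != "controlled-load":
            raise ValueError("Guaranteed Service is not supported; reservation refused")
        hierarchy[f"session_{session_id}"] = {
            "parent": "cntlload",        # leaf class under the controlled-load class
            "bandwidth": token_rate_r,   # reserved bandwidth = token rate r
            "borrow": True,              # leaf classes may borrow from cntlload
        }

    accept_reservation(1, 500_000)
    print(list(hierarchy))   # ['root', 'best_effort', 'cntlload', 'session_1']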

      3.5.1 Test configuration

Because of several practical problems, the reserved flow originates from Ammon and goes to Truciolo, via Kirki. However, the only reserved link is the ATM link from Ammon to Kirki (the second hop, from Kirki to Truciolo, is not reserved). The first series of tests involves only two reserved flows, TCP and UDP, while the second series adds to them a third UDP flow (belonging to the best-effort class) from Ammon to Kirki. Graphs are produced by capturing the data traffic on the Kirki ATM interface (with the tttprobe program) and displaying the graphical result (with tttview) on Thud.

         

      3.5.2 Test Results

From these experiments, we can conclude that the integration of CBQ and RSVP is very good. Some problems do arise, but these are due not to RSVP but to the CBQ mechanism, especially to the borrow implementation. A provider that would like to adopt RSVP with CBQ must take care that the current limitations of CBQ do not affect its performance.

    3.6 CBQ Performance

The CBQ mechanism consists of a classifier, an estimator, and a packet scheduler. Regarding its performance, we can say the following:

There is little point in trying to understand the CBQ latency, because usually the time a packet spends waiting in the output buffer is much larger than the time spent in the CBQ mechanism. Even if the CBQ latency increases with the number of classes, the larger part of the time that a packet spends in a router is spent in the output buffer. So, if the goal is to limit the end-to-end delay, limiting the buffer length is more effective than decreasing the scheduling overhead.

However, the maximum performance in terms of packets per second is an interesting metric. Moreover, it is easy to derive the latency of each packet in the router machine from the packets-per-second measurement.
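For example, a straightforward conversion using the roughly 15 Kpps figure measured in Section 3.6.1:

    # Per-packet forwarding time derived from the measured packet rate.
    pps = 15_000                     # ~15 Kpps on the AMD K6-200 (Section 3.6.1)
    per_packet_us = 1e6 / pps
    print(f"~{per_packet_us:.0f} microseconds per packet")   # about 67 us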

      3.6.1 CBQ Throughput (perf)

It is not easy to determine the CBQ throughput, for a number of reasons. First, sending TCP traffic would be better because it does not waste CPU cycles. However, it is not trivial to impose a TCP packet size: it can be done only by imposing a packet size on the socket buffer, and this may overload the CPU. Moreover, the path latency can affect the overall throughput.

For these reasons we chose to use UDP flows, but care must be taken. UDP flows do not adapt to the network load; hence, to avoid overloading the router CPU, the link between the sender and the router must be set to the appropriate speed. Generally two kinds of test are done:

The use of a UDP flow means that only the flow from the source to the destination is present in the system; there is no flow from the destination back to the source, in contrast to a TCP flow.

To measure the system performance, XPERFMON++ is used. However the results are not very accurate, because:

The CBQ configuration for this test is very simple: one class (the default class) with 100% of the bandwidth. The bandwidth of the root class is the same as that of the Kirki – Ammon PVC. No packet size is set, because other tests showed that the packetsize parameter has no effect here.

Clearly the throughput depends on the packet size. With small packet sizes it is approximately 15 Kpps on an AMD K6-200 machine. When the packet size increases, the overall performance in packets per second decreases; this is because of the higher load in transferring the packets from the network interface to memory and vice versa. This result is consistent with measurements made at Sony. It is important to remember that at small packet sizes the router machine is running at full load; no other processes receive service, and the machine appears to be a hung PC.

      3.6.2 CBQ classes overload

This test uses the same configuration as in Section 3.6.1; the goal is to understand the deterioration of the CBQ mechanism when many classes are present in the system. The test uses a 60 Mbps PVC, configured with 10, 20, 50 and 100 classes. Each class is allocated 0% of the bandwidth, except the default class, which uses 100%. Each class has a different destination port: this is a very simple filter, but quite heavy to compute because every packet must be analysed in depth.

The results show a clear worsening: when the number of classes increases to 100, the throughput is reduced by 25%. This is not a problem in the normal stand-alone use of CBQ (since the number of classes is limited, because it is not possible to allocate a fractional percentage of the root bandwidth). In an RSVP environment it may be more serious, because of RSVP's ability to create many classes dynamically.

      3.6.3 CBQ Memory load

It is important to understand the memory requirements of CBQ too. In these experiments we used the same configuration files and machine configurations as in the above tests. The results from the TOP and PS programs are consistent, and show that the memory occupancy is low. They indicate use of 260 KB of virtual memory with no classes, and around 400 B per class. Apparently around 600 KB of real memory is allocated, but we do not really know how the allocation is made.
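A quick arithmetic check of how this scales with the number of classes (using the figures above; the class counts are arbitrary examples):

    # Quick check on the memory figures reported above.
    base_kb = 260          # virtual memory with no classes (KB)
    per_class_b = 400      # approximate cost per class (bytes)
    for n in (10, 100, 1000):
        total_kb = base_kb + n * per_class_b / 1024
        print(f"{n:5d} classes -> ~{total_kb:.0f} KB of virtual memory")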

    3.7 Evaluation of CBQ

The main problems of the CBQ package seem to be:

  4. Progress with IPv6

It is not yet quite clear how much IPv6 can be used in the HICID/HIGHVIEW/JAVIC projects. It is clear that IPv6 will eventually have much more support for QoS, so we would like to use it for HICID. This requires, however, that the applications we are using themselves support IPv6. Mainly with effort from outside the project, we are building up an IPv6 capability for our testbed. Our current progress is summarised below.

    4.1 Stacks

UCL-CS has put up the IPv6 stack from Microsoft, and is starting with the DASSAULT one for Windows/NT. We have not yet done any inter-working experiments, but already see that they have somewhat different application APIs. We have put up the complete stack in FreeBSD; this is the one used in CAIRN. We will put up the LINUX one from Lancaster U; we will need this for our mobile work, but not for HICID. There was a set of patches for Solaris 2.5 and Solaris 2.6, but we were advised to wait for the Solaris 7 version. We have now received it, and have installed it on two of our workstations.

    4.2 Routers

The CAIRN and CISCO routers we are using both support IPv6; there is no problem at that level. However, the QoS support in the routers is much more limited. The CAIRN router supports only HFSC; it does not yet support ALTQ under IPv6. There is a version of ALTQ distributed by INRIA; this has not yet been evaluated by anyone in the CAIRN community - including UCL. The CISCO QoS support is also much more limited under IPv6 than under IPv4.

    4.3 Applications

A particular problem is that most of the applications do not yet work with IPv6. We have made a start by making RAT work above the Microsoft stack. We have now started making it work above the DASSAULT stack to see the differences in API. We know that there are already ports of VIC and SDR for IPv6; we have not yet investigated these.

  5. Future Work

Our main activities during the next quarter are the following:

 

Reference

1. Fulvio Risso, ALTQ Package – CBQ Testing. (GIVE WWW REFERENCE)