ALTQ Package – CBQ Testing

The goal is to test the ALTQ package in order to understand the impact of new scheduling disciplines (primarily CBQ) on router performance and on traffic management.

    1. Test environment
    2. The main test environment consists of two PCs running BSD Unix, connected by a 155 Mbps ATM link: one runs the CBQ daemon and the other acts as a network capture machine.

      The TTCP package is used as the traffic generator: this program runs either on the Windows 95 machine or on the same machine as the CBQ daemon. This double location is necessary because the switched Ethernet between the Win95 machine and the CBQ router can affect the results in high-bandwidth tests.

      The TTT package is used as the traffic monitor: in some cases it runs directly on the second BSD machine, in others TTTProbe runs on the BSD machine and TTTView on the Solaris machine. Since the TTT graphical interface uses a lot of CPU resources, the second option is preferred to avoid CPU overload during high-bandwidth tests.
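
      For reference, a typical TTCP invocation pair for these tests looks roughly like the following (illustrative only: the payload and buffer-count values match the defaults reported in the tables below, but option letters can vary slightly between TTCP versions):

        ttcp -r -s -u -l 8192 -p 2000                (receiver, e.g. on Kirki)
        ttcp -t -s -u -l 8192 -n 2048 -p 2000 kirki  (sender)

      Here -t/-r select transmit or receive mode, -u selects UDP, -l is the buffer (payload) size in bytes, -n the number of buffers written to the socket, and -p the port (used by the destination-port classifier tests).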

       

      Name: Kirki
        Machine type: AMD K6/200, 64 MB RAM
        Network: 1 Ethernet 10 Mbps, 1 ATM 155 Mbps (PVC set at various speeds, generally between 1 and 10 Mbps)
        OS: BSD Unix 2.2.7
        Packages installed: ALTQ 1.1.1, TTCP (recv), TTT, TTTProbe
        Task: network capture; in certain cases also network monitor

      Name: Ammon
        Machine type: Intel P166, 32 MB RAM
        Network: 1 Ethernet 10 Mbps, 1 ATM 155 Mbps (PVC set at various speeds, generally between 1 and 10 Mbps)
        OS: BSD Unix 2.2.7
        Packages installed: ALTQ 1.1.2, TTCP (send)
        Task: CBQ daemon; in certain cases also traffic generator

      Name: Thud
        Machine type: Sun SPARC 5
        Network: Ethernet 10 Mbps
        OS: Solaris 2.5.1
        Packages installed: TTTView
        Task: network monitor

      Name: Truciolo
        Machine type: Intel PII-266
        Network: Ethernet 10 Mbps
        OS: Windows 95
        Packages installed: TTCP (send)
        Task: traffic generator

       

       

    3. Static tests

Static tests investigate the static behaviour of the CBQ package. The main goals are:

 

      1. Bandwidth class allocation tests

These tests show the granularity of the classifier. Results show that:

 

Finally, the CBQ root bandwidth set in the configuration file and reported by CBQSTAT program is:

 

So, configuring CBQ at full speed on an Ethernet interface requires setting the root class bandwidth to 10,000,000 bps; conversely, when configuring CBQ on an ATM interface whose PVC is set to 2 Mbps (with the PVCTXCTL command, which reports an effective connection of 5139 cell/s), the root class bandwidth needs to be set to 5139 * 48 * 8 = 1,973,376 bps.

At the moment it is not clear (to me) whether the AAL5 and LLC/SNAP overhead is included in this figure (so that the root bandwidth would cover the AAL5 and LLC/SNAP headers as well) or not. However, for CBQ purposes high precision is not important: computing the exact value of the root class bandwidth is not of interest here.
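
As a cross-check of the arithmetic above, the conversion can be written as a small script (a minimal Python sketch; the function and constant names are mine):

    # Minimal sketch: converting the cell rate reported by PVCTXCTL into a CBQ
    # root class bandwidth. Only the 48-byte ATM cell payload is counted; whether
    # AAL5/LLC-SNAP overhead should also be accounted for is exactly the open
    # question mentioned above.
    ATM_CELL_PAYLOAD = 48  # bytes of payload carried by each ATM cell

    def root_class_bandwidth(cells_per_second):
        """Return the CBQ root class bandwidth in bits per second."""
        return cells_per_second * ATM_CELL_PAYLOAD * 8

    print(root_class_bandwidth(5139))  # prints 1973376, the value used above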

 

[Tests: class_allocation.*]

      1. Understanding the test result details

When analysing these test results, keep in mind:

 

The following table explains the main parameters reported in the test results.

 

Test Characteristics: a brief summary of the test configuration (classifier filters and class configuration).

PVC Settings: the PVC hardware setting (made with the PVCTXCTL command).

Traffic Type: a brief summary of the flows involved in the test, with the classes they belong to.

Throughput TCP/UDP (KB/s): the throughput reported by the TTCP program. It corresponds to the level-4 throughput (total TCP/UDP traffic divided by the elapsed time) and does not include any level-3 (or lower) headers.

Estimated Throughput:
  IP (Kbps): an estimate of the IP traffic, assuming that all generated packets have a fixed size (derived from the TCP/UDP payload).
  ATM (Kbps): the corresponding estimate of the ATM bandwidth, based on this fixed IP packet size.
  ATM (cell/s): an estimate of the number of ATM cells per second.

TCP/UDP payload: the size of the TCP/UDP payload. For UDP traffic this is the basis of the IP packet construction; for TCP traffic it is not, because (1) by default TCP buffers the data and sends it only when it reaches the MSS, and (2) the sliding-window mechanism sometimes forces packets of a smaller size to be sent.

N° buf sent: the number of writes on the underlying socket. Multiplied by the previous parameter, it gives the total number of bytes transmitted.
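
For the UDP flows, the estimated-throughput columns can be reproduced approximately with a calculation of this kind (a Python sketch of my own; the 20+8-byte IP/UDP header and the 8-byte LLC/SNAP header plus 8-byte AAL5 trailer are my assumptions about the encapsulation, so the ATM figures in particular may not match the tables exactly):

    import math

    def estimate(ttcp_kbytes_per_s, udp_payload):
        """Rough reconstruction of the 'Estimated Throughput' columns for UDP flows."""
        pps = ttcp_kbytes_per_s * 1024 / udp_payload   # datagrams per second (KB = 1024 bytes)
        ip_bytes = udp_payload + 8 + 20                # UDP + IP headers (assumed)
        ip_kbps = pps * ip_bytes * 8 / 1000
        # LLC/SNAP header + AAL5 trailer, padded to an integral number of 48-byte cells
        cells_per_pkt = math.ceil((ip_bytes + 8 + 8) / 48)
        cell_rate = pps * cells_per_pkt
        atm_kbps = cell_rate * 53 * 8 / 1000           # 53 bytes per cell on the wire
        return ip_kbps, atm_kbps, cell_rate

    # Example: the 'Udp' test at 2 Mbps (47.83 KB/s with an 8192-byte payload)
    print(estimate(47.83, 8192))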

 

In some tests a PVC with different bandwidth settings (PVCTXCTL command) was used, but always with the same root class bandwidth. In brief, these are the real values corresponding to the different settings:

 

 

Summarizing the test characteristics:

 

      1. Classifier and bandwidth share tests

This test suite has a very simple configuration: a 2 Mbps root class with two leaf classes holding an 80% and a 20% share. The classes are isolated: neither is allowed to borrow from the root class.

 

 

 

Bandwidth share test: configuration 1 (proto)

This test uses a very simple classifier (on protocol type), with mixed TCP and UDP traffic.
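
In the configuration-file syntax shown later for RSVP, the class setup for this test would look roughly like the following (a sketch of my own, not the file actually used in the tests; priorities and the choice of default class are illustrative, and the filter statements that map UDP and TCP onto the two classes are omitted):

    # Sketch of the 'proto' class setup (illustrative only)
    interface pvc0 bandwidth 2M cbq
    class cbq pvc0 root_class NULL priority 0 pbandwidth 100
    # A. slow class: UDP traffic, 20% of the root bandwidth (no borrow keyword)
    class cbq pvc0 slow_class root_class priority 3 pbandwidth 20
    # B. fast class: TCP traffic, 80% of the root bandwidth, used here as the default class
    class cbq pvc0 fast_class root_class priority 3 pbandwidth 80 default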

 

Test characteristics
  Classifier filter: protocol type
  Class definitions:
    0. Root class (100%, 2 Mbps)
    A. Slow class: UDP (20%)
    B. Fast class: TCP (80%)

PVC     | Class: traffic | Throughput (KB/s) | IP (Kbps) | ATM (Kbps) | ATM (cell/s) | Payload (bytes) | N° buf sent

Name: Tcp
2 Mbps  | A: --  | --     | --      | --      | --      | --   | --
2 Mbps  | B: TCP | 166.32 | 1369.15 | 1480.64 | 3492.08 | 8192 | 2048
3 Mbps  | A: --  | --     | --      | --      | --      | --   | --
3 Mbps  | B: TCP | 166.20 | 1368.15 | 1479.56 | 3489.54 | 8192 | 2048
10 Mbps | A: --  | --     | --      | --      | --      | --   | --
10 Mbps | B: TCP | 178.88 | 1472.54 | 1592.45 | 3755.78 | 8192 | 2048

Name: Udp
2 Mbps  | A: UDP | 47.83  | 393.16  | 425.80  | 1004.25 | 8192 | 2048
2 Mbps  | B: --  | --     | --      | --      | --      | --   | --
3 Mbps  | A: UDP | 47.74  | 392.43  | 425.00  | 1002.37 | 8192 | 2048
3 Mbps  | B: --  | --     | --      | --      | --      | --   | --
10 Mbps | A: UDP | 47.76  | 392.59  | 425.18  | 1002.78 | 8192 | 2048
10 Mbps | B: --  | --     | --      | --      | --      | --   | --

Name: Tcp-udp
2 Mbps  | A: UDP | 47.66  | 391.76  | 424.28  | 1000.67 | 8192 | 2048
2 Mbps  | B: TCP | 153.15 | 1260.74 | 1363.41 | 3215.58 | 8192 | 8192
3 Mbps  | A: UDP | 47.41  | 389.71  | 422.06  | 995.43  | 8192 | 2048
3 Mbps  | B: TCP | 166.92 | 1374.08 | 1485.97 | 3504.66 | 8192 | 8192
10 Mbps | A: UDP | 47.57  | 391.02  | 423.48  | 998.78  | 8192 | 2048
10 Mbps | B: TCP | 170.21 | 1401.16 | 1515.26 | 3573.73 | 8192 | 8192

 

 

Bandwidth share test: configuration 2 (dport)

Basically this is the same test as before, but with the destination port field as the classifier filter.

 

Test characteristics
  Classifier filter: destination port
  Class definitions:
    0. Root class (100%, 2 Mbps)
    A. Slow class: d_port 2000 (20%)
    B. Fast class: d_port 3000 (80%)

PVC     | Class: traffic | Throughput (KB/s) | IP (Kbps) | ATM (Kbps) | ATM (cell/s) | Payload (bytes) | N° buf sent

Name: Udp-udp
2 Mbps  | A: UDP | 47.39  | 389.55  | 432.00  | 1018.88 | 8192 | 2048
2 Mbps  | B: UDP | 161.46 | 1327.20 | 1471.39 | 3471.39 | 8192 | 8192
3 Mbps  | A: UDP | 47.39  | 389.55  | 432.00  | 1018.88 | 8192 | 2048
3 Mbps  | B: UDP | 157.36 | 1293.50 | 1434.49 | 3383.24 | 8192 | 8192
10 Mbps | A: UDP | 47.55  | 390.86  | 433.47  | 1022.33 | 8192 | 2048
10 Mbps | B: UDP | 161.62 | 1328.52 | 1473.33 | 3474.83 | 8192 | 8192

Name: Udp
2 Mbps  | A: --  | --     | --      | --      | --      | 8192 | 2048
2 Mbps  | B: UDP | 167.18 | 1374.22 | 1524.01 | 3594.37 | 8192 | 8192
10 Mbps | A: --  | --     | --      | --      | --      | 8192 | 2048
10 Mbps | B: UDP | 167.32 | 1326.05 | 1470.59 | 3468.38 | 8192 | 8192

Name: Tcp-tcp
2 Mbps  | A: TCP | 46.79  | 385.18  | 426.54  | 1005.99 | 8192 | 2048
2 Mbps  | B: TCP | 154.51 | 1271.93 | 1408.51 | 3321.97 | 8192 | 8192
3 Mbps  | A: TCP | 46.72  | 384.60  | 425.90  | 1004.48 | 8192 | 2048
3 Mbps  | B: TCP | 166.38 | 1369.64 | 1516.72 | 3577.17 | 8192 | 8192
10 Mbps | A: TCP | 46.70  | 384.43  | 425.72  | 1004.05 | 8192 | 2048
10 Mbps | B: TCP | 172.69 | 1421.58 | 1574.24 | 3712.84 | 8192 | 8192

Name: Udp-tcp
2 Mbps  | A: TCP | 46.78  | 385.09  | 426.45  | 1005.77 | 8192 | 2048
2 Mbps  | B: UDP | 167.84 | 1381.66 | 1530.03 | 3608.56 | 8192 | 8192

 

 

Bandwidth share test: classifier behaviour (classifier)

Some tests have been repeated with different classifier settings to test if the classifier puts traffic in the right class.

 

Test characteristics: see the per-test classifier filters and class definitions below.

PVC     | Class: traffic | Throughput (KB/s) | IP (Kbps) | ATM (Kbps) | ATM (cell/s) | Payload (bytes) | N° buf sent

Name: tcp-tcp_saddr
  Classifier filter: source address
  Class definitions:
    0. Root class (100%, 2 Mbps)
    A. Slow class: s_addr Truciolo (20%)
    B. Fast class: s_addr Kirki (80%)

2 Mbps  | A: TCP | 39.77  | 327.39  | 354.05  | 835.01  | 8192 | 2048
2 Mbps  | B: TCP | 163.62 | 1346.92 | 1456.60 | 3435.38 | 8192 | 8192

Name: tcp-udp_daddr
  Classifier filter: destination address
  Class definitions:
    0. Root class (100%, 2 Mbps)
    A. Slow class: d_addr Ammon_ATM (20%)
    B. Fast class: d_addr Ammon_ETH (80%)

2 Mbps  | A: UDP | 47.66  | 391.77  | 424.29  | 1000.67 | 8192 | 2048
2 Mbps  | B: TCP | 155.57 | 1280.65 | 1384.94 | 3266.36 | 8192 | 8192

Name: tcp-udp_saddr_dport_proto
  Classifier filter: protocol type, source address and destination port
  Class definitions:
    0. Root class (100%, 2 Mbps)
    A. Slow class: TCP, s_addr Truciolo, d_port 2000 (20%)
    B. Fast class: UDP, s_addr Kirki, d_port 3000 (80%)

2 Mbps  | A: TCP | 39.16  | 322.37  | 348.62  | 822.21  | 8192 | 2048
2 Mbps  | B: UDP | 169.46 | 1394.99 | 1508.59 | 3558.00 | 8192 | 8192

 

Results

With these tests we have the following results:

      1. Class allocation with different packet size (p_size)

The CBQ mechanism uses the "mean packet length" concept to determine how many packets must be forwarded in each class. The goal of the following tests is to show the CBQ class allocation when flows with different payload sizes (sometimes much smaller than the MTU) are sent to the router.

 

Test characteristics
  Classifier filter: destination port
  Class definitions:
    0. Root class (100%, 2 Mbps)
    A. Slow class: d_port 2000 (20%)
    B. Fast class: d_port 3000 (80%)

PVC     | Class: traffic | Throughput (KB/s) | IP (Kbps) | ATM (Kbps) | ATM (cell/s) | Payload (bytes) | N° buf sent

Name: Udp-udp_l1024
2 Mbps  | A: UDP | 36.08  | 303.65  | 351.85  | 829.84  | 1024 | 2048
2 Mbps  | B: UDP | 161.77 | 1361.46 | 1577.58 | 3270.71 | 1024 | 8192

Name: Udp-udp_l512
2 Mbps  | A: UDP | 30.05  | 259.63  | 305.79  | 721.20  | 512  | 2048
2 Mbps  | B: UDP | 136.42 | 1178.67 | 1388.21 | 3274.08 | 512  | 8192

Name: Udp-udp_l256
2 Mbps  | A: UDP | 22.28  | 202.48  | 264.50  | 623.84  | 256  | 2048
2 Mbps  | B: UDP | 117.87 | 1071.20 | 1399.35 | 3300.36 | 256  | 8192

Name: Udp-udp_l128
2 Mbps  | A: UDP | 15.21  | 151.86  | 206.37  | 486.72  | 128  | 2048
2 Mbps  | B: UDP | 92.00  | 918.53  | 1248.26 | 2944.00 | 128  | 8192

Name: Udp-udp_l64
2 Mbps  | A: UDP | 9.27   | 109.16  | 188.66  | 444.96  | 64   | 4196
2 Mbps  | B: UDP | 62.28  | 733.41  | 1267.52 | 2989.44 | 64   | 16384

Name: Udp-udp_l32
2 Mbps  | A: UDP | 5.45   | 83.71   | 147.89  | 348.80  | 32   | 8192
2 Mbps  | B: UDP | 37.66  | 578.46  | 1021.94 | 2410.24 | 32   | 32768

Name: Udp-Udp_l32-512
2 Mbps  | A: UDP | 30.18  | 260.76  | 307.11  | 724.32  | 512  | 2048
2 Mbps  | B: UDP | 36.91  | 566.94  | 1001.69 | 2362.24 | 32   | 32768

 

Results

The results confirm that CBQ can be sensitive to the packet size, because it uses a "mean packet size" to compute the Idle parameter. Since a class is considered overlimit when AvgIdle < 0, if Idle is computed with the wrong packet size AvgIdle becomes negative even if the class has transmitted less than its allocated bandwidth.

Unless specified otherwise, CBQ sets its packetsize parameter to the link-layer MTU, assuming that applications tend to generate MTU-sized packets. However this is not true for all applications: it fails especially for multimedia audio packets, but also for some TCP flows (depending on the MSS).

This effect can be limited by setting an appropriate packetsize parameter for a class in the CBQ configuration file; however, this means that the mean packet size of the data carried in that class must be known in advance.
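
To illustrate why a wrong packetsize drives a class overlimit, the estimator logic can be sketched as follows (a simplified Python sketch that follows the description above, not the actual ALTQ code; the EWMA weight is an illustrative value):

    class CbqEstimatorSketch:
        """Simplified CBQ estimator (after the Floyd/Jacobson link-sharing scheme).

        idle    = measured gap between two departures minus the gap the class would
                  need, at its allocated rate, for a packet of 'packetsize' bytes.
        avgidle = EWMA of idle; the class is treated as overlimit when avgidle < 0.
        """
        def __init__(self, rate_bps, packetsize, weight=1.0 / 16):
            self.rate = rate_bps          # bandwidth allocated to this class
            self.packetsize = packetsize  # assumed packet size (defaults to the MTU)
            self.weight = weight          # EWMA weight, illustrative only
            self.avgidle = 0.0
            self.last = None

        def on_departure(self, now):
            if self.last is not None:
                gap = now - self.last
                allowed_gap = self.packetsize * 8.0 / self.rate
                idle = gap - allowed_gap
                self.avgidle = (1 - self.weight) * self.avgidle + self.weight * idle
            self.last = now
            return self.avgidle >= 0      # False means the class looks overlimit

    # With packetsize left at, say, 1500 bytes, a class sending 128-byte packets
    # sees a much smaller real inter-packet gap than 'allowed_gap', so avgidle
    # goes negative even though the class uses far less than its allocated rate.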

 

 

This graph reports the results of several different UDP flows, clearly showing the degradation. In this test UDP is more reliable than TCP because it is simpler to generate fixed-size packets. With TCP flows there can be window-management problems: in some cases the TCP throughput is considerably lower than the UDP one (if the socket buffer size is set to N bytes, TTCP -b option), or much higher (if the TCP buffer sent is set to N bytes, TTCP -l option, even with the Nagle algorithm disabled, TTCP -D option). Results equivalent to the UDP performance can be obtained with a TCP flow if, after starting the CBQ daemon, the MTU size is reset to N bytes (where N is chosen carefully).

Other tests (not reported here) show that this bad behaviour can be avoided by setting an appropriate packetsize parameter. Conversely, increasing the maxburst parameter has no effect, because that parameter acts on the weighted round-robin mechanism. In other words, with a larger maxburst the scheduler is able to send more packets only if the estimator permits it, i.e. only if the class is not overlimit. But since, with the wrong packetsize parameter, the class becomes overlimit even after transmitting only a few bytes, increasing maxburst clearly cannot be effective.

    1. Dynamic tests
    2. Dynamic tests help to discover the instantaneous throughput of each connection in the CBQ router. Basically, the main goal is to test the borrow mechanism, which is very important in CBQ.

      Since CBQ is a non-work-conserving discipline, it can happen that some connections have a backlog while the output link is idle. This can be avoided by configuring CBQ to use the excess bandwidth, activating the "borrow" flag on some classes. In this way a class is allowed to borrow from its parent if the parent has unused bandwidth.

      Clearly the behaviour of this mechanism must respect the bandwidth share imposed in the configuration files.

      1. Test configuration
      2. We tested four configurations, all with the ATM PVC and the root class bandwidth set to 4 Mbps, except for the third test, where both the PVC and the root class are set to 2 Mbps.

      3. Configuration 1 (borrow1)
      4. In the first configuration a very simple test was used: two flows, one in the fast class and one in the slow class; the second flow starts a few seconds after the first. The class configuration is very simple: only the leaf classes are allowed to borrow from their parent class.

        The goal is to verify that when one class is idle, the other is able to use all of the parent's bandwidth.

         

         

        Results

        The traces show that, when both flows are active, the bandwidth share is correctly respected. However, the slow flow is not able to use all the parent bandwidth when the fast flow is off. And even though the fast class (when alone) is able to consume much more bandwidth than the slow class, it does not use the whole parent bandwidth either: when both classes transmit concurrently, the total traffic is only a little higher than the fast-class traffic alone.

        These traces are confirmed with different traffic patterns: TCP-UDP, UDP-TCP and TCP-TCP give the same result. If only one flow is present it is unable to use all the parent's bandwidth, and this is more problematic for the slow_class flow, whose throughput is considerably less than the root bandwidth.

         

         

      5. Configuration 2 (borrow2)
      6. This test involves three agencies, but only two flows are present. In this way the free bandwidth must be shared proportionally among the two active agencies.

         

        Results

        The results show a different behaviour for TCP and UDP. TCP flows (first three graphs) are not able to exploit the borrow flag well. Because of TCP's "fair" behaviour and of the CBQ characteristic of sending packets in bursts, when more than one TCP flow shares a link with the borrow flag the flows adapt to each other. The result is that if three TCP flows are present, they share the bandwidth equally. The first graph reports the behaviour with two flows (agency1 and agency2) that share the link equally (50% each); the second graph reports the behaviour with all three flows starting at different times: the first flow gets 100% of the bandwidth when it is alone, but drops to 50% when the second flow starts, and to 33% when the third one starts as well. The third graph shows, however, that if all three flows start at the same instant, they initially have different throughputs but quickly adapt to share the bandwidth equally.

        Conversely, the UDP flows are able to share the bandwidth (fourth graph): the share is not exactly the one imposed (10-40-50%), but the three flows are at least able to transmit at different rates.

         

      7. Configuration 3 (rsvp)
      8. This configuration uses the default configuration file for RSVP. RSVP only creates the leaf classes (Session1 and Session2) and does not affect the behaviour of the CBQ mechanism, so these graphs can be seen as an example of the borrow flag regardless of their RSVP origin. This test is very similar to the previous one; the difference is that one flow has bigger packets than the other: the two flows have the same bandwidth allocation, but the UDP flow has an 8192-byte payload while the TCP one has a 1460-byte payload.

        Results

        These graphs show that the TCP session always receives better service than the UDP one; the difference is more evident in the first two graphs. However, the configuration is identical for all four plots, so the reason for the different behaviour among them is not clear (NOTE: to me).

        The better performance of the TCP flows is due to the fact that the borrow mechanism does not work well when classes have different packet sizes. (NOTE: I'm not sure why.)

        Moreover, the two leaf classes do not use the whole cntlload bandwidth. The borrow mechanism does not seem to work so well with small classes.

         

         

         

      9. Configuration 4 (borrow3)

In this more complex configuration, each class is able to borrow from its parent. So leaf classes are able to borrow from the "agency" classes, and agency classes are able to borrow from the root class.

This can be a very common configuration in the real world, because wasting bandwidth is not desirable: whenever there is unused bandwidth, it is good to avoid leaving classes backlogged. The only way to achieve this is to allow all classes to borrow when other classes are idle, but then we must be sure that the bandwidth share imposed for each agency is still respected. A sketch of such a hierarchy is given below.
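
In the same configuration-file syntax used elsewhere in this report, such a hierarchy looks roughly like this (an illustrative sketch assuming a 4 Mbps root and two agencies; class names, percentages and priorities are mine, not the exact borrow3 file):

    # Illustrative sketch of a borrow3-style hierarchy (not the exact file used)
    interface pvc0 bandwidth 4M cbq
    class cbq pvc0 root_class NULL priority 0 pbandwidth 100
    # agency classes may borrow unused bandwidth from the root class
    class cbq pvc0 agency1 root_class borrow priority 1 pbandwidth 70
    class cbq pvc0 agency2 root_class borrow priority 1 pbandwidth 30
    # leaf classes may borrow unused bandwidth from their agency class
    class cbq pvc0 agency1_fast agency1 borrow priority 3 pbandwidth 50
    class cbq pvc0 agency1_slow agency1 borrow priority 3 pbandwidth 20 default
    class cbq pvc0 agency2_fast agency2 borrow priority 3 pbandwidth 20
    class cbq pvc0 agency2_slow agency2 borrow priority 3 pbandwidth 10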

 

Results

This test was performed with only two TCP flows, one in the agency1 slow class, and the other in the agency2 one.

The results show that the two TCP flows share the total bandwidth (4 Mbps) equally, despite the different shares of their agencies.

 

 

A similar result was obtained by replacing the two TCP flows with one TCP flow and one UDP flow. Here the results are even stranger, because:

 

In any case, it seems that borrowing, which is essential in the real world, does not allow the bandwidth share between different agencies to be respected.

 

 

More analysis on UDP flow (borrow3b)

This UDP behaviour is very strange: even though the UDP flow has the same parameters as the TCP flow (same class, same packet size, ...), with this particular class configuration it seems to perform very badly compared to TCP.

To confirm this strange result, a new test involving three machines was done:

The results show that in this test the UDP performance is even worse than in the previous configuration: while Juliet sends the UDP flow at approximately 6 Mbps, Kirki receives the same flow at only 1.5 Mbps, whereas the TCP flow reaches 3.5 Mbps. All tests are performed with only one flow in the system and with all parameters equal; TCPDUMP traces show that UDP and TCP use the same packet size (so there are no smaller TCP packets due to window management, a problem that appeared in previous tests).

The CBQSTAT report is also the same for both: both flows reach the minidle parameter, and the only difference between them is that the UDP flow always has a full queue (30 packets), while the TCP flow usually has no packets in the queue.

Post-processing the TCPDUMP traces shows that the UDP flow at the sink router has a higher inter-packet variance than the TCP flow; this is not due to the source pattern, because a capture at the source router shows that the variance there is very small.

 

 

The following table reports the mean value, the standard deviation and the variance of the inter-packet time.

 

 

                   | Kirki (TCP) - sink | Kirki (UDP) - sink | Juliet (UDP) - source
Mean value (s)     | 0.016878934        | 0.042618816 (1)    | 0.008806352
Std. deviation (s) | 0.002360582        | 0.058480941        | 2.51813E-05
Variance (s^2)     | 5.56956E-06        | 0.00341658         | 6.33939E-10

(1) In this test the mean value is not very significant: the graph shows that the inter-packet time is mainly concentrated around two values, 16.7 ms and 176 ms.

 

    1. CBQ with RSVP
    2. CBQ can coexist with RSVP: when the RSVP daemon accepts a new reservation, CBQ dynamically creates a new class in its queueing hierarchy. The CBQ daemon starts automatically when RSVPD is activated, loading the standard configuration file /etc/cbq.conf. The ALTQ package suggests the following default configuration file:

       

      # CBQ configuration for RSVP

      interface pvc0 bandwidth 2M cbq

      class cbq pvc0 root_class NULL priority 0 admission none pbandwidth 100

      class cbq pvc0 unres_class root_class borrow priority 3 pbandwidth 60 default

      class cbq pvc0 res_class root_class priority 6 pbandwidth 40 admission cntlload

       

      In other words, only two classes are defined: one for best-effort and one for reserved traffic. The latter is characterised by the keyword "cntlload" and is the parent class of all the reserved sessions. The suggested configuration allows only the best-effort traffic to borrow from the root class (the same feature is disabled for the controlled-load class).

      Reserved session classes can belong only to the Controlled Load service: Guaranteed Service is not supported, and all sessions requiring Guaranteed Service are refused (Error 21, code 2: Service unsupported).

      When the RSVP daemon accepts a new reservation, the CBQ mechanism creates a new leaf class, reserving for it the bandwidth indicated in the reservation message by the token rate (r) parameter. For Controlled Load the peak rate is not specified by the receiver. These leaf classes are allowed to borrow from their parent class (the cntlload class).

       

       

      1. Test configuration
      2. For several practical reasons, the reserved flow goes from Ammon to Truciolo via Kirki. However, the only reserved link is the ATM link from Ammon to Kirki (the second hop, from Kirki to Truciolo, is not reserved). The first series of tests involves only two reserved flows, TCP and UDP, while the second series adds a third UDP flow (belonging to the Best Effort class) from Ammon to Kirki.

        The graphs are produced by capturing the data traffic on the Kirki ATM interface (with the tttprobe program) and displaying the graphical result (with tttview) on Thud.

         

         

         

        Reserved traffic only: two TCP unicast sessions

         

        This plot confirms the expected results: if only one TCP flow is present it takes approximately 0.5 Mbps, and when another TCP session starts the throughput of each one drops to approximately 0.25 Mbps. The total TCP bandwidth is the same with one and with two sessions.

        However, these two classes do not use the whole cntlload bandwidth, as already noted in the borrow tests.

         

        Best effort and Reserved Traffic

         

        These graphs mainly show that the reserved traffic is not affected by the best-effort traffic. When no reserved sessions are present the best-effort traffic uses the whole bandwidth, but its bandwidth is reduced when a reserved session starts.

        These graphs also confirm that the UDP session receives less service than the TCP one: the first two graphs (with one UDP and one TCP session) clearly show that the UDP session uses less bandwidth than TCP. The third graph, with two TCP sessions, shows that the bandwidth is shared equally between the two sessions.

      3. Conclusions

      The integration of CBQ and RSVP is very good. Unfortunately some problems arise, but these are not due to RSVP but to the CBQ mechanism, especially to the borrow implementation. A provider wanting to adopt RSVP with CBQ must take care that the current limitations of CBQ do not affect its performance.

    3. CBQ Performance

The CBQ mechanism consists of a classifier, an estimator and a packet scheduler. Regarding performance, we can say:

 

Measuring the CBQ latency in isolation makes little sense, because the time a packet spends waiting in the output buffer is usually much larger than the time spent in the CBQ mechanism. Even if the CBQ latency increases with the number of classes, the largest part of the time a packet spends in a router is due to its stay in the output buffer. So, if the goal is to limit the end-to-end delay, limiting the buffer length is more effective than decreasing the scheduling overhead.

However, the maximum performance in terms of packets per second is an interesting test, and from the packets-per-second measurement it is easy to derive the per-packet latency inside the router machine.
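
For example, at the roughly 15,000 packets per second measured below, the per-packet cost in the router is approximately:

    1 / 15,000 pkt/s  =  about 67 microseconds of processing time per packet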

 

      1. CBQ Throughput (perf)

It is not easy to determine the CBQ throughput, for several different reasons.

First of all, sending TCP traffic would be better because it does not waste CPU cycles. However, it is not trivial to impose a TCP packet size: the only way to do so is to set the socket buffer (TTCP -b option) to a certain value. This puts a heavy load on the CPU, because with small packet sizes the amount of CPU power needed is very high (each packet sent requires an acknowledgement), and moreover the path latency can affect the overall throughput.

For these reasons UDP flows were chosen, with some caveats. A UDP flow does not adapt itself to the network load, so to avoid overloading the router CPU the link between the sender and the router must be set to an appropriate speed. Generally two kinds of test are done: one where this link is set slightly above the number of packets the router can manage (so the router has to discard some UDP packets), and one where it is set slightly below the router capacity, so that the number of packets entering the router equals the number of packets leaving it. Using a UDP flow also means that only the flow from source to destination is present in the system, with no traffic from destination to source, unlike TCP flows.

System performance was measured with XPERFMON++ (with the options +systemcpu +inputpkts +outputpkts), from the BSD ports collection. However, the results are not very accurate, because:

 

Therefore these performance measurements do not have a high level of precision and must be seen only as an indication of the CBQ performance. The most reliable result is in the last column: the last two columns should in principle report approximately the same value (one is the number of packets leaving Kirki, the other the number of packets arriving at Ammon), but this does not hold because of the measurement problems reported above. The more accurate value is the one in the last column, because the CPU load on Ammon is a bit lower than on Kirki, so that figure is a bit more reliable.

The CBQ configuration for this test is very simple: one class (the default class) with 100% of the bandwidth. The root class bandwidth is the same as the Kirki-Ammon PVC. No packet size is set, because other tests showed that the packetsize parameter has no effect here.

 

Results

Clearly the results depend on the packet size. However, it is possible to say that the throughput with small packet sizes is approximately 15 Kpps on an AMD K6-200 machine.

When the packet size increases, the overall packet rate decreases; this is easy to understand, given the higher cost of transferring the packets between the network interface and memory.

This result is consistent with those of the ALTQ author: from his paper (table 1) we can deduce approximately 15 Kpps with a 1500-byte MTU, obtained with a more powerful machine than ours.

It is important to remember that when the router machine is running at full speed no other process receives service: the machine appears to be frozen.

 

UDP payload (bytes) | PVC Juliet-Kirki (Mbps) | PVC Kirki-Ammon (Mbps) | Packets in, Kirki (pkt/s) | Packets out, Kirki (pkt/s) | Packets in, Ammon (pkt/s)
4   | 15   | 30 | 14900 | 14900 | 14900 (1)
4   | 15.2 | 30 | 14750 | 14700 | 15000
4   | 16   | 30 | 15800 | 14400 | 14700
32  | 15   | 30 | 14950 | 14950 | 14950 (1)
32  | 16   | 30 | 15900 | 14200 | 14700
32  | 20   | 30 | 19500 | 12200 | 12600
32  | 25   | 30 | 24100 | 10300 | 10500
64  | 22   | 40 | 14300 | 13500 | 13800
64  | 21.4 | 40 | 14200 | 14200 | 14200 (1)
128 | 40   | 40 | 19900 | 10600 | 10600
128 | 32   | 40 | 16000 | 13300 | 13400
128 | 29   | 40 | 14500 | 14300 | 14400

(1) Test where the CPU load is not 100%.

      1. CBQ classes overload
      2. This test uses the same configuration as the test above. The goal is to understand the degradation of the CBQ mechanism when many classes are present in the system. The test uses a 60 Mbps PVC, with configurations of 10, 20, 50 and 100 classes. Each class uses 0% of the bandwidth, except the default class, which uses 100%.

        The classes differ from each other only in the destination port: this is a very simple filter, but it is quite heavy to compute, because every packet must be analysed fairly deeply.

         

        Results

        The results clearly show a degradation as the number of classes increases. This is not a problem in the normal stand-alone use of CBQ (where the number of classes is limited, since it is not possible to allocate a fractional percentage of the root bandwidth), but it can be a problem in an RSVP environment, which can dynamically create a large number of classes.

         

        UDP payload (bytes) | Classes | PVC Juliet-Kirki (Mbps) | PVC Kirki-Ammon (Mbps) | Packets in, Kirki (pkt/s) | Packets out, Kirki (pkt/s) | Packets in, Ammon (pkt/s)
        32 | 10  | 15   | 60  | 14950 | 14950 | 14950 (1)
        32 | 20  | 15   | 60  | 14950 | 14950 | 14950 (1)
        32 | 50  | 15   | 60  | 14400 | 13400 | 13950
        32 | 50  | 14.2 | 60  | 13600 | 13500 | 14100
        32 | 50  | 14   | 60  | 13900 | 13800 | 14100
        32 | 50  | 13.7 | 60  | 14100 | 14100 | 14100 (1)
        32 | 100 | 13.5 | 100 | 13200 | 11500 | 11900
        32 | 100 | 12.8 | 100 | 12700 | 11900 | 12200
        32 | 100 | 12.2 | 100 | 11900 | 11900 | 12200 (1)

        (1) Test where the CPU load is not 100%.

         

      3. CBQ Memory load

It is important to understand the memory requirements of CBQ as well. Unfortunately it is not clear (to me) how to measure the memory occupancy. (NOTE: I'm not sure how to measure the memory load.)

The configuration files and the machine configuration are the same as in the tests above.

 

Results

The results, derived from the TOP and PS programs (which are consistent with each other), show that the memory occupancy is very low. This is not clear to me (NOTE: question mark about the memory result), because it seems that each CBQ class allocates a 30-element queue (as reported by the CBQSTAT program). This does not seem to match the results in the following table.

 

The PS UNIX command shows:

Number of classes | VSZ, virtual size (KB) | RSS, real memory (KB)
10  | 264 | 600
20  | 268 | 600
50  | 280 | 608
100 | 300 | 608

 

Equivalent results can be obtained from the TOP program by monitoring the Active Memory. This parameter increases by approximately the sum of the VSZ and RSS values reported by PS. This amount of memory is fixed: in other words, the presence of flows to manage does not affect the memory allocation.

    1. Conclusions

The main problems of the CBQ package seem to be:

 

Note: Test with

NOTE

When the word NOTE appears somewhere in this report, it means that something may be wrong, that something remains to be done, or that I am not sure of what I have written ...