T E C H N I Q U E S

Carving up bugs

Slicing is an automated technique for program simplification, which can aid debugging and maintenance. Mark Harman explains how execution information can focus the spotlight even closer…

Slicing algorithms enable programmers to extract the parts of a program that affect a chosen set of variables at some chosen point, forming a slice. Slices can be very useful in understanding and debugging code, since they comprise only those parts of a program related to a specific sub-computation (defined by the slicing criterion), while retaining the effect of the original code. In general, slices are constructed for a set of variables, but in this article we shall simply consider slices constructed for a single variable. I’ll describe several ways in which dynamic information about the execution pattern of a program can be used to produce simpler slices than are possible using the static approach (described in detail in 'A piece of cake').

Static slicing

In what has come to be termed ‘static’ slicing, the slicing criterion contains no information about the execution of the program, and so the slice preserves the effect of the code for every possible execution pattern.

Consider the example in Listing 1. The program is supposed to calculate the sum and product of the sequence of numbers from 1 to the input number n, but the value of the product p is always found to be zero. In order to locate the cause of this errant behaviour, we can construct a static slice for the variable p at the end of the program, as shown in Listing 2. Since the slice is simpler than the original program, it is easier to locate the bug (in this case, p should be initialised to 1, instead of 0).

Although static slicing can undoubtedly assist in the debugging effort, it does tend to produce rather large slices. This is particularly true for well-constructed programs, which are typically highly cohesive, resulting in code where the computation of each variable is highly dependent upon the values of many other variables. Indeed, this observation about the relationship between the size of a typical slice and the cohesiveness of a program has led to work on cohesion metrics based upon slicing, which I will cover in a forthcoming article.

Listing 1: A program fragment to be statically sliced
scanf("%d",&n);
s=0;
p=0;
while (n>1)
{
s=s+n;
p=p*n;
n=n-1;
}
printf("%d",p); }
/* the slice point is the end of the program */

Dynamic slicing

When it comes to debugging a program, we will usually have executed it at least once, and found it to produce unexpected results. For example, the program in Listing 1 might have been passed an input value 0 for the variable n – this is in fact a highly likely choice, as one tends to test loops with ‘special case’ values. Now, when we execute the program, we will find that the variable p contains the wrong value.

Instead of consulting a static slice to locate the bug, it would make more sense to construct a slice which exploits the specific information available about the input which caused the code to break. Such a slice is called a dynamic slice, since it is constructed with respect to the traditional slicing criterion together with information on a particular execution pattern, to wit: the sequence of input values passed to the program. Collectively, this information is referred to as the ‘dynamic slicing criterion’. We shall say that we construct a dynamic slice for a variable v, at a point p, on an input i. To describe the input sequence i, we shall enclose the sequence of values in angled brackets.

Listing 2: A static slice of Listing 1
scanf("%d",&n);
p=0;
while (n>1)
{
p=p*n;
n=n-1;
}

A dynamic slice of the code in Listing 1 constructed for p at the end of the program on the input sequence <0> is shown in Listing 3. It is far simpler than the corresponding static slice (Listing 2), and clearly highlights the bug in the original program!

Obviously, we were rather lucky in this case, achieving the maximum level of simplification possible. In practice the slices constructed will be more complex. However, in cases where a variety of test data causes some error, there is nothing to stop us having the slicing tool construct a dynamic slice for each set of input, and present us with the smallest.

Listing 3: A dynamic slice of Listing 1
p=0;

Slicing precision

There are two main properties desirable in slicing algorithms, which we call soundness and completeness. In order to be sound an algorithm must never delete a statement from the original program which could have an effect upon the slicing criterion. It is this property that allows us to analyse a slice in total confidence that it contains all the statements relevant to the criterion. Soundness on its own, however, is not enough: a trivial slicing algorithm could refuse to delete any statements from a program, thereby achieving soundness by doing nothing.

In order to be complete a slicing algorithm must remove all statements which cannot effect the slicing criterion. In general, completeness is unachievable (for theoretical reasons to do with undecidability). Therefore, the goal of slicing algorithms is to delete as many statements as possible, without giving up soundness. The closer an algorithm approximates completeness, the more precise the slices it constructs will be. It has been proven that a dynamic slice constructed for some variables at a certain point will always be at least as thin as the corresponding static slice, and in practice we find that dynamic slices are often a great deal simpler. This makes the technique ideally suited to debugging, when we want to focus on the statements that could have caused errors in a particular execution.

Constructing dynamic slices

One way to construct a sound dynamic slice is simply to trace the execution of the program for the input we are interested in and then to remove from the corresponding static slice those statements which were not actually executed. This is a nice, simple algorithm and it is obviously sound, because an unexecuted statement cannot affect the slicing criterion. However, this simplistic approach does not construct the most precise dynamic slices. To see why, consider the example program in Listing 4. This is not meant to be a realistic program: rather it serves to illustrate the subtlety of dynamic program dependencies.

Listing 4: The subtlety of dynamic information
for(i=1;i<=5;i++)
{
scanf("%d",&x) ;
if (x>0)
y = x+1;
else
y = x+2;
}

Suppose the input supplied to the program is <2,-2,4,-4,6> and that we are interested in the effect the program has upon the variable y. Notice that the static slice for this program, constructed for the variable y at the last line, is the entire program – no lines may be deleted. Also notice that for this input sequence, all the lines of the original program are executed, so our simplistic algorithm would generate a dynamic slice consisting of the whole program.

However, a more precise slice that can be constructed from this criterion, shown in Listing 5. We do not need to include the line y=x+2; because this line does not contribute to the final value of y when the last input entered is greater than zero. In order to construct slices like this, we need to perform more sophisticated analysis of the way in which dependencies between parts of a program change as the execution progresses.

Listing 5: A more precise dynamic slice of Listing 4
for(i=1;i<=5;i++)
{
scanf("%d",&x) ;
if (x>0)
y = x+1;
}

Is dynamic slicing always better?

We have seen that dynamic slicing is very attractive as an aid to debugging, and indeed superior to static slicing for that purpose, and the reader would be forgiven for concluding that static slicing is thus obsolete. However, we do still require static slicing for situations where slices have to be sound for every possible execution pattern.

For example, suppose we are interested in reusing the part of a program which implements a particularly efficient and well-tested approach to some problem. Often, in such situations (particularly with legacy code), the code we want to reuse will be intermingled with all sorts of other unrelated code. In this situation static slicing is ideal as a technique for extracting the part of the program we require, while leaving behind the code we are not interested in.

The static and dynamic paradigms represent two extremes — either we say nothing about the execution characteristics of the program (and cater for every possible execution pattern) or we say everything (and cater only for a single input). Recent developments have produced a variety of compromise approaches to slicing, which occupy the space in between.

These approaches are called ‘quasi-static’ – although they might equally be called ‘quasi-dynamic’, as they lie somewhere between the two poles. The aim of these paradigms is to enable programmers to specify some information about a program’s execution, without having to provide all of it. They are most applicable to the area of program comprehension.

Partial input information

When trying to understand the behaviour of a program, we may sometimes not want to specify the entire input sequence. Often, we can significantly improve the simplification power of the slicing tool by providing just the first part of an input sequence. For example, consider the program fragment in Listing 6. If we know that the first element of the input sequence is 4 then, without any further information, we know that the assignment p = y*y; will not be executed, and can thus be excluded from any slice.

By specifying only a prefix of the input, we move away from dynamic slicing towards static slicing: a whole set of execution patterns is captured by the slice, rather than just a single execution. Knowing for instance, that the statement p = y*y; is not executed if the input starts with 4 captures the information contained in a set of dynamic slices, for which the input sequence takes the form <4,x> for some x.

Listing 6: Input prefix example
scanf("%d",&x);
scanf("%d",&y);
if (x>=0)
p = x*y;
else
p = y*y;
printf("%d",p);

Conditioned slicing

There are often situations where we shall want still more flexibility in specifying the conditions in which the program is executed. For example, consider the program fragment in Listing 7. In order to achieve any extra simplification from this program, we shall have to provide information about both inputs to the program, so specifying only part of the input will not help.

Listing 7: Conditioned slice example
scanf("%d",&x);
scanf("%d",&y);
if(x>y)
z = 1;
else
z = 2;
printf("%d",z);

Fortunately, we can provide information to the slicing tool about the input without being so specific as to give the precise values. We can use a boolean expression, for example x==y+4, to relate the possible values of the two inputs x and y. When the program is executed in a state which satisfies this condition, we know that the assignment z = 2; will not be executed. Any slice constructed with respect to this condition may therefore omit that statement. This approach to slicing is called the conditioned approach, because the slice is conditioned by knowledge about the situation in which the program is to be executed. Furthermore, we could use a similar approach for code like that in Listing 6: instead of providing an initial prefix of 4, we could specify x==4. It is in fact possible to replicate anything that could be achieved with the partial input approach using conditionals.

Conditioned slicing addresses just the kind of problems maintainers face when presented with the task of understanding large legacy systems. Often, in this situation, we find ourselves asking questions like ‘suppose we know that x is greater than y and that z is equal to 4, then which statements would affect v at line 38?’ Conditioned slicing can provide an answer automatically: you simply construct a slice for v, at 38, on x>y && z==4. By building up a collage of conditioned slices which isolate different aspects of the program's behaviour, you can quickly obtain a picture of how the program behaves under various conditions. The approach is really nothing more than a tool-assisted form of divide and conquer.

Program slices can be constructed in numerous ways, with entirely static information or either partial or full information about program execution. There are trade-offs between the precision of a slice and the time needed to compute it, and between how specific we are about possible execution patterns and how simple our slices will be. Fortunately, the recent developments in slicing will allow us greater freedom to choose how we want to balance the various considerations involved.

A free dynamic slicer
Sophisticated dynamic slicing of the type described in the article is provided by the freely available slicing tool Spyder, created by Hiralal Agrawal, Richard DeMillo, Hsin Pan, Eugene Spafford and Chonchanok Viravan at Purdue University. The team have (very kindly) made the tool available for downloading at http://www.cs.purdue.edu/homes/spaf/spyder.html.

Mark Harman is director of research at the School of Informatics University of North London, where he teaches programming in C++ and formal methods in Z. Dr. Harman, his colleagues and students have been working on slicing for the past three years, concentrating on new algorithms and approaches to slicing. He can be contacted via email at m.harman@unl.ac.uk. Dr. Harman's Web page at http://www.unl.ac.uk/~mark/welcome.html contains some of his team's other publications on slicing and pointers to the work in this area.