Program slicing isolates the various sub-computations which make up program code. In the last part of his series, Mark Harman shows how the overlap of these threads can be used to measure program cohesion.
Ever since early programmers were counting how many lines of code they had written to work out how many punched cards they'd need, the idea of measuring code has been applied to understand and predict aspects of its production. Program measurements have come to be known as code metrics, and their application has spread from the final code, back through the design and specification stages of the life cycle, and on to encompass the complete life cycle itself in the area of process improvement measurement. The clarion call of their advocates has its roots in a public address made by Lord Kelvin in 1883, which has become loosely paraphrased in the aphorism: 'How can we control what we cannot measure?'
The application of metrics to controlling software development has had a rather chequered history. The misuse of metrics as a predictor of the quality of developers has tended to give the whole field a bad press among the development community. However, there are several very valid applications of cohesion metrics, as discussed at the end of this article. Before coming to that, we'll see how the cohesiveness of programs can be measured via the technique of program slicing (discussed in my October and November 1996 EXE articles).
A cohesive program is one in which the modularisation of functionality is performed 'correctly'. More precisely, a cohesive module or function should perform only closely related tasks. A function that divides two numbers and returns both the quotient and the remainder would be categorised as highly cohesive, whereas a function that returns the larger of two numbers together with their product would be less so. The principle is mirrored in object-oriented programming by the concept of encapsulation: well-encapsulated objects contain all necessary data and function members within themselves.
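The contrast can be sketched in Python (these are hypothetical illustrative functions, not taken from the article's listings):

```python
def div_mod(a, b):
    # Highly cohesive: quotient and remainder are two facets of one task.
    return a // b, a % b

def max_and_product(a, b):
    # Less cohesive: the larger value and the product are unrelated results
    # that happen to share a pair of inputs.
    return max(a, b), a * b
```

The first function's two results arise from a single division; the second bundles two computations with no common purpose.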
The motivation for measuring and assessing the cohesiveness of programs rests upon observations and claims that highly cohesive code is easier to maintain, modify and reuse. Cohesion was a product of the effort to define principles for programming, to turn the activity from a craft into an engineering discipline.
However, this concept of cohesion is clearly rather subjective: how do we decide which tasks are related? In this article I shall look at how the quantity of cohesion possessed by a function can be gauged via the technique of program slicing.
Levels of cohesion
In their highly influential book Structured Design, Constantine and Yourdon identify seven levels of cohesion, defined in terms of processing elements. In ascending order of cohesiveness, they are:
1. Coincidental: the lowest level; functions exhibit no cohesion other than the coincidental inclusion of several tasks.
2. Logical: at each invocation of the function one of the processing elements is executed.
3. Temporal: the processing elements are all executed within some limited time frame.
4. Procedural: the processing elements all belong to some common procedural unit, such as a loop or sequence of statements.
5. Communicational: the processing elements either share common input data or produce common output data.
6. Sequential: the output of one processing element is provided as the input to another.
7. Functional: all processing elements implement a single specific function.
Of these levels, the most desirable is functional cohesion. However, even this definition is rather open to interpretation. Fortunately, we can take advantage of program slicing to associate functions with a number representing the amount of functional cohesion they possess.
Using slices to assess cohesion
Program slicing isolates the sections of code that affect the value of a chosen set of variables at some chosen point in a program, producing code fragments known as slices. The technique was explained in detail in the October 1996 issue.
In general, slices are constructed for a set of variables, but for simplicity we'll just consider slices constructed for a single variable. For a slicing criterion consisting of a variable v and a point of interest n, we shall say that the slice is constructed for v at n. Consider the simple program fragment in Listing 1. If we slice this program for the variable z at the end of the fragment, then we isolate (in the slice) those lines of the program which are involved in the overall computation of the final value of z. This slice is depicted in Listing 2. Lines which have been removed are replaced by comments. These lines play no part in the computation of the final value of z.
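Since Listings 1 and 2 are not reproduced here, the following hypothetical Python fragment sketches the idea: slicing for z at the end of the fragment keeps only the lines on which z's final value depends.

```python
def fragment(n):
    z = 0
    y = 1
    for i in range(1, n + 1):
        z = z + i    # contributes to the final value of z
        y = y * i    # plays no part in the computation of z
    return z, y

# The slice for z at the end of the fragment: lines affecting only y
# are removed (shown here as comments, as in Listing 2).
def slice_for_z(n):
    z = 0
    # y = 1
    for i in range(1, n + 1):
        z = z + i
        # y = y * i
    return z
```

The slice computes exactly the same final value of z as the full fragment, using a strict subset of its lines.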
If this technique is used repeatedly to capture the threads of computation associated with each of a program's variables, we can examine the overlap of the resulting slices to get a crude measure of the program's level of cohesion. Think of it this way: a slice captures a specific thread of a program concerned with the computation of some variable. If we took several slices from a function, each for a different variable, and we found that these slices had a lot of code in common, then we would be justified in thinking that the variables were related in some way. We might go further, and decide that the function's 'tasks' are captured by the computation it performs on these variables, and therefore we would conclude that the function's tasks were strongly related and thus that the function was highly cohesive.
The first step in developing a formal measure of cohesion based upon this idea is to pin down what the processing elements of a function are. We shall take the view that a processing element is a piece of code which calculates a value. We further specify that this value must be visible outside the function in order to constitute a result. Such values are those printed by the function and those stored in global variables or reference parameters.
Each of these forms of output can be thought of as the calculation of a value for a variable. We shall therefore focus our attention on the values printed out by a function. To simplify matters further, we shall assume that the values are printed out at the end of the function, and that each output consists of printing the value of one of the function's local variables. Relaxing these simplifying assumptions does make the calculation of cohesion slightly more involved, but does not render the approach we shall describe inapplicable.
We can isolate processing elements of this type by slicing the program for each particular output variable. Consider, for example, the function Marks in Listing 3. It outputs the values of three variables and therefore has three processing elements, one for each variable. These can be isolated by slicing, as shown in Listings 4, 5 and 6, which show the slices associated with Count, Pass and Fail respectively. The three processing elements have very little overlap, since the calculation of passes, fails and the overall count are largely independent of one another. The cohesive section of the function is made up of the statements shared between all of its processing elements: in this case, just two lines. This can be visualised by placing markers by each line of the program, indicating which slices it belongs to, as in Listing 7. We can then calculate a simple measure of the function's cohesion as the proportion of statements in the function that lie in the cohesive section (those shared between all the processing elements): in this case, 2/10 or 0.2.
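As a sketch of the calculation, represent each slice as the set of line numbers it contains; the metric is then the fraction of the function's lines common to every slice. The line numbers below are invented to mirror the Marks example (the actual listings are not reproduced here):

```python
def cohesion(function_lines, slices):
    # Proportion of the function's lines shared by all processing elements.
    shared = set(function_lines).intersection(*slices)
    return len(shared) / len(function_lines)

# Hypothetical line sets: a ten-line function whose three slices
# (for Count, Pass and Fail) share only two lines.
marks_lines = set(range(1, 11))
slice_count = {1, 3, 4, 9}
slice_pass  = {1, 5, 6, 9}
slice_fail  = {1, 7, 8, 9}

print(cohesion(marks_lines, [slice_count, slice_pass, slice_fail]))  # 0.2
```

The same calculation applied to a ten-line function whose two slices share six lines would give 0.6, as in the MinMax example below.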
For the function MinMax in Listing 8, we would construct two slices associated with the two processing elements. As you can see from the labelling, this function is measured as more cohesive than Marks, with six of its ten lines shared between both processing elements, giving a metric of 0.6.
The misuse of measurement
This measure of cohesion, based only upon the number of statements in a function's cohesive section relative to the size of the function as a whole, is rather crude. Well-written functions tend to comprise few statements, and thus the result can be affected greatly by the inclusion or absence of a single statement in a slice. In addition, no differentiation is made between the relative importance of the various statements in a function. For example, an initialisation statement that appears in all slices should obviously be regarded as of less importance than an assignment which stores the result of a complex expression in a variable.
The measurement also suffers from being fundamentally syntactic, rather than semantic. That is, the cohesion reading we obtain does not (and cannot) tell us anything about the intent of the function. The function in Listing 3 performs three tasks which are all strongly semantically related, but this is not taken into account in our cohesion rating. Our measure might be better termed syntactic cohesion.
As a result, the cohesion measure would be most inappropriate as a tool of management. One could imagine the manager of some software development project deciding that, as low cohesion is considered harmful, all functions which fail to rise above some threshold reading (0.5 has a ring to it) will be rejected. This arbitrary diktat is typical of the kind of misuse of metrics which has rightly given the field a bad name among developers, and will not actually provide the manager with any meaningful information. Any program can be trivially modified to fool the measurement tool into providing a suitably high cohesion value. For example, consider the version of the Marks function in Listing 9. We have added three lines of code which have absolutely no effect upon the semantics of the program, but make it appear that the computation of each variable depends upon the computation of the other two. As a result, the cohesion measurement tool will include all the preceding computation for the three variables in each of their slices. (The slices and cohesive section are depicted by labels in Listing 9.) This simple syntactic mischief has changed the program's cohesion reading from 0.2 to 0.77.
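The effect of the trick can be sketched with the same set-based view of slices (line numbers invented; the real Listing 9 is not reproduced here). Three added no-op lines make each variable's slice depend on the computations of the other two, so ten of the thirteen lines end up shared:

```python
original = set(range(1, 11))   # the ten original lines of the function

# Lines 11-13 stand for the added no-ops, of the form
# Count = Count + (Pass - Pass) + (Fail - Fail), and so on.
# Each slice now contains all ten original lines plus its own no-op line.
slices = [original | {11}, original | {12}, original | {13}]

all_lines = set(range(1, 14))
shared = set.intersection(*slices)
metric = len(shared) / len(all_lines)
print(round(metric, 2))  # 0.77
```

The shared section has grown from two lines in ten to ten lines in thirteen, inflating the reading from 0.2 to roughly 0.77 without changing the program's behaviour at all.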
This trick could be performed on any function, and our manager could find himself with a set of code of the same quality as before, magically rendered cohesive, maintainable, reliable and reusable by the mere presence of some rather odd assignment statements. This unfortunate misapplication of measurement is not new: the window tax, introduced in 1696 and sharply increased by William Pitt in 1784, assumed that the value of a property was related to the number of windows (above six) it had. The legacy of this tax can still be seen today in the shape of beautiful Georgian houses disfigured by their hastily bricked-up windows.
Measurement and prediction systems
The reader will be forgiven for assuming that the measurement of cohesion is best avoided. However, this is not true. It does have an important role to play, but not as a predictor of the quality of a program nor of the quality of the programmer who produced it.
It is a well-known fact that faults in individual functions often have a knock-on effect, causing a whole raft of other problems which disappear when the fault is corrected. Programs measurable as syntactically cohesive are likely to be especially prone to this, since they take such care to reuse sections of code and share the computation of intermediate results. A high cohesion rating can therefore serve as a warning to watch out for these sorts of bugs. In addition, slicing analysis similar to that performed when measuring cohesion can prove helpful in tracking the code likely to be affected by such errors (see 'A piece of cake', EXE October 1996).
The discussion reveals the need to distinguish between the calculation embodied in a metric and the application to which its results are put. The calculation is well defined, comprising a working algorithm. What the measurements predict about the software is less well defined, and requires some careful analysis.
Efforts to relate the outcome of any measurement to the quality of software are, in my view, doomed; there will always be a way of fooling measurement tools into predicting high quality. At a time when the software engineering community is becoming increasingly concerned with 'software architecture', it would be the height of folly to impose the crude software equivalent of the window tax, which would be answered by the software equivalent of bricked-up windows. Instead, software measurement should be a tool for developers, providing the same form of basic measurement information that is routinely collected by engineers in other disciplines.
Mark Harman is director of research at the School of Informatics, University of North London. He teaches programming in C++ and formal methods in Z, and leads 'Project Project', a research group concerned with the development and application of program slicing technology (http://www.unl.ac.uk/~mark/projproj.html). He can be contacted via email at firstname.lastname@example.org. Dr Harman's home page at http://www.unl.ac.uk/~mark/welcome.html contains some of his team's other publications on slicing and pointers to the work in this area.