Mark Harman: MSc Projects 2008/9

Mark Harman,
Professor of Software Engineering,
Software Engineering Group,
Department of Computer Science,
King's College London,
Strand, London, WC2R 2LS.

Phone: +44 (0)20 7848 2895
FAX: +44 (0)20 7848 2851
e-mail: my first name followed by the at symbol then dcs.kcl.ac.uk
web: http://www.dcs.kcl.ac.uk/sta ff/mark/.

Mark Harman's MSc Project Topics for 2008/9

This website gives you an introduction to the projects I am offering to supervise this coming year. It also explains the way in which I supervise and the kind of projects I supervise, so that you will know a little bit about what it would be like to undertake a project under my supervision.
You can find a list of projects I have previously supervised here.

If you are interested in one them, have a look over the articles I cite (if there are any cited) and then email me to make an appointment to discuss it further. Just so that you know: the first thing I am likely to ask you when we meet will be "what is the project about?" and the second thing I will be likely to ask is that you describe how you would approach the project.

The kind of student who does these projects

The projects I supervise have three characteristics:

They tend to involve software engineering problems with a real world application in the software development industry
They involve writing and developing programs
They involve experimenting with the programs to evaluate and improve them

These projects are therefore of interest to students who like to program and who like to experiment and those who want to see a link to the software development industry in their work. In the past students of mine who have done particularly well, have been able to get companies involved with their projects through my contacts with the software development industry.
Also, some of the best of the previous years' projects have been or are also in the process of being written up for publication. In supervising projects, it is always my ultimate goal to steer the student towards work which could ultimately lead to publication, should the work be of sufficiently high quality and originality.
All of the projects involve writing programs in one form or another. I will not impose a programming language upon you; you can write the programs in any language you prefer. However, you will find the projects either uninteresting or too demanding (or both) if you do not like to write programs.
The projects also involve experimentation. Once you have written a simple initial program to attack the problem in hand, you will be need to try out the system you have created on examples. Initially these examples will be simple "proof of concept" examples that you will create yourself. However, as the project and the system you are creating evolve, you will try to experiment with your ideas by applying them to larger and more realistic situations.
Ultimately, the mark of a truly outstanding piece of work is a thorough, realistic and convincing evaluation. If the work is novel and well-evaluated then it may well also be publishable. Work that is publishable is also work that should attract high marks.
All of the projects I list here could be aim to have some bearing on the kinds of problems faced by the software industry. However, they also have a research component to them, so that students who are interested in pursuing a research-led career in the general field of software engineering, will also find that the projects form an excellent foundation for this too. Several previous MSc students who have performed exceptionally well have gone on to undertake PhD programmes under my supervision.
How I supervise

The group of students I am supervising will meet regularly and you will make a short presentation to the group, describing the progress you are making and giving examples of interesting issues that have arisen. These will be discussed by the group. This is an excellent way to learn from your colleagues and to see how you are progressing relative to other students. It is also an important chance to develop your skills at giving presentations, which you will find tremendously useful in your future careers.
Companies that have employed students I have supervised have reported that they are impressed with their presentation skills. Companies increasingly hold these skills in very high regard and so you should view the chance to practice and develop your ability to deliver formal presentations as a worthwhile spin off benefit of project supervision. After taking a project under my supervision I am confident that you will feel much more comfortable with formal presentation. This may be a significant advantage in your future career.
As well as the group meetings, students supervised by me can also see me in my surgery slot individually to discuss, in more detail, some interesting and/or important aspect of their work. Where this affords insufficient time, can make special appointments to see me.
I expect students to work hard, to be motivated and to be prepared to help the others in the group by asking questions and taking and interest in their work. In return for this, you will find that the research and development environment in which you conduct your project will be stimulating, challenging and rewarding.
You may also be assigned one of my research staff or PhD students to work alongside, if your work related to theirs. This will provide you will an additional source of advice and guidance. However, I make it clear from the outset that this is complementary to your supervision; in no way is it substitute.

Search Based Software Engineering

The department of computer science's CREST. centre is a recognized world-leading centre for research on the rapidly growing area of Search Based optimization of Software Engineering (known as SBSE). This project is a chance to work along side the CREST centre's researchers in the development of optimization technologies for software engineering activities. Several MSc project students are sought to work on different software engineering applications of SBSE.
Search techniques like genetic algorithms and simulated annealing have been applied in many engineering contexts to search, for example, for good engine designs. The approach uses ideas from nature (such as Darwinian evolution) to guide the search for solutions which are fitter for purpose than others.
This project will be concerned with applying these search techniques to an aspect of software engineering. The student is free to choose the area of software engineering that most appeals to them. Three possible areas are requirements engineering, regression testing and project planning, but I am more than happy to talk to students who are interested in other applications of search in software engineering. A paper which sets out a "manifesto" for search based software engineering can be found here:
Mark Harman, Bryan Jones.
Search Based Software Engineering
Journal of Information and Software Technology, 43(14):833-839, 2001.
A more recent paper the describes my view of the way in which this field is heading was given at the International Conference on Software Engineering:
Mark Harman
The Current State and Future of Search Based Software Engineering
29th Int. Conference on Software Engineering (ICSE 2007), Future of Software Engineering (FoSE)
Minneapolis, USA, 20 - 26 May 2007

I recently gave a keynote talk at the 15th International Conference on Program Comprehension in which I set out a view of how SBSE techniques could be applied to program comprehension, outlining ways in which we could use SBSE techniques to develop cognitive models of software engineering activities that rely on comprehension. The paper that accompanied this talk gives some examples and pointers:
Mark Harman
Search Based Software Engineering for Program Comprehension
Invited paper
15th International Conference on Program Comprehension (ICPC 2007).
Banff, Canada, 26 - 29 June 2007
To appear.

Here are some other references to papers that give examples of applications of Search Based Software Engineering (the first of these is a publication that came from a previous year's MSc project and appeared at a top international conference):
Anthony Finkelstein, Mark Harman, Afshin Mansouri , Jian Ren and Yuanyuan Zhang.
"Fairness Analysis" in Requirements Assignments
16th International Requirements Engineering Conference (RE'08)
Barcelona, Spain, 8th-12th September 2008.
To appear.

Zheng Li, Mark Harman and Rob Hierons.
Search Algorithms for Regression Test Case Prioritisation
IEEE Transactions on Software Engineering.
33(4): 225-237, 2007.

Shin Yoo and Mark Harman.
Pareto Efficient Multi-Objective Test Case Selection
ACM International Symposium on Software Testing and Analysis (ISSTA 2007).
London, England, 9 - 12 July 2007
To appear.

Mark Harman and Laurie Tratt.
Pareto Optimal Search Based Refactoring at the Design Level
ACM Genetic and Evolutionary Computation COnference (GECCO 2007).
London, England, 7 - 11 July 2007
To appear.

Mark Harman, Kiran Lakhotia and Phil McMinn.
A Multi-Objective Approach to Search-Based Test Data Generation
ACM Genetic and Evolutionary Computation COnference (GECCO 2007).
London, England, 7 - 11 July 2007
To appear.

YuanYuan Zhang Mark Harman, Afshin Mansouri
The Multi-Objective Next Release Problem
"Best at GECCO" award winning paper for SBSE.
ACM Genetic and Evolutionary Computation COnference (GECCO 2007).
London, England, 7 - 11 July 2007
To appear.

Higher Order Mutation Testing

Mutation testing is a method for inserting faults into software in order to assess our ability to find these faults. Until very recently, the kinds of faults that have been used are relatively simple, consisting of a simple change ot one line of code. My group has been experimenting with a new technique: higher order mutation testing, which can generate more subtle faults. My PhD student Yue Jia has implemented a tool for higher order mutation testing of C programs. There are many possibilities for this development of mutation testing and I would be interested in discussing possible projects that explore the power and potential of Higher Order Mutation Testing.
A paper that explains the approach can be found here:

Yue Jia and Mark Harman.
Constructing Subtle Faults Using Higher Order Mutation Testing
8th International Working Conference on Source Code Analysis and Manipulation (SCAM'08)
Beijing, China, 28th-29th September 2008.
To appear.

Test data generation with two objectives: high branch coverage and few test cases

Current work on automated software test data generation has tended to focus on covering branches. The approach has had the single objective of covering as many branches as possible. This can lead to solutions that have one test case per branch. A more sophisticated approach would consist of aiming to generate a set of test cases that cover all the branches but which does so with the fewest possible test cases. This problem can be formulated as a two objective problem in which we seek to maximise branch coverage, while minimizing the number of test cases. There are other possible two objective formulations, for example, we may seek to maximize branch coverage while minimizing computation time, or while minimizing the effort required to check the test case. These multi objective formulations of test data generation represent a new and important development in the theory and practice of software test data generation.
An initial paper on the possibility of using a multi objective approach to test data generation appeared this year. It considered the two objectives of branch coverage and memory consumption. This work will address the problem by considering a different set of objectives. related papers are:

Mark Harman, Kiran Lakhotia and Phil McMinn.
A Multi-Objective Approach to Search-Based Test Data Generation
ACM Genetic and Evolutionary Computation COnference (GECCO 2007).
London, England, 7 - 11 July 2007
To appear.

Shin Yoo and Mark Harman.
Pareto Efficient Multi-Objective Test Case Selection
ACM International Symposium on Software Testing and Analysis (ISSTA 2007).
London, England, 9 - 12 July 2007
To appear.

Genetic Programming to evolve Strategies for Software Engineering

Existing work on search based techniques for software engineering has tended to evolve instances of solutions to problems. For example we evolve a good test set to exercise the software. Genetic programming would allow us to evolve a strategy rather than a single instance. This may provide a breakthrough by revealing insight into the nature of the problem. For instance, rather than evolving a set of test data to cover branches, we can evolve a strategy for covering additional branches based on a test case that covers a given branch. The strategy is a simple genetically evolved program that describes how to manipulate the existing test case into a new one.
We have a tool for generating test data, AUSTIN, developed by Kiran Lakhotia and a world-leading expert on Genetic programming Bill Langdon in the CREST centre, so there would be plenty of active researchers with whom the motivated student could discuss these ideas and their development of them.

Comment analysis

The comments in a program are a key aspect of the program often one that is overlooked. This project will investigate measures of program comments and techniques for analysing the comments in software. For example, an obvious measurement is the comment density, which could be measured in comment characters per line of code. This could be used as a measure for a whole program, or for parts of a program. It raises the question as to whether the comment density is higher in areas of the program that have been more often altered, and whether there is a correlation between comment density and fault density.

An alternative, possibly complimentary project might consider dependence analysis based on comments, as well as traditional measures of dependence. For example, where a program variable is mentioned in a comment, this variable might be considered to be a pseudo- referenced variable. An analysis of dependence can then be extended to consider the variables mentioned in comments. This would allow for measurement of how many comments refer to variables that are traditionally referenced and how many are mentioned only in comments. The project will have to cover interesting an important issues, such as how we determine the part of code to which a comment refers and what the connection is between the comment part of the program and the formal program code.
A staring point for this project would be the papers listed below. I am not an author of these, so I cannot make the full text available, However, the full text is available on the IEEE explore website, for which there is access from the King's College domain.
Reuse of modular software with automated comment analysis.

S. Matwin, S and A. Ahmad.
International Conference on Software Maintenance, 1994, ICSM 2004, Victoria, BC, Canada.
September 1994, pages 222-231.
ISBN: 0-8186-6330-8.

QDA - A Method for Systematic Informal Program Analysis
W.E. Howden and B. Wieand
IEEE Transactions on Software Engineering, June 1994 (Vol. 20, No. 6) pp. 445-462.

Software quality assessment using source code analysis

Many companies now buy software from third parties to assemble themselves. This outsourcing of software development means that there is a pressing need to find automated techniques to assess the quality of the software which is being bought in. These companies are looking for ways to use source code analysis techniques, like dependence analysis, to understand the quality of the software they are buying.
This project will consider ways in which slicing (a form of dependence analysis) can be used to understand the deep structure of the software to help inform companies about software quality and potential problem points. A paper which has some ideas along these lines can be found below. The paper cited below gives a flavour of the kind of analysis that can be performed using slicing. This project will involve inventing and experimenting with similar approaches and/or working out how we can derive numbers to qualify software quality or problem points, based on these ideas.
Two techniques that might be considered for visualisation of program dependence features are introduced in the papers listed below.

David Binkley and Mark Harman
Locating Dependence Clusters and Dependence Pollution
21st IEEE International Conference on Software Maintenance (ICSM 2005).
September 25th-30th 2005, Budapest, Hungary.
Pages 177-186.

David Binkley and Mark Harman
Analysis and Visualisation of Predicate Dependence on Formal Parameters and Global Variables.
IEEE Transactions on Software Engineering.
30(11): 715-735, 2004.

Optimization of Software Project Management

Software project management is a very difficult problem; some would say it is the hardest of all projects to manage. One of the pressing problems is the prevalence of estimates. That is, most software projects start out with very vague estimates. The manager must make their plans based on estimates that can be very wildly wrong. In this project, the student will investigate search based techniques to explore the sensitivity of the solutions that can be found to estimate error. This will allow the manager to focus on those estimate that can have the biggest effect.
The problem of sensitivity analysis was addressed by Shin Yoo in his 2006 MSc dissertation

Shin Yoo (winner of best MSc project 2006 prize)
The use of a novel semi-exhaustive search algorithm for the analysis of data sensitivity in a feature subset selection problem

However, Shin was concerned with estimate sensitivity for the requirements engineering and component selection problem. In this project the student will be considering the problem of sensitivity for the project planning problem. Papers relevant to project planning (using search based techniques) can be found below:

Giulio Antoniol, Massimiliano Di Penta, Mark Harman and Fahim Qureshi.
The effect of communication overhead on software maintenance project staffing: a Search-based approach
23rd IEEE International Conference on Software Maintenance (ICSM 2007).
2-5 October 2007, Paris, France.
To appear.

Giulio Antoniol, Massimiliano Di Penta and Mark Harman.
Search-Based Techniques Applied to Optimization of Project Planning for a Massive Maintenance Project
21st IEEE International Conference on Software Maintenance (ICSM 2005).
September 25th-30th 2005, Budapest, Hungary.
Pages 240-249.

Classical OR optimization for Software Engineering

Search based software engineering is a relatively well established discipline, with many researchers and practitioners developing techniques for optimizing software engineering activities, products and processes using meta-heuristic search techniques, However, comparatively little work has been undertaken on the classical OR techniques of linear programming, integer programming, branch and bound and other OR techniques. These techniques are often capable of providing precise, known optimal solutions in reasonable time. Therefore, it makes sense to re-visit areas of SBSE research and explore the extent to which classical OR techniques can be exploited to achieve better results.
This project is ideally suited to a student with a mathematical background, perhaps with an interest and/or previous experience with OR techniques in their first degree or in their working life. The best starting point is my recent overview paper on SBSE

Mark Harman
The Current State and Future of Search Based Software Engineering
29th Int. Conference on Software Engineering (ICSE 2007), Future of Software Engineering (FoSE)
Minneapolis, USA, 20 - 26 May 2007

This paper cites a few examples of where SBSE has used classic OR techniques. There are not many, so this is very much a green-field area with many interesting possibilities.

Testing Agent based Systems

There is a lot of work on software agents and agent oriented approaches to software engineering. There is also a great deal of work on software testing. However, there is very little work in the intersection of these two important areas of work. This project aims to take steps towards filling this gap in current understanding by developing an approach to testing agent oriented systems.
The project will define what it means to test an agent oriented system and consider the ways in which existing testing approaches, such as coverage based approaches, can be adapted to apply to agents. The project will then develop techniques for automated generation of test cases for agents.
You can find out more about Agents and Agent Oriented Software Engineering on Professor Luck's home page. This project focuses on testing agents, which is a relatively unexplored area.

Genetic Algorithms for Agent based Systems

Software agents are an interesting socially-inspired computational mechanism. Genetic algorithms are an optimization technique inspired by nature, in which a population of individuals compete and evolve. It seems natural to consider the ways in which these two approaches to computation might be combined. This project will explore the intersection of Agent Oriented Software Engineering and Genetic Algorithms. the focus will be to consider wy in which evolutionary inspired optimization approaches (like genetic algorithms) can ne used to optimizes the way in which agents develop and interact.
You can find out more about Agents and Agent Oriented Software Engineering on Professor Luck's home page.
There are few papers available that describe links between agent-oriented software engineering and genetic algorithms. One possible starting point might be the following paper, for which there is a web-available version:

Robert E. Smith and Nick Taylor
A Framework for Evolutionary Computation in Agent based Systems
Proceedings of the 1998 International Conference on Intelligent Systems, pp. 221-224

You can also consider the proceedings of the recent workshop: ecomass07

Software Self Testing

The automation of software testing suffers form the "oracle problem". That is: the problem of knowing whether the output from the program is correct. If we could automate the answer to this question then we would have a better program than the one we are trying to test and so there would be no need for testing. However, there are some behaviours of programs that are simply known to be wrong. For example, if the program raises an uncaught exception or crashes due to some exceptional behaviour such as array index out of range, null pointer de-reference or division by zero.
In this project we will pick on eno these "known faulty behaviours" and we will augment the program with code that attempts to deliberatively crash the program. This will be achieved by encasing the original program in a search based software test data generator that searches for inputs that make the program crash. The net result will be a technique for turning a program into an automated self-testing variant of itself. The self testing variant can be run indefinitely, during which time it will continue to test itself. This may find useful test cases that could cause the original program to fail.
This project may involve collaboration with Professor Clark at the University of York.