DEPARTMENT OF COMPUTER SCIENCE
Dr. Jens Krinke
UCL

Student Projects

General Ideas

My areas of interest are Software Testing, Program Analysis, Clone Detection, and Bug Detection. If you are interested in doing a project in one of these areas, please contact me.

Some project ideas are:

General Software Engineering

  • Finding and Exploiting Similarity between Software Tests
  • Identify reused code between mobile applications.
  • Identify reused code between Eclipse plugins.
  • Search for how code is copied: Where is my code coming from and where does it end up?
  • Detection of Changes in Public Interfaces.

Code Provenance

  • Finding and Exploiting Similarity between Software Tests
  • Identify reused code between mobile applications.
  • Identify reused code between Eclipse plugins.
  • Search for how code is copied: Where is my code coming from and where does it end up?
  • Multi-version code hashing and search.
  • Studying the Variability in Metrics for Cloned Code
  • Tracking Copied Code
  • Find irregular patterns in source code and natural text for plagiarism detection.

Software Testing and Fault Localisation

  • Finding and Exploiting Similarity between Software Tests
  • Cross-Referencing for Deletion Dependence
  • Automatic removal of unused code.
  • Producing Minimal Versions of Programs that Expose Faults
  • Automatic detection of non-determinism in programs.

Software Security

  • Risks of Data Based Taint Analysis
  • Information Flow Control and Taint Analysis via Program Slicing for Java
  • Mutation-Based Obfuscation
  • Obfuscation in Android Apps

You will find a list of concrete topics below.

Specific Topics

Finding and Exploiting Similarity between Software Tests

In unit testing, there is a link between the tested code and the code that tests it. This project will investigate whether similar test code implies that the tested code is similar, too (and vice versa). If such a correspondence exists, it may be possible to exploit it to reuse test code for code that is not yet tested but is similar to code that is already tested.

Cross-Referencing for Deletion Dependence

In the ORBS Project, programs are manipulated by deleting as many statements and source code lines as possible. Very often, a deletion is attempted, but the resulting source code cannot even be compiled. This project will build a cross-referencing tool that extracts links between lines of the analysed source code. These links will establish that a line x can only be deleted once a line y has already been deleted. The extracted links will be used to improve the ongoing ORBS research project.
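As a minimal sketch of how such links could be represented and queried, the following assumes a plain dictionary of line numbers. The `requires` map, its contents, and the `deletable` helper are hypothetical illustrations and not part of ORBS.

```python
from typing import Dict, Set

# Hypothetical links: line x may only be deleted once every line in
# requires[x] has been deleted (for example, a declaration on line 1
# that is still used on lines 2 and 3).
requires: Dict[int, Set[int]] = {
    1: {2, 3},
    4: {5},
}


def deletable(line: int, deleted: Set[int]) -> bool:
    """Return True if all lines that `line` waits for are already deleted."""
    return requires.get(line, set()) <= deleted


# Line 1 cannot be deleted first, but it can once lines 2 and 3 are gone.
can_delete_1_first = deletable(1, set())
can_delete_1_later = deletable(1, {2, 3})
```

With such a check, a deletion attempt that is known to break compilation could be skipped before the compiler is invoked.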

Identify Reused Code between Mobile Applications

Mobile applications have to be developed quickly and therefore are likely to reuse proven code from similar applications. This project will analyse mobile applications (e.g. Android apps) for reused code. It will employ tools and techniques from Clone Detection and Software Bertillonage to identify similar and identical parts to gather data about reuse between apps. It will then study patterns of reuse between apps and will also look at reuse from and to non-mobile projects.

Automatic Removal of Unused Code

During testing of a system, coverage of the code is often measured to find out which code has been executed. If a test fails, there are often tools at hand that help with finding problems inside the system; however, most of them are static analysis tools that analyse the complete source code. This project will develop a tool that automatically removes source code that is not executed during a test. An approach called delta debugging can be used for this: it removes parts of the source code and checks whether the executed tests still produce the same output as before. Delta debugging will find a minimal version of the source code that still produces the same test results. The reduced source code can then be used for much more effective fault localisation or other static analyses.
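The reduction loop described above can be sketched as follows. This is a simplified, greedy chunk-removal variant of delta debugging rather than the full algorithm, and the names `reduce_lines`, `still_passes`, and the toy `program` are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def reduce_lines(lines: Sequence[str],
                 still_passes: Callable[[List[str]], bool]) -> List[str]:
    """Remove chunks of lines for as long as `still_passes` stays True."""
    current = list(lines)
    chunk = max(len(current) // 2, 1)
    while current:
        removed_any = False
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if still_passes(candidate):
                current = candidate   # keep the deletion, retry at same offset
                removed_any = True
            else:
                start += chunk        # this chunk is needed, move on
        if not removed_any:
            if chunk == 1:
                break                 # no single line can be removed any more
            chunk //= 2               # try smaller chunks
    return current


# Toy program: only three of the five lines contribute to `result`.
program = [
    "unused = [i * i for i in range(10)]",
    "x = 6",
    "debug = 'never read'",
    "y = 7",
    "result = x * y",
]


def still_passes(candidate: List[str]) -> bool:
    """Oracle: the reduced program must run and still compute result == 42."""
    env: dict = {}
    try:
        exec("\n".join(candidate), env)
    except Exception:
        return False                  # e.g. NameError after deleting a needed line
    return env.get("result") == 42


reduced = reduce_lines(program, still_passes)
```

In a real tool, the oracle would compile the reduced source and rerun the test suite instead of calling `exec` on a toy program.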

Producing Minimal Versions of Programs that Expose Faults

Understanding what is actually responsible for causing a failure is a hard task. This project will remove as much as possible from a program that exhibits a failure, so that removing anything else will either cause a different failure or change the behaviour with respect to the failure.

The project will build on previous research on Observation-Based Slicing.

Tracking Copied Code

Clone detection tools usually report identified clones as sets of source code locations, independent of their authors and/or owners. This project will use an available clone detection tool to find cloned code and then use the information available in a version archive to find the authors of the clones. For a specific author, a tool to be developed will then track where that author has copied code from and where else the author's code appears.

Studying the Variability in Metrics for Cloned Code

It is assumed that cloned and non-cloned code differ in their properties, and therefore metrics for cloned and non-cloned code have been defined, computed, and compared. However, what actually counts as cloned code is defined by the tool used to identify it. Moreover, not only the tool but also its configuration affects the distinction between cloned and non-cloned code.

This project will study the variability in measured metrics for cloned and non-cloned code by using different tools and different configurations. In addition, it will use search-based techniques to identify the maximum range of variability.

As search-based techniques are used, this project will need to use a cloud-based approach.

Studying the Impact of Obfuscation on Source Code Plagiarism Detection

As can be seen in the Oracle vs. Google case, software plagiarism is no longer just a problem in Higher Education; it has become a serious threat to Intellectual Property Protection. However, current source code plagiarism detection tools are of limited use if the plagiarism is actively disguised through obfuscation. This project will research how obfuscation impacts current detection approaches by doing a large-scale evaluation of clone and plagiarism detectors.

  • The project will first research and develop a tool to apply a range of automatic obfuscations like reformatting and identifier renaming to source code.
  • The project will then use the automatic obfuscation to study how likely it is that different plagiarism and clone detectors still detect the obfuscated copy as plagiarised.

Material:

  • Zhang, F., Jhi, Y. C., Wu, D., Liu, P., & Zhu, S. (2012, July). A first step towards algorithm plagiarism detection. In Proceedings of the 2012 International Symposium on Software Testing and Analysis (pp. 111-121). ACM.

Mutation-Based Obfuscation

Obfuscation is usually done by applying transformations to programs that change their syntax, but not their semantics. Guaranteeing that transformations are semantics-preserving is quite hard. This project will instead use transformations from mutation testing, which specifically try to change the semantics. Automated testing will then ensure that only mutations that do not change the observed behaviour are accepted.
  • This project will have to use mutation test tools and automatic test generation for Java.
  • A search-based approach may be used to find a set of mutations that minimises the textual similarity between the original and the obfuscated version.
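The accept-or-reject step can be sketched in a few lines. This is only an illustration under strong assumptions: mutations are modelled as plain text replacements, the test suite is a hand-written oracle, and the names `obfuscate`, `passes_tests`, and `clamp` are hypothetical. The project itself targets Java with real mutation testing and test generation tools. Passing tests only shows that the tested behaviour is unchanged, not that the semantics are fully preserved.

```python
from typing import Callable, Iterable, Tuple


def obfuscate(source: str,
              mutations: Iterable[Tuple[str, str]],
              passes_tests: Callable[[str], bool]) -> str:
    """Apply each mutation once and keep it only if the tests still pass."""
    current = source
    for old, new in mutations:
        candidate = current.replace(old, new, 1)
        if candidate != current and passes_tests(candidate):
            current = candidate       # behaviour unchanged on the tests: accept
    return current


source = (
    "def clamp(x):\n"
    "    if x < 0:\n"
    "        return 0\n"
    "    return x\n"
)


def passes_tests(candidate: str) -> bool:
    """Hypothetical test suite for clamp()."""
    env: dict = {}
    try:
        exec(candidate, env)
        clamp = env["clamp"]
        return clamp(-5) == 0 and clamp(0) == 0 and clamp(3) == 3
    except Exception:
        return False


mutations = [
    ("x < 0", "x <= 0"),        # behaves the same on the tests: accepted
    ("return 0", "return 1"),   # changes clamp(-5): rejected
]
obfuscated = obfuscate(source, mutations, passes_tests)
```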

Detection of Changes in Public Interfaces

Changes to the source code of a system are easily identifiable by a textual comparison of the current version to the previous version. The detection of semantic or syntactic changes is much more complicated. Independent of the changes in the code of a system, it is often more interesting to identify the changes in the public interfaces of a system. This project has to develop an approach to compare the public interfaces of a system's current version to the public interfaces of the previous version. This approach has to identify how the interfaces have changed and to report the changes.
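As a rough sketch of the comparison step, assume that the public interface of each version has already been extracted into a mapping from a member name to its signature. The extraction itself is the harder part and is not shown. The function `diff_interfaces` and the example signatures are hypothetical.

```python
from typing import Dict, List


def diff_interfaces(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report public members that were added, removed, or changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys()
                          if old[name] != new[name]),
    }


# Hypothetical interfaces of two versions of the same system.
previous = {
    "Parser.parse": "(String) -> Ast",
    "Parser.reset": "() -> void",
    "Lexer.next": "() -> Token",
}
current = {
    "Parser.parse": "(String, Options) -> Ast",
    "Lexer.next": "() -> Token",
    "Lexer.peek": "() -> Token",
}

report = diff_interfaces(previous, current)
```

Here the report lists `Lexer.peek` as added, `Parser.reset` as removed, and `Parser.parse` as changed.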

Risks of Data Based Taint Analysis

Dynamic application instrumentation and monitoring have become so efficient that dynamic taint analysis tools have been built which track user data through the execution of application programs. To prevent taint explosion, such tools ignore tainting through control instructions. However, control instructions can be used to copy information, and thus such tools will have false negatives. This project will study such risks by evaluating a tool like PrivacyScope with indirect channels. For example, a loop like the following will copy variable a to b without ever creating a data dependence between a and b: b = 0; i = 0; while (i != a) { ++i; ++b; }
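The effect of the loop above can be reproduced with a toy taint tracker. The sketch below is not how PrivacyScope works internally; the `Tainted` wrapper is a hypothetical model that, like the tools described, propagates taint through data operations only.

```python
class Tainted:
    """Integer wrapper that propagates a taint flag through data operations only."""

    def __init__(self, value: int, tainted: bool = False):
        self.value = value
        self.tainted = tainted

    def __add__(self, other: "Tainted") -> "Tainted":
        # Data dependence: the result is tainted if either operand is tainted.
        return Tainted(self.value + other.value, self.tainted or other.tainted)

    def __ne__(self, other: "Tainted") -> bool:
        # The comparison yields a plain bool, so control dependence is lost.
        return self.value != other.value


def copy_via_control_flow(a: Tainted) -> Tainted:
    """Same loop as in the text: b ends up equal to a without reading a's value."""
    one = Tainted(1)
    b = Tainted(0)
    i = Tainted(0)
    while i != a:
        i = i + one
        b = b + one
    return b


secret = Tainted(42, tainted=True)
leaked = copy_via_control_flow(secret)   # copied through the loop condition
direct = secret + Tainted(0)             # copied through a data operation
```

After running this, `leaked.value` equals the secret but `leaked.tainted` is `False`, whereas `direct.tainted` is `True`. This is exactly the kind of false negative the project is meant to study.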

Material:

  • http://appanalysis.org/privacyscope/index.html

Information Flow Control and Taint Analysis via Program Slicing for Java

Program Slicing uses dependence information that exists between statements to find if a statement may influence a different statement. Recently, slicing-based techniques have been used for information flow control and taint analysis. Unlike traditional approaches, the slicing-based techniques can be applied to real-world programs effectively.
  • This project will develop such a technique to be applied to Java programs. The project will take a specification of sources and sinks and will check how information of different security levels can flow from the sources to the sinks.
  • The project will use the WALA infrastructure (T. J. Watson Libraries for Analysis) which provides static analysis capabilities for Java bytecode and related languages.
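The core idea, checking whether a source lies in the backward slice of a sink, can be sketched on a hand-written dependence graph. The sketch does not use WALA; the graph, the statement labels, and the function `backward_slice` are assumptions for illustration, and a real analysis would derive the dependences from Java bytecode.

```python
from typing import Dict, Set


def backward_slice(deps: Dict[str, Set[str]], criterion: str) -> Set[str]:
    """Return all statements that may influence `criterion`.

    deps[s] holds the statements that s directly depends on
    (data or control dependence).
    """
    seen = {criterion}
    worklist = [criterion]
    while worklist:
        stmt = worklist.pop()
        for pred in deps.get(stmt, set()):
            if pred not in seen:
                seen.add(pred)
                worklist.append(pred)
    return seen


# Hypothetical dependence graph of a small program.
deps = {
    "send(msg)": {"msg = format(pw)"},
    "msg = format(pw)": {"pw = read_password()"},
    "log(n)": {"n = count()"},
}
sources = {"pw = read_password()", "n = count()"}
sink = "send(msg)"

# Sources that may flow into the sink.
flows = sources & backward_slice(deps, sink)
```

In this example only `pw = read_password()` reaches the sink, so `n = count()` is reported as safe with respect to `send(msg)`.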

Material:

  • http://wala.sourceforge.net/

Automatic Detection of Non-Determinism in Programs

Testing of programs is made more complicated by non-deterministic behaviour. Most often, this is caused by concurrent execution. However, programs can have non-deterministic behaviour even when they are single-threaded. For example, such behaviour can be caused by explicitly undefined behaviour in programming languages like C, which is exposed when different compilers are used.

This project will use dynamic instrumentation to create traces of executions of the same program with the same input in different contexts. The traces are then compared to spot the locations where they diverge. Once such a divergence is found, it will be mapped back to source code and reported as a potential problem.
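The comparison step can be sketched as a search for the first position at which two traces differ. The trace format used here, a list of (file, line) pairs, and the function `first_divergence` are assumptions; the instrumentation that would produce such traces is not shown.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

Event = Tuple[str, int]   # hypothetical trace event: (source file, line number)

_MISSING = object()       # marks the end of the shorter trace


def first_divergence(trace_a: List[Event], trace_b: List[Event]) -> Optional[int]:
    """Return the index of the first differing event, or None if traces match."""
    pairs = zip_longest(trace_a, trace_b, fillvalue=_MISSING)
    for index, (a, b) in enumerate(pairs):
        if a != b:
            return index
    return None


# Two runs of the same program with the same input in different contexts.
run_gcc = [("main.c", 3), ("main.c", 4), ("main.c", 7), ("main.c", 8)]
run_clang = [("main.c", 3), ("main.c", 4), ("main.c", 9), ("main.c", 8)]

divergence = first_divergence(run_gcc, run_clang)
location = run_gcc[divergence] if divergence is not None else None
```

The event at the divergence index already carries a source location, which is what would be reported as a potential problem.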

Modifying Execution Behaviour

To observe and study the behaviour of a program, it is sometimes necessary to disable the execution of certain statements or instructions. This project will develop an approach that dynamically disables such statements or instructions, so that the modification can be applied without recompiling the system.

This project can be done for Java, in which case the bytecode has to be rewritten, or for C, in which case the binary code has to be rewritten.

Obfuscation in Android Apps

Obfuscation is used to protect IP and to hide malicious activities. This project will analyse how Android Apps currently use obfuscation:

  • Which obfuscation tools are used?
  • Has the obfuscation changed over time?
  • Does obfuscation differ between benign and malicious apps?

 

Last modified: 07/18/2017