Dr. Jens Krinke

Student Projects

General Ideas

My areas of interest are Software Testing, Continuous Integration, Code Review, Program Analysis, Clone Detection, and Bug Detection. If you are interested in doing a project in one of these areas, please contact me.

You will find a list of concrete topics below.

Specific Topics

Clone Search for JavaScript, C/C++, C#

We have developed a search engine that can find Java or Python code snippets on the web that are similar to a given Java code snippet. This code search engine uses concepts from Information Retrieval which have been adapted to search for source code instead of natural language text.

This project will adapt the search engine to search for code snippets in languages other than Java or Python. The student working on the project can pick the language to be searched. However, the project itself has to be programmed in Java. It involves parsing source code in the selected language and harvesting source code from Stack Overflow and/or GitHub. The student will have to work with Elasticsearch.
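As a rough illustration of how Information Retrieval ideas can carry over to source code, the sketch below ranks snippets by the overlap of their token n-grams. It is not the search engine described above: the tokenizer regex, the n-gram size, the Jaccard score, and the names `tokenize`, `ngrams`, `jaccard`, and `search` are all assumptions chosen for illustration. It is written in Python for brevity, although the project itself has to be programmed in Java.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, language-agnostic tokenizer: identifiers, numbers, single symbols.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> List[str]:
    """Split source text into identifiers, numbers, and single symbols."""
    return TOKEN_PATTERN.findall(code)


def ngrams(tokens: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping token n-grams (empty if too few tokens)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(query: str, corpus: Dict[str, str], n: int = 3) -> List[Tuple[str, float]]:
    """Rank every snippet in the corpus by n-gram similarity to the query."""
    query_grams = ngrams(tokenize(query), n)
    scored = [
        (name, jaccard(query_grams, ngrams(tokenize(code), n)))
        for name, code in corpus.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In a scheme like this, supporting another language would mainly mean replacing `tokenize` with a tokenizer or parser for that language; a real index would store the n-grams in a search backend rather than comparing every pair in memory.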

Article describing the search engine for Java.

Toxic Code on Stack Overflow

Developers often copy and paste code snippets from Stack Overflow when they solve difficult programming tasks, because of its large number of code examples. Our initial study of the origin of Stack Overflow Java code snippets shows that some code snippets in Stack Overflow answers are cloned from open source projects or public websites. Several cloned snippets are outdated and potentially harmful to reuse. For example, we found an outdated snippet cloned from Hadoop, a popular distributed MapReduce framework, in a Stack Overflow answer; it introduces a software defect and is still present in 500 open source software projects on GitHub. Moreover, a number of cloned code snippets in Stack Overflow answers violate the license of their original software and may have legal ramifications if reused. We call these outdated (and potentially defective) and license-violating code snippets "toxic code snippets."

To generalize the findings and gain more insight into the issues of toxic code snippets on Stack Overflow, we will extend the study beyond Java to the two most popular programming languages: Python and C/C++. The research methodology includes automated code clone detection, manual clone investigation, and automated licensing analysis. Code clone detection, using state-of-the-art tools, will be performed between Stack Overflow code snippets and open source projects written in Python and C/C++. The clones will be manually checked for their origins. Finally, the clones that originate from open source projects and public websites will be checked for outdated code and for software licensing violations. The findings from this study will be compared to the previous findings on toxic Java code snippets on Stack Overflow. The results will encourage the software engineering community, in both research and industry, to be careful when copying code snippets to or from Stack Overflow.

Article describing the original study.

Are Developers Aware of Clones When They Make Code Changes?

This project aims to leverage the rich information in the code review process and automatic code clone detection to assess developers' awareness of code cloning on a day-to-day basis. Code clone detection will be performed before and after each code review patch is submitted. With the detected clones, the code changes from the patch, and the natural language text in the review, we can ask several interesting questions by performing a manual analysis of the results: (1) How often are developers aware of clones, i.e. do they discuss them in code review, when clones are introduced into the system? (2) When they discuss clones, what happens to the clones in the system after the discussion? (3) What are common intents when developers introduce clones into the software? (4) What is the developers' perception of code clones? Should clones always be removed, or is it enough to know they are there?

Article describing a similar study.

Cross-Referencing for Deletion Dependence

In the ORBS Project, programs are manipulated by deleting as many statements and source code lines as possible. Very often, a deletion is attempted, but the resulting source code cannot even be compiled. This project will build a referencing tool that extracts links between lines of the analysed source code. These links will establish that a line x can only be deleted when a line y has already been deleted. The extracted links will be used to improve the ongoing ORBS research project.
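One way to picture such links is as a map from a line to the lines that must be deleted before it. The sketch below is a hypothetical illustration, not the ORBS implementation: the `Links` representation and the functions `can_delete` and `deletion_order` are assumed names, and how the links are extracted from real source code is left open.

```python
from typing import Dict, Iterable, List, Set

# Assumed representation: links[x] is the set of lines y that must
# already be deleted before line x may be deleted.
Links = Dict[int, Set[int]]


def can_delete(line: int, deleted: Set[int], links: Links) -> bool:
    """A line is deletable once all of its prerequisite lines are deleted."""
    return links.get(line, set()) <= deleted


def deletion_order(candidates: Iterable[int], links: Links) -> List[int]:
    """Order candidate lines so each one follows the lines it depends on.

    Lines whose prerequisites can never be satisfied (for example,
    lines that depend on each other in a cycle) are left out.
    """
    deleted: Set[int] = set()
    order: List[int] = []
    remaining = sorted(set(candidates))
    progress = True
    while remaining and progress:
        progress = False
        for line in list(remaining):
            if can_delete(line, deleted, links):
                deleted.add(line)
                order.append(line)
                remaining.remove(line)
                progress = True
    return order
```

For instance, if line 1 declares a variable that line 2 uses, the link `{1: {2}}` says that line 1 can only be deleted after line 2. A deletion tool could consult `can_delete` before attempting a deletion and skip attempts that are known not to compile.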

Article describing the underlying system.

Code-to-Test Traceability and Automatic Fault Localisation for Python

Code-to-test traceability establishes links between code and the tests that exercise it. Establishing such links is not straightforward, and multiple approaches have been developed. This project will establish such links by tracing a program while it is being tested. The tracing will be combined with automatic fault localisation, a technique for finding the code that is likely responsible for causing a test to fail.

A similar project has been done in the past; this project will research and develop the approach for Python code.

Coding Style Compliance on Stack Overflow

UCL students have investigated coding style compliance for Python code on Stack Overflow. This project is a follow-up to and an extension of that work. The main focus will be on studying the style compliance of JavaScript code on Stack Overflow and comparing the results to the previous results for Python.

Article describing the original study.


Last modified: 11/11/2019