DEPARTMENT OF COMPUTER SCIENCE
Dr. Jens Krinke
UCL


Student Projects

General Ideas

My areas of interest are Software Testing, Continuous Integration, Code Review, Program Analysis, Clone Detection, and Bug Detection. If you are interested in doing a project in one of these areas, please contact me.

You will find a list of concrete topics below.

Specific Topics

Toxic Code on Stack Overflow

Developers often copy and paste code snippets from Stack Overflow when they solve difficult programming tasks, due to its large number of code examples. Our initial study of the origin of Stack Overflow Java code snippets shows that some code snippets in Stack Overflow answers are cloned from open source projects or public websites. Several cloned snippets are outdated and potentially harmful to reuse. For example, we found an outdated cloned snippet from Hadoop, a popular distributed MapReduce framework, in a Stack Overflow answer; it introduces a software defect and is still present in 500 open source projects on GitHub. Moreover, a number of cloned code snippets in Stack Overflow answers violate the license of their original software and may cause legal ramifications if reused. We call these outdated (and potentially defective) and license-violating code snippets “toxic code snippets.” To generalise the findings and gain more insight into the issues of toxic code snippets on Stack Overflow, we will extend the study beyond Java to two of the most popular programming languages: Python and C/C++. The research methodology comprises automated code clone detection, manual clone investigation, and automated licensing analysis. Code clone detection, using state-of-the-art tools, will be performed between Stack Overflow code snippets and open source projects written in Python and C/C++. The clones will be manually checked for their origins. Finally, the clones that originate from open source projects and public websites will be checked for outdated code and software licensing violations. The findings from this study will be compared to the previous findings on toxic Java code snippets on Stack Overflow. The results of the study will encourage the software engineering community, in both research and industry, to be careful when reusing code snippets from, or contributing them to, Stack Overflow.
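
As a rough illustration of the clone-detection step, the following sketch compares a Stack Overflow snippet against the files of a project using token-set similarity. It is only a minimal stand-in for the state-of-the-art clone detectors the project would actually use; the file paths and the similarity threshold are hypothetical.

    import re
    from pathlib import Path

    def tokens(code: str) -> set:
        """Crude lexical tokenisation: identifiers, keywords, and number literals."""
        return set(re.findall(r"[A-Za-z_]\w*|\d+", code))

    def jaccard(a: set, b: set) -> float:
        """Token-set similarity in [0, 1]."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def candidate_clones(snippet_file: str, project_dir: str, threshold: float = 0.7):
        """Report project files whose token set overlaps strongly with the snippet."""
        snippet_tokens = tokens(Path(snippet_file).read_text(errors="ignore"))
        for source in Path(project_dir).rglob("*.py"):  # adjust the suffix per language
            score = jaccard(snippet_tokens, tokens(source.read_text(errors="ignore")))
            if score >= threshold:
                yield source, score

    # Hypothetical inputs; a real study iterates over many snippets and projects.
    for path, score in candidate_clones("so_snippet_12345.py", "some_project/src"):
        print(f"{path}: {score:.2f}")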

What makes CI (Continuous Integration) fail and how do developers react to it?

CI during code review helps developers detect defects that do not satisfy functional requirements (i.e. compilation and testing) at an early stage. If a CI run fails, there is strong evidence that a recent change either broke the build or failed to pass test cases. In our recent study, on the other hand, we found that CI runs even fail without any mistake by a developer (e.g. due to flaky tests). Therefore, the goal of this study is to figure out what the main reasons for CI failures during code review are and how developers react to them.
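
A first pass over build logs could use simple keyword heuristics to sort failures into broad categories before manual inspection. The sketch below is only an illustration: the categories, patterns, and log file name are assumptions, not a validated classifier.

    import re

    # Hypothetical keyword heuristics; real categories would be refined by manual inspection.
    CATEGORIES = {
        "compile error": [r"error:", r"cannot find symbol", r"compilation failed"],
        "test failure": [r"FAILED", r"AssertionError", r"Failures: [1-9]"],
        "flaky/infrastructure": [r"timed out", r"Connection refused", r"No space left on device"],
    }

    def classify_log(log_text: str) -> str:
        """Return the first category whose keyword patterns appear in the build log."""
        for category, patterns in CATEGORIES.items():
            if any(re.search(p, log_text) for p in patterns):
                return category
        return "unknown"

    with open("build_1234.log") as f:  # hypothetical log exported from the CI system
        print(classify_log(f.read()))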

Related Code Review Comment Recommendation

Our recent work on a related code review recommendation technique compares the similarity between a newly submitted code review request and previously reviewed requests. It then recommends the most similar code review requests, which may contain information useful for the new request. However, it still takes time for developers to locate the exact useful information. Therefore, we propose a finer-grained technique that recommends related comments. The main idea is to split a newly submitted code review request into multiple hunks and compare their similarity with hunks from previously submitted code review requests that have developer comments. This technique can provide useful information for a code review at a finer granularity.
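
To make the idea concrete, the sketch below splits a new review request into hunks and recommends the comments attached to the most similar previously reviewed hunks. The bag-of-tokens similarity and the data layout are placeholders for whatever representation and similarity measure the project would actually adopt.

    import re

    def tokens(hunk: str) -> set:
        """Bag of identifiers used as a cheap similarity representation."""
        return set(re.findall(r"[A-Za-z_]\w*", hunk))

    def similarity(a: str, b: str) -> float:
        ta, tb = tokens(a), tokens(b)
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def recommend_comments(new_hunks, reviewed_hunks, top_k=3):
        """reviewed_hunks: list of (hunk_text, reviewer_comment) pairs from past reviews."""
        recommendations = []
        for new_hunk in new_hunks:
            scored = sorted(
                ((similarity(new_hunk, old), comment) for old, comment in reviewed_hunks),
                reverse=True,
            )
            recommendations.append((new_hunk, scored[:top_k]))
        return recommendations

    # Hypothetical data: in practice the hunks come from the diffs of review requests.
    past = [("for (File f : files) { f.close(); }", "Use try-with-resources here.")]
    new = ["for (File file : open) { file.close(); }"]
    for hunk, suggested in recommend_comments(new, past):
        print(hunk, "->", suggested)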

Code Smells Refactoring During Code Review

This project aims to leverage the rich information available in code review data to expand the empirical knowledge around the introduction and refactoring of code smells in software systems. For each system and each code review, we will revert the system back to its state before the code review and subsequently apply each patch submitted during the review until the change is finally merged into the repository. At each of these steps, we will automatically detect the introduction and/or removal of code smells in the system, alongside the refactoring operations that might have been performed by each patch within the review. The code review process generates rich information for each review, including the natural language description of the review and the natural language feedback for each patch submitted during the review. This allows us to ask interesting questions such as: What are common intents when developers introduce code smells into the system? What are common intents when developers remove code smells from the system? How often do developers discuss the introduction and/or removal of code smells during code review? What refactoring operations are used when developers have the intent of refactoring the system? How often do developers succeed in removing code smells when they have the intent of refactoring the system?
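
The patch-replay step could look roughly like the sketch below, which checks out the state before the review and applies each patch in turn, recording the difference in detected smells per patch. The git commands are standard; the smell detector, commit id, and patch files are placeholders for the tools and data the study would actually use.

    import subprocess

    def run(cmd, cwd):
        subprocess.run(cmd, cwd=cwd, check=True)

    def detect_smells(repo_path: str) -> set:
        """Placeholder: invoke an existing code-smell detector and return the smells found."""
        return set()

    def replay_review(repo_path: str, base_commit: str, patch_files: list):
        """Revert the repository to its state before the review, then apply each
        patch in order, recording smells introduced or removed by each patch."""
        run(["git", "checkout", base_commit], cwd=repo_path)
        previous = detect_smells(repo_path)
        for patch in patch_files:
            run(["git", "apply", patch], cwd=repo_path)
            current = detect_smells(repo_path)
            print(f"{patch}: introduced {current - previous}, removed {previous - current}")
            previous = current

    # Hypothetical inputs: the commit before the review and the patches submitted during it.
    replay_review("some_repo", "abc123", ["patch1.diff", "patch2.diff"])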

Are Developers Aware of Clones When They Make Code Changes?

This project aims to leverage the rich information in the code review process and automatic code clone detection to assess developers' awareness of code cloning on a day-to-day basis. Code clone detection will be performed before and after each code review patch is submitted. With the detected clones, the code changes from the patch, and the natural language text in the review, we can ask several interesting questions by manually analysing the results: (1) How often are developers aware of clones being introduced into the system, i.e. do they discuss them in code review? (2) When developers discuss clones, what happens to the clones in the system after the discussion? (3) What are common intents when developers introduce clones into the software? (4) What is the developers' perception of code clones? Should clones always be removed, or is it just good to know they are there?
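
One simple way to pre-filter reviews for question (1) would be a keyword search over review comments of patches that introduced clones, before the manual analysis. The keyword list and data layout below are assumptions for illustration only.

    import re

    # Hypothetical keywords; the manual analysis would refine and validate this list.
    CLONE_KEYWORDS = re.compile(r"\b(clone|duplicat\w*|copy[- ]?past\w*|same code)\b", re.I)

    def review_mentions_clones(comments: list) -> bool:
        """True if any review comment appears to discuss cloning or duplication."""
        return any(CLONE_KEYWORDS.search(c) for c in comments)

    def awareness(reviews_with_new_clones):
        """reviews_with_new_clones: (review_id, comments) pairs for reviews whose
        patches introduced clones according to the clone detector."""
        aware = [rid for rid, comments in reviews_with_new_clones
                 if review_mentions_clones(comments)]
        return len(aware), len(reviews_with_new_clones)

    # Hypothetical data.
    hits, total = awareness([(1, ["This duplicates the parser code, extract a helper?"]),
                             (2, ["LGTM"])])
    print(f"clones discussed in {hits}/{total} clone-introducing reviews")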

Finding and Exploiting Similarity between Software Tests

With unit testing, there is a link between the tested code and the code that tests it. This project will investigate whether similar test code corresponds to similar tested code (and vice versa). If such a correspondence exists, it may be possible to exploit the similarity to reuse test code for code that is not yet tested but is similar to code that already is.
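
One simple way to probe this correspondence is to compare every two (test, tested code) pairs and correlate the similarity of the tests with the similarity of the tested code. The sketch below uses a crude token-set similarity and Pearson correlation purely as placeholders for the measures the project would actually choose.

    import re
    from itertools import combinations

    def tokens(code: str) -> set:
        return set(re.findall(r"[A-Za-z_]\w*", code))

    def similarity(a: str, b: str) -> float:
        ta, tb = tokens(a), tokens(b)
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def pearson(xs, ys):
        n = len(xs)
        if n < 2:
            return 0.0
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0

    def test_code_correlation(pairs):
        """pairs: list of (test_code, tested_code) strings linked by unit testing.
        Compares every two pairs and correlates the two similarity values."""
        test_sims, code_sims = [], []
        for (t1, c1), (t2, c2) in combinations(pairs, 2):
            test_sims.append(similarity(t1, t2))
            code_sims.append(similarity(c1, c2))
        return pearson(test_sims, code_sims)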

Cross-Referencing for Deletion Dependence

In the ORBS project, programs are manipulated by deleting statements and source code lines as much as possible. Very often a deletion is attempted, but the resulting source code cannot even be compiled. This project will build a cross-referencing tool that extracts links between lines of the analysed source code. These links establish that a line x can only be deleted once a line y has already been deleted. The extracted links will be used to improve the ongoing ORBS research project.
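
The sketch below illustrates the kind of link the tool would extract, using Python's ast module and a simple def-use view: the line defining a variable can only be deleted once the lines using it have been deleted. This is only an illustration; ORBS itself targets source code in other languages, where a dangling reference typically fails to compile.

    import ast

    def deletion_dependences(source: str):
        """Return constraints (d, u): definition line d can only be deleted
        once use line u has already been deleted."""
        definitions = {}  # variable name -> line of its (most recent) assignment
        constraints = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        definitions[target.id] = node.lineno
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in definitions and node.lineno != definitions[node.id]:
                    constraints.append((definitions[node.id], node.lineno))
        return constraints

    example = "x = 1\ny = x + 1\nprint(y)\n"
    for definition_line, use_line in deletion_dependences(example):
        print(f"line {definition_line} can only be deleted after line {use_line}")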

Automatic Removal of Unused Code

During testing of a system, coverage of the code is often measured to find out which code has been executed. If a test fails, there are often tools at hand that help with finding problems inside the system; however, most of them are static analysis tools that analyse the complete source code. This project will develop a tool that automatically removes source code that is not executed during a test. An approach called delta debugging can be used for this: it removes parts of the source code and checks whether the executed tests still produce the same output as before. Delta debugging will find a minimal version of the source code that still produces the same test results. The reduced source code can then be used for much more effective fault localisation or other static analyses.
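
The core reduction loop could look like the simplified, delta-debugging-style sketch below: try to delete chunks of lines, keep any deletion after which the tests still behave as before, and shrink the chunk size down to single lines. The test oracle is a hypothetical stub; a real tool would rebuild the system, rerun the test, and compare its output with the original run.

    def minimise(lines, tests_still_pass):
        """Greedy delta-debugging-style reduction over a list of source lines."""
        chunk = max(1, len(lines) // 2)
        while chunk >= 1:
            i = 0
            while i < len(lines):
                candidate = lines[:i] + lines[i + chunk:]
                if tests_still_pass(candidate):
                    lines = candidate      # deletion kept; retry at the same index
                else:
                    i += chunk             # deletion rejected; move past this chunk
            chunk //= 2                    # refine granularity, down to single lines
        return lines

    def tests_still_pass(candidate_lines):
        """Hypothetical oracle: write out the candidate source, rebuild, run the
        test, and compare the output with the original run."""
        ...

    # Hypothetical usage:
    # with open("module_under_test.py") as f:
    #     reduced = minimise(f.readlines(), tests_still_pass)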

 

Last modified: 10/03/2017