Clone detection finds application in many software engineering activities, such as comprehension and refactoring. However, the confounding configuration choice problem poses a widely acknowledged threat to the validity of previous empirical analyses. We introduce a search-based solution that finds suitable configurations for empirical studies. We present both a desktop implementation and a parallelised, cloud-deployed implementation, and use them to evaluate our approach on 6 popular clone detection tools applied to the Bellon suite of 8 subject systems. Our evaluation reports the results of 9.3 million executions of clone tools, which required a total of 15 CPU-years of computation time, making it the largest empirical study of clone detection to date. Our approach finds configurations that are significantly better (p < 0.05) than the defaults currently used in clone detection experiments, thereby providing evidence that it can ameliorate the confounding configuration choice problem.
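To make the idea of searching for a configuration concrete, here is a minimal Java sketch. It is not claimed to be the search used in our study: it swaps in plain steepest-ascent hill climbing over integer-valued tool parameters, and the class name `ConfigSearch`, the parameter bounds and the toy fitness function are all hypothetical. In a real study, the fitness of a candidate configuration would come from running a clone tool with it and scoring the output.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch: steepest-ascent hill climbing over integer tool
 * parameters. The fitness function is a stand-in for a real study's
 * measure; this is not the search algorithm of the study itself.
 */
public final class ConfigSearch {

    /**
     * @param start   starting configuration (e.g. the tool's defaults)
     * @param lower   inclusive lower bound for each parameter
     * @param upper   inclusive upper bound for each parameter
     * @param fitness hypothetical score to maximise
     * @return a locally optimal configuration
     */
    public static int[] hillClimb(int[] start, int[] lower, int[] upper,
                                  ToDoubleFunction<int[]> fitness) {
        int[] current = start.clone();
        double currentScore = fitness.applyAsDouble(current);
        while (true) {
            int[] bestNeighbour = null;
            double bestScore = currentScore;
            // Examine every neighbour that differs by +1 or -1 in one parameter.
            for (int i = 0; i < current.length; i++) {
                for (int step : new int[] {-1, 1}) {
                    int value = current[i] + step;
                    if (value < lower[i] || value > upper[i]) {
                        continue; // stay inside the legal parameter range
                    }
                    int[] candidate = current.clone();
                    candidate[i] = value;
                    double score = fitness.applyAsDouble(candidate);
                    if (score > bestScore) {
                        bestScore = score;
                        bestNeighbour = candidate;
                    }
                }
            }
            if (bestNeighbour == null) {
                return current; // no neighbour improves: local optimum reached
            }
            current = bestNeighbour;
            currentScore = bestScore;
        }
    }

    public static void main(String[] args) {
        // Toy (hypothetical) fitness peaking at parameters (10, 2); a real
        // study would run the clone tool and score its output instead.
        ToDoubleFunction<int[]> toyFitness =
                c -> -Math.pow(c[0] - 10, 2) - Math.pow(c[1] - 2, 2);
        int[] best = hillClimb(new int[] {6, 0}, new int[] {1, 0},
                               new int[] {50, 5}, toyFitness);
        System.out.println(Arrays.toString(best)); // prints [10, 2]
    }
}
```

Steepest ascent evaluates every neighbour before moving, so it is deterministic, but it can stop at a local optimum; a population-based search may cope better with a rugged configuration space.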
This website was created to accompany our FSE 2013 submission, which is
currently under review. In its present form, it provides only those results of
our analysis that could not fit within the page limits of the paper.
The following figures show the agreement levels for all the subject systems
in Bellon's benchmarks achieved by the Default, General and Individual configurations.
The General Clone Format (GCF) was designed to cater for anticipated developments in the clone community, which is currently focussing on so-called 'gapped clones'. We also developed a GCF Converter to convert the output of other clone detection tools into GCF files.
GCF file converter inputs:
- subject_name: The name of the subject program
- subject_path: The path of the subject program
- clone_file_path: The path of the input clone file
- min_line: The minimum length, in lines, of each clone
Command line for each supported tool:
- RCF files (IClone): java -jar GCF_Fileconverters.jar 5 clone_file_path min_line
- CCfinder: java -jar GCF_Fileconverters.jar 1 clone_file_path min_line
- PMD/CPD: java -jar GCF_Fileconverters.jar 3 subject_path clone_file_path min_line
- ConQAT: java -jar GCF_Fileconverters.jar 4 subject_name clone_file_path min_line
- Simian: java -jar GCF_Fileconverters.jar 6 subject_path clone_file_path min_line
- NiCAD: java -jar GCF_Fileconverters.jar 7 subject_name clone_file_path min_line
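The argument orders listed above can be captured in a small Java helper, which may be useful when scripting conversions for several tools. This is a sketch: the class `GcfCommand`, the enum constant names and the example paths are hypothetical; only the jar name, the numeric codes and the argument order come from the list above. It needs Java 14 or later for the arrow-style `switch`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds GCF_Fileconverters.jar command lines. */
public final class GcfCommand {

    /** Tools from the list above, paired with the leading numeric argument each uses. */
    public enum Tool {
        CCFINDER(1), PMD_CPD(3), CONQAT(4), ICLONE_RCF(5), SIMIAN(6), NICAD(7);

        final int code;

        Tool(int code) {
            this.code = code;
        }
    }

    /** Returns the full argument list, ready for ProcessBuilder. */
    public static List<String> build(Tool tool, String subjectName, String subjectPath,
                                     String cloneFilePath, int minLine) {
        List<String> cmd = new ArrayList<>(List.of(
                "java", "-jar", "GCF_Fileconverters.jar", Integer.toString(tool.code)));
        switch (tool) {
            case PMD_CPD, SIMIAN -> cmd.add(subjectPath);  // these take subject_path
            case CONQAT, NICAD -> cmd.add(subjectName);    // these take subject_name
            default -> { }                                 // RCF and CCfinder take neither
        }
        cmd.add(cloneFilePath);
        cmd.add(Integer.toString(minLine));
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical subject and file names, for illustration only.
        List<String> cmd = build(Tool.SIMIAN, "mysubject", "/path/to/mysubject",
                                 "clones.out", 6);
        System.out.println(String.join(" ", cmd));
        // To run it (requires the jar in the working directory):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the tool-to-code mapping in one enum means the numeric codes appear in a single place rather than being repeated across shell scripts.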