A mapping study of software code cloning
Background: Software Code Cloning is widely used by developers to produce code in which they have confidence and which reduces development costs and improves the software quality. However, Fowler and Beck suggest that the maintenance of clones may lead to defects and therefore clones should be re-factored out. Objective: We investigate the purpose of code cloning, the detection techniques developed and the datasets used in software code cloning studies between the years of 2007 and 2011. This is to analyse the current research trends in code cloning to try and find techniques which have been successful in identifying clones used for defect prediction. Method: We used a mapping study to identify 220 software code cloning studies published from January 2007 to December 2011. We use these papers to answer six research questions by analysing their abstracts, titles and reading the papers themselves. Results: The main focus of studies is the technique of software code clone detection. In the past four years the number of studies being accepted at conferences and in journals has risen by 71%. Most datasets are only used once, therefore the performance reported by one paper is not comparable with the performance reported by another study. Conclusion: The techniques used to detect clones seem to be the main focus of studies. However it is difficult to compare the performance of the detection tools reported in different studies because the same dataset is rarely used in more than one paper. There are few benchmark datasets where the clones have been correctly identified. Few studies apply code cloning detection to defect prediction.