Unscrambling code clones for one-to-one matching of duplicated code
Code clone detection tools find sections of code that are similar. Different tools use different representations of the code and different matching algorithms. This diversity makes clone detection tools attractive for other code matching tasks, particularly where code has been edited or rearranged. However, the tools report every match they find. In some applications we are interested in one-to-one matching, meaning that each section of copied code in one file is matched to just one section of code in the other file. In this report we explore ways in which the clones reported by detection tools can inflate the amount of matching code. We also explain, with the aid of a worked example, our method for unscrambling the output from clone detection tools to approximate one-to-one matching of the code in one file to that in another file.
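
To make the inflation problem concrete, the following is a minimal sketch and not the method developed in this report. It assumes a hypothetical clone report in which each clone pair is a tuple of inclusive line ranges (start_a, end_a, start_b, end_b), and it uses a simple greedy longest-first filter as a stand-in for one-to-one matching. The names ClonePair, naive_matched_lines and greedy_one_to_one are illustrative only and do not correspond to any particular tool's output format.

```python
from typing import List, Tuple

# Hypothetical clone-pair format (an assumption, not any tool's real output):
# (start_a, end_a, start_b, end_b), inclusive line ranges in files A and B.
ClonePair = Tuple[int, int, int, int]


def overlaps(s1: int, e1: int, s2: int, e2: int) -> bool:
    """Return True if inclusive ranges [s1, e1] and [s2, e2] share a line."""
    return s1 <= e2 and s2 <= e1


def naive_matched_lines(pairs: List[ClonePair]) -> int:
    """Sum the file-A span of every reported pair, counting repeats."""
    return sum(end_a - start_a + 1 for start_a, end_a, _, _ in pairs)


def greedy_one_to_one(pairs: List[ClonePair]) -> List[ClonePair]:
    """Keep the longest pairs first, skipping any pair whose span in
    either file overlaps a pair that has already been kept.

    This is an illustrative heuristic only; it is not claimed to be
    optimal or to match the procedure described in the report.
    """
    kept: List[ClonePair] = []
    # Longest file-A span first; ties keep their reported order.
    for sa, ea, sb, eb in sorted(pairs, key=lambda p: p[1] - p[0], reverse=True):
        clash = any(
            overlaps(sa, ea, ka, la) or overlaps(sb, eb, kb, lb)
            for ka, la, kb, lb in kept
        )
        if not clash:
            kept.append((sa, ea, sb, eb))
    return sorted(kept)


# Made-up report: lines 1-10 of A match two places in B, and a sub-range
# of the same lines is also reported separately.
pairs = [(1, 10, 21, 30), (1, 10, 41, 50), (5, 10, 25, 30)]

print(naive_matched_lines(pairs))                      # 26
print(greedy_one_to_one(pairs))                        # [(1, 10, 21, 30)]
print(naive_matched_lines(greedy_one_to_one(pairs)))   # 10
```

In this made-up report, file A contains only 10 copied lines, yet summing every reported match gives 26. Filtering to non-overlapping pairs brings the count back to 10. A greedy filter like this can discard a better combination of shorter matches, so it should be read as an approximation.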