Abstract

Precision and recall are the main metrics used to measure the correctness of clone detectors. These metrics require labeled datasets containing the ground truth: samples of clone and non-clone pairs. For source code clone detectors in particular, there are some techniques, as well as a concrete framework, for automatically evaluating recall, broken down by clone type. However, evaluating precision is still challenging because of the intensive and specialized manual effort the task requires. Moreover, when precision is reported, it is typically reported over all types of clones together, making it hard to assess the strengths and weaknesses of the corresponding clone detectors. This paper presents systematic experiments to evaluate the precision of eight code clone detection tools. Three judges independently reviewed 12,800 clone pairs to compute the undifferentiated and type-based precision of these tools. Besides providing a useful baseline for future research in code clone detection, our work also unveils important considerations to take into account when measuring precision and reporting the results. Specifically, it shows that the reported precision of these tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account. It also stresses, once again, the importance of reporting inter-rater agreement.
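
To make the distinction between undifferentiated and type-based precision concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes, hypothetically, that each sampled pair carries a clone-type label and one boolean verdict per judge, that a pair counts as a true positive when a majority of judges accept it, and that agreement is summarized with Fleiss' kappa. The function names and the sample data are illustrative only.

```python
from collections import defaultdict


def majority(votes):
    """Return True when more than half of the judges accepted the pair."""
    return sum(votes) * 2 > len(votes)


def precision_by_type(samples):
    """Compute overall and per-type precision.

    samples: iterable of (clone_type, votes), where votes is a list of
    booleans, one per judge (assumed layout, not taken from the paper).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for clone_type, votes in samples:
        totals[clone_type] += 1
        if majority(votes):
            hits[clone_type] += 1
    per_type = {t: hits[t] / totals[t] for t in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_type


def fleiss_kappa(all_votes):
    """Fleiss' kappa for binary verdicts with the same number of judges per pair."""
    n_items = len(all_votes)
    n_raters = len(all_votes[0])
    yes_total = 0
    agreement = 0.0
    for votes in all_votes:
        yes = sum(votes)
        no = n_raters - yes
        yes_total += yes
        # Fraction of judge pairs that agree on this item.
        agreement += (yes * (yes - 1) + no * (no - 1)) / (n_raters * (n_raters - 1))
    p_bar = agreement / n_items
    p_yes = yes_total / (n_items * n_raters)
    p_e = p_yes ** 2 + (1 - p_yes) ** 2  # agreement expected by chance
    if p_e == 1:
        return 1.0  # every verdict is identical, so agreement is perfect
    return (p_bar - p_e) / (1 - p_e)


# Made-up verdicts from three judges on four reported pairs.
samples = [
    ("Type-1", [True, True, True]),
    ("Type-1", [True, True, False]),
    ("Type-3", [True, False, False]),
    ("Type-3", [False, False, False]),
]
overall, per_type = precision_by_type(samples)
kappa = fleiss_kappa([votes for _, votes in samples])
print(overall, per_type, round(kappa, 3))
```

In this toy data the single overall figure hides the fact that every Type-1 report is accepted while no Type-3 report is, which is the kind of difference a type-based breakdown exposes.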
