Abstract

Deciding whether two code fragments are semantic clones, also known as type-4 clones, is a problem with many ramifications. Current research often focuses on the problem in an imperative or object-oriented setting, and most existing work uses abstract syntax trees, program dependency graphs, program metrics, or text-based, token-based and machine learning-based approaches to identify semantic clones. In this work, we adopt a fundamentally different point of view and express clone detection as a search problem in a logic programming setting. Due to their restricted syntax and semantics, (constraint) logic programs are by nature simple and elegant candidates for automated analysis. After formalizing the clone detection problem at the level of predicates, we study the different parameters that come into play in the resulting framework. We try to identify the complexity issues involved in a general semantic clone detection procedure that essentially computes so-called most specific generalizations for predicates written in constraint logic programming (CLP). Even though generalization (or anti-unification) is well known for basic structures such as literals and terms, generalization of more complex structures such as clauses and predicates has received very little attention. We show that anti-unification makes it possible both to control the search and to guide the detection of cloned predicates. We pinpoint where efficient approximations are needed in order to identify semantic code clones in a manageable time frame.
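
For readers unfamiliar with anti-unification, the following is a minimal sketch of the classic term-level case only, which the abstract describes as well known. It is not the clause-level or predicate-level procedure studied in this work. The term encoding (strings for constants, tuples for compound terms), the function name `anti_unify`, and the `_G` variable naming are assumptions made for illustration.

```python
from itertools import count


def anti_unify(s, t, table=None, fresh=None):
    """Return a most specific generalization of two first-order terms.

    Assumed encoding for this sketch: a term is either a string
    (a constant) or a tuple (functor, arg1, ..., argn).
    """
    if table is None:
        table = {}       # maps a pair of differing subterms to its variable
    if fresh is None:
        fresh = count()  # supplies indices for fresh variable names

    # Identical terms generalize to themselves.
    if s == t:
        return s

    # Same functor and arity: generalize argument by argument.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        args = tuple(anti_unify(a, b, table, fresh)
                     for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args

    # Otherwise the terms clash: replace them by a variable, reusing the
    # same variable whenever the same pair of subterms clashes again.
    if (s, t) not in table:
        table[(s, t)] = f"_G{next(fresh)}"
    return table[(s, t)]


# f(a, g(b), a) and f(c, g(d), c) generalize to f(_G0, g(_G1), _G0).
left = ("f", "a", ("g", "b"), "a")
right = ("f", "c", ("g", "d"), "c")
print(anti_unify(left, right))
```

Reusing one variable per clashing pair is what makes the result most specific: the first and third arguments share `_G0`, recording that both terms repeat the same subterm in those positions.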

