Abstract

In this study, we propose a method for quickly finding source files that are similar to a given source file, as a way to trace the origin of reused code. Because the method returns similar files as well as identical ones, it can handle source files that were modified during reuse. Locality-sensitive hashing keeps the search fast even over a large number of source files. We conducted a case study on a reused library written in C. Some of the reused copies contained changes unique to the reusing project, and some no longer matched the original source files. The method detected the reused source files among 200 projects with 92% accuracy. In addition, when we measured search time using 4 query files, each search completed within 1 second.
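The abstract does not say which locality-sensitive hashing scheme or file representation the method uses. As a rough illustration only, the sketch below assumes MinHash signatures over token 3-grams with banding, which is one common way to search for near-duplicate text. The names `LshIndex`, `shingles`, `minhash`, `NUM_HASHES`, and `BANDS`, as well as the parameter values, are hypothetical and do not come from the paper.

```python
import hashlib
import re
from collections import defaultdict

# Assumed parameters (not from the paper): 64 hash functions, 16 bands of 4 rows.
NUM_HASHES = 64
BANDS = 16
ROWS = NUM_HASHES // BANDS


def shingles(source, n=3):
    """Return the set of token n-grams for a source text (crude tokenizer)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    if len(tokens) < n:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _hash(seed, item):
    """64-bit hash of an item; the seed selects one hash function of the family."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(items):
    """MinHash signature: the minimum hash value under each seeded hash function."""
    return tuple(
        min((_hash(seed, it) for it in items), default=0)
        for seed in range(NUM_HASHES)
    )


class LshIndex:
    """Banded MinHash index: files sharing any whole band become candidates."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.signatures = {}

    def _bands(self, sig):
        for b in range(BANDS):
            yield b, sig[b * ROWS:(b + 1) * ROWS]

    def add(self, name, source):
        sig = minhash(shingles(source))
        self.signatures[name] = sig
        for key in self._bands(sig):
            self.buckets[key].add(name)

    def query(self, source, threshold=0.5):
        """Return (name, estimated Jaccard similarity) pairs, best match first."""
        sig = minhash(shingles(source))
        candidates = set()
        for key in self._bands(sig):
            candidates |= self.buckets.get(key, set())
        results = []
        for name in candidates:
            other = self.signatures[name]
            estimate = sum(a == b for a, b in zip(sig, other)) / NUM_HASHES
            if estimate >= threshold:
                results.append((name, estimate))
        return sorted(results, key=lambda r: -r[1])
```

To use it, call `add` once for every file in the corpus and then call `query` with the text of the file whose origin is in question. Only files that share at least one band with the query are compared, so lookup cost depends on the number of candidates rather than on the size of the corpus. A file that was lightly edited during reuse still shares most of its 3-grams with the original, so it is very likely to collide in some band.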
