Abstract

With the growth of the open-source movement, reusing third-party libraries has become common practice in programming. Application developers reuse existing code to save time and development costs. However, misusing third-party libraries carries hidden risks, such as license violations and security vulnerabilities. Identifying libraries written in C or C++ is impeded by the compilation process, which hides most features of the source code, and the same open-source package can be compiled into different binary code by different compilation processes. This paper therefore proposes LibDX, a platform-independent and fully automated system for detecting reused libraries in binary files. With a well-designed feature extractor, LibDX overcomes the compilation diversity between binary files. LibDX also introduces the concept of the logic feature block, which addresses the feature duplication challenge in a large-scale feature database. We built a large test data set covering multiple platforms and evaluated LibDX on 9.5K packages comprising 25.8K C/C++ binary files. Our results show that LibDX achieves a precision of 92% and a recall of 97%, outperforming state-of-the-art tools. We also validated the system on closed-source commercial applications and found some license violation cases.
