Abstract
With the growth of the open-source movement, reusing third-party libraries has become common practice in software development, since it saves application developers time and development cost. However, misusing third-party libraries carries hidden risks, such as license violations and security vulnerabilities. Identifying libraries written in C or C++ is hindered by the compilation process, which hides most source-code features, and the same open-source package can be compiled into different binary code by different compilation processes. This paper therefore proposes LibDX, a platform-independent and fully automated system for detecting reused libraries in binary files. With a well-designed feature extractor, LibDX overcomes compilation diversity between binary files. LibDX also introduces the concept of a logic feature block, which addresses the feature duplication challenge in a large-scale feature database. We built a large test data set covering multiple platforms and evaluated LibDX on 9.5K packages comprising 25.8K C/C++ binary files. Our results show that LibDX achieves a precision of 92% and a recall of 97%, outperforming state-of-the-art tools. We also validated the system on closed-source commercial applications and found some license violation cases.