Abstract

Open source software (OSS) code is widely reused in software development today. However, reusing certain OSS versions introduces 1-day vulnerabilities whose details are publicly available; these can be exploited and lead to serious security issues. Existing state-of-the-art OSS reuse detection work cannot reliably identify the specific versions of reused OSS: the features it selects are not distinctive enough for version detection, and its matching scores are based only on similarity. This paper presents B2SMatcher, a fine-grained version identification tool for OSS in commercial off-the-shelf (COTS) software. We first discuss five kinds of version-sensitive code features that are trackable in both binary and source code. We categorize these features into program-level and function-level features and propose a two-stage version identification approach based on the two levels. B2SMatcher also identifies different types of OSS version reuse based on matching scores and matched feature instances. To extract source code features as accurately as possible, B2SMatcher uses machine learning methods to determine which source files are involved in compilation, and uses function abstraction and normalization to eliminate the cost of comparing redundant functions across versions. We evaluated B2SMatcher on 6351 candidate OSS versions and 585 binaries. The results show that B2SMatcher achieves a precision of up to 89.2% and outperforms state-of-the-art tools. Finally, we show how B2SMatcher can be used to evaluate real-world software and uncover security risks in practice.

Highlights

  • During the modern software development process, developers often use the rich functions provided by open source software (OSS) to shorten the development cycle, spending more time on personalized development

  • The aim of this paper is to find the specific versions of oss1, oss2, oss3

  • We examine a large number of real-world software products and find that some commercial software, such as TeamViewer (TeamViewer 2020) and Zoom (Zoom 2020), reuses vulnerable OSS versions


Summary

Introduction

In modern software development, developers often use the rich functionality provided by open source software (OSS) to shorten the development cycle and spend more time on personalized development. There are over 44 million repositories on GitHub (Repo Statistics on Github 2020). Such a large amount of OSS has brought great convenience to software development. Improper use of OSS, however, can cause security problems, because vulnerabilities appear in specific versions of OSS. By correctly identifying the version of the OSS reused in COTS software, we can obtain its release time, determine whether it is outdated, and check whether it is a vulnerable version. The aim of this paper is to implement a fine-grained version identification tool for OSS reused in COTS binary files.
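
To make the motivation concrete, the following is a minimal sketch of what can be done once a version has been identified: look up its release date, decide whether a newer release exists, and check whether advisories are recorded against it. It is not part of B2SMatcher. The catalogue, the library name `examplelib`, the advisory ID, and the `assess` helper are all hypothetical placeholders, and the version comparison assumes simple dotted numeric version strings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VersionRecord:
    """Hypothetical metadata for one OSS release."""
    release_date: date
    advisories: Tuple[str, ...]  # placeholder advisory IDs, not real CVEs


# Hypothetical catalogue keyed by (OSS name, version string).
# A real catalogue would be built from release notes and vulnerability databases.
CATALOGUE: Dict[Tuple[str, str], VersionRecord] = {
    ("examplelib", "1.0.2"): VersionRecord(date(2015, 3, 1), ("ADVISORY-0001",)),
    ("examplelib", "1.1.0"): VersionRecord(date(2019, 6, 15), ()),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.0.2' into (1, 0, 2); assumes dotted numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def assess(name: str, version: str,
           catalogue: Dict[Tuple[str, str], VersionRecord] = CATALOGUE) -> Optional[dict]:
    """Report release date, outdated status, and known advisories for one version."""
    record = catalogue.get((name, version))
    if record is None:
        return None  # version not in the catalogue, so nothing can be concluded
    known_versions = [v for (n, v) in catalogue if n == name]
    latest = max(known_versions, key=parse_version)
    return {
        "release_date": record.release_date,
        "outdated": parse_version(version) < parse_version(latest),
        "vulnerable": bool(record.advisories),
        "advisories": list(record.advisories),
    }


if __name__ == "__main__":
    print(assess("examplelib", "1.0.2"))
```

The lookup is only as useful as the version string fed into it, which is why coarse, similarity-only detection falls short: a result that cannot distinguish neighbouring releases cannot be mapped to a release date or an advisory list.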
