Over the past decade, Open Source Software (OSS) has experienced rapid growth and widespread adoption, attributed to its openness and editability. However, this expansion has also brought significant security challenges, particularly introducing and propagating software vulnerabilities. Despite the use of machine learning and formal methods to tackle these issues, there remains a notable gap in comprehensive surveys that summarize and analyze both vulnerability detection (VD) and security patch detection (SPD) in OSS. This article seeks to bridge this gap through an extensive survey that evaluates 127 technical studies published between 2014 and 2023, structured around the Vulnerability-Patch life cycle. We begin by delineating the six critical events that constitute the Vulnerability-Patch life cycle, leading to an in-depth exploration of the Vulnerability-Patch Ecosystem. We then systematically review the databases commonly used in VD and SPD, and analyze their characteristics. Subsequently, we examine existing VD methods, focusing on traditional and deep learning-based approaches. Additionally, we organize current security patch identification methods by kernel type and discuss techniques for detecting the presence of security patches. Based on our comprehensive review, we identify open research questions and propose future research directions that merit further exploration.
Read full abstract