Abstract

Effective software vulnerability detection is essential to the security of software systems. However, class imbalance in large datasets often causes models to overfit to non-vulnerable code and perform poorly on vulnerable code. Traditional methods of collecting vulnerable data frequently fail to capture the complexity of real-world scenarios. This paper proposes a mutation-based data enhancement approach to address this challenge, focusing on capturing the essential traits of vulnerable source code. Our approach systematically extracts traits from a large body of vulnerable source code and applies mutation operators to introduce high-level alterations. We evaluate the convergence of multiple mutation rounds using a diversity index to ensure consistent enhancements. By selecting the most effective mutation operators for subsequent model training, our approach achieves substantial accuracy improvements across diverse datasets and deep neural network models. This work represents the initial version of our approach; refinements are ongoing to facilitate practical implementation and address real-world security challenges.
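
To make the idea of repeated mutation rounds checked against a diversity index concrete, the following is a minimal illustrative sketch and not the paper's implementation. The abstract does not specify the operators or the index, so everything here is an assumption: the two toy operators (`relax_bound`, `drop_guard`), the use of the Shannon index over token frequencies as the diversity measure, and the stopping rule (stop when the index changes by less than `tol` between rounds) are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def shannon_diversity(items):
    """Shannon index H = -sum(p * ln p) over the distinct values in items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def token_diversity(pool):
    """Diversity of the token distribution across all code samples in pool."""
    tokens = [t for code in sorted(pool) for t in re.findall(r"\w+|[^\w\s]", code)]
    return shannon_diversity(tokens)


# Hypothetical mutation operators (assumptions, not taken from the paper).
def relax_bound(code):
    """Turn the first strict '<' comparison into '<=' (off-by-one style)."""
    return code.replace(" < ", " <= ", 1)


def drop_guard(code):
    """Remove a leading 'if (...) ' guard, leaving the guarded statement."""
    return re.sub(r"^if \(.*?\) ", "", code, count=1)


def mutate_until_stable(seeds, operators, tol=1e-6, max_rounds=10):
    """Apply every operator to every sample each round until diversity settles."""
    pool = set(seeds)
    history = [token_diversity(pool)]
    for _ in range(max_rounds):
        pool |= {op(code) for code in pool for op in operators}
        history.append(token_diversity(pool))
        if abs(history[-1] - history[-2]) < tol:
            break  # diversity index has converged
    return pool, history


seeds = ["if (i <= n) buf[i] = x;", "if (n < len) memcpy(dst, src, n);"]
pool, history = mutate_until_stable(seeds, [relax_bound, drop_guard])
print(len(pool), [round(h, 3) for h in history])
```

In this toy setting the pool stops growing once no operator produces a new variant, so the index stops changing and the loop ends. A real pipeline would operate on parsed code rather than strings and would also need a criterion for ranking operators, which this sketch does not attempt.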
