Abstract

Software vulnerability has long been an important and critical research issue in cybersecurity. Recently, machine learning (ML)-based approaches have attracted increasing interest in software vulnerability detection research. However, the detection performance of existing ML-based methods requires further improvement. There are two challenges: one is code representation for ML, and the other is the class imbalance between vulnerable and nonvulnerable code. To overcome these challenges, this article develops DeepBalance, a system that combines deep code representation learning with fuzzy-based class rebalancing. We design a deep neural network with bidirectional long short-term memory to learn invariant and discriminative code representations from labeled vulnerable and nonvulnerable code. Then, a new fuzzy oversampling method is employed to rebalance the training data by generating synthetic samples for the vulnerable class. To evaluate the performance of the new system, we carry out a series of experiments on a real-world ground-truth dataset that consists of code from the LibTIFF, LibPNG, and FFmpeg projects. The results show that the proposed system can significantly improve vulnerability detection performance. For example, the improvement is 15% in terms of F-measure.
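
The rebalancing step can be illustrated with a minimal sketch. The abstract does not specify the fuzzy oversampling method, so the code below substitutes a plain SMOTE-style interpolation, which is an assumption and not the DeepBalance algorithm. The function names `oversample_minority` and `rebalance`, the parameter `k`, and the toy feature vectors are all hypothetical; in the system described above the inputs would be the learned code representations.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic vectors by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Assumption: this stands in for the paper's fuzzy oversampling, which the
    abstract does not describe. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [v for v in minority if v is not base]
        others.sort(key=lambda v: sum((a - b) ** 2 for a, b in zip(base, v)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def rebalance(vulnerable, nonvulnerable, k=3, seed=0):
    """Return the vulnerable class padded with synthetic samples until it
    matches the size of the nonvulnerable class."""
    deficit = max(0, len(nonvulnerable) - len(vulnerable))
    return vulnerable + oversample_minority(vulnerable, deficit, k, seed)


# Toy data: 3 vulnerable vectors versus 10 nonvulnerable vectors.
vulnerable = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
nonvulnerable = [[float(i), float(-i)] for i in range(10)]
balanced = rebalance(vulnerable, nonvulnerable)
```

Only the training split would be rebalanced in this way; the evaluation data keeps its original class ratio so that measures such as F-measure reflect real-world conditions.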
