Abstract

Vulnerabilities are among the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack occurs. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. Intelligent vulnerability detection methods suffer from long-term dependency, out-of-vocabulary (OoV) tokens, coarse detection granularity, and a lack of vulnerable samples. This paper proposes an automated and intelligent method for detecting vulnerabilities in source code, based on minimum intermediate representation learning. First, each source code sample is transformed into a minimum intermediate representation to exclude irrelevant items and shorten the dependency length. Next, the intermediate representation is transformed into a real-valued vector through pre-training on an extended corpus, so that structural and semantic information is retained. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level vulnerability features. Last, a classifier is trained on the learned features. An experiment was performed to validate the method. The empirical results confirmed that, compared with traditional methods and state-of-the-art intelligent methods, our method performs better and detects at a finer granularity.

Highlights

  • Cyberspace security has become increasingly important

  • To compare the effects of different classifiers at the vulnerability detection stage, we used the learned high-level features to train six classifiers: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Gradient Boosting Decision Tree (GBDT), and Random Forest (RF)

  • Compared with VulDeePecker, our method improved by 1.4% in False Positive Rate (FPR), 8.4% in False Negative Rate (FNR), 4.0% in precision (P), 7.6% in recall (R), and 6.4% in F1
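
The five metrics named in the highlights can all be derived from a binary confusion matrix, taking P and R as precision and recall. The following is a minimal sketch using the standard definitions, with label 1 assumed to mean "vulnerable" and 0 "not vulnerable"; the names `Confusion`, `confusion`, and `metrics` are hypothetical helpers for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # vulnerable samples flagged as vulnerable
    fp: int  # clean samples flagged as vulnerable
    tn: int  # clean samples flagged as clean
    fn: int  # vulnerable samples flagged as clean


def confusion(y_true, y_pred):
    """Count the four outcomes for binary labels (assumed: 1 = vulnerable)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def _ratio(num, den):
    # Return 0.0 instead of raising when a class is absent.
    return num / den if den else 0.0


def metrics(c):
    """Standard definitions of FPR, FNR, precision, recall and F1."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "FPR": _ratio(c.fp, c.fp + c.tn),
        "FNR": _ratio(c.fn, c.fn + c.tp),
        "P": precision,
        "R": recall,
        "F1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(metrics(confusion(y_true, y_pred)))
```

Lower is better for FPR and FNR, while higher is better for P, R, and F1, so an "improvement" means a decrease for the first two and an increase for the rest.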


Summary

Introduction

Cyberspace security has become increasingly important, and cyberspace faces a serious threat of intrusion. Intelligent vulnerability detection methods that operate on source code are one of the main research directions. They can be categorized into three types: those based on software engineering metrics, anomaly detection, and vulnerable pattern learning [11]. Because of the particular nature of vulnerabilities, the lack of a general and authoritative vulnerability dataset for training and testing still limits the performance of intelligent methods. To overcome these challenges, we propose a framework that detects software vulnerabilities in four stages: pre-processing, pre-training, representation learning, and classifier training. In the pre-training stage, considering the lack of vulnerability samples, we conduct unsupervised learning on an extended corpus. The purpose of this process is to learn the common syntax features of the programming language and to alleviate the out-of-vocabulary (OoV) issue through distributed embedding.
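
To make the OoV point concrete, the sketch below compares how many tokens of an unseen code sample are missing from a vocabulary built on a small labeled set versus one built on an extended corpus. It illustrates vocabulary coverage only and does not train embeddings. The regex tokenizer, the `<unk>` entry, the function names, and the toy C snippets are all assumptions made for illustration, not the paper's pre-processing or pre-training procedure.

```python
import re
from collections import Counter

# Assumed tokenizer: identifiers, integer literals, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def tokenize(code):
    """Split a code string into coarse lexical tokens."""
    return TOKEN_RE.findall(code)


def build_vocab(corpus, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for snippet in corpus for tok in tokenize(snippet))
    vocab = {"<unk>": 0}  # id reserved for out-of-vocabulary tokens
    for tok, n in counts.items():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def oov_rate(vocab, code):
    """Fraction of tokens in code that the vocabulary has never seen."""
    tokens = tokenize(code)
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    labeled = ["memcpy(dst, src, n);"]
    extended = labeled + ["strcpy(buf, src);", "char buf[10];"]
    sample = "strcpy(buf, src);"

    print("labeled only:", oov_rate(build_vocab(labeled), sample))
    print("extended    :", oov_rate(build_vocab(extended), sample))
```

In this toy case the vocabulary built on the extended corpus covers every token of the sample, whereas the labeled-only vocabulary misses `strcpy` and `buf`. A larger unlabeled corpus plays the same role at scale, independently of how many vulnerable samples are available.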

Intelligent Vulnerability Detection
Program Understanding Model
Motivating Examples
Hypothesis
Proposed Approach
Pre-Processing
Pre-Training
High-Level
Building Models and Performing Vulnerability Detection
Evaluation
Comparison of Different Neural Networks
Effectiveness of Pre-Training
Method
Comparative Analysis
Conclusions
