Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning

Xin Li, Lu Wang, Yuling Chen, Yang Xin, Yixian Yang

doi:10.3390/app10051692

Abstract

Vulnerabilities are one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before they are attacked. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. Existing intelligent vulnerability detection methods suffer from long-term dependencies, out-of-vocabulary (OoV) tokens, coarse detection granularity, and a lack of vulnerable samples. This paper proposes an automated, intelligent method for detecting vulnerabilities in source code based on minimum intermediate representation learning. First, each source code sample is transformed into a minimum intermediate representation to exclude irrelevant items and shorten dependencies. Next, the intermediate representation is transformed into a real-valued vector through pre-training on an extended corpus, retaining structural and semantic information. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level vulnerability features. Last, a classifier is trained on the learned features. To validate the method, an experiment was performed. The empirical results confirm that, compared with traditional methods and state-of-the-art intelligent methods, our method performs better while detecting at a fine granularity.
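The representation-learning step in the abstract can be illustrated with a small sketch. It assumes that "three concatenated convolutional neural networks" means three parallel 1D convolution branches whose pooled outputs are concatenated; a stacked arrangement is another possible reading. The kernel widths, embedding size, filter count, random weights, and all function names are hypothetical and not taken from the paper.

```python
import random


def conv1d_max(seq, kernel):
    """Slide one kernel over a token-vector sequence, apply ReLU, then global max pooling.

    seq    -- list of token vectors (each a list of floats)
    kernel -- list of weight vectors, one per position in the window
    """
    width = len(kernel)
    best = 0.0  # starting at zero makes the result equal to max-pooling over ReLU outputs
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        activation = sum(
            w * x
            for weights, token in zip(kernel, window)
            for w, x in zip(weights, token)
        )
        best = max(best, activation)
    return best


def make_kernels(width, dim, n_filters, rng):
    """Create n_filters random kernels of the given window width (untrained, for illustration)."""
    return [
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
        for _ in range(n_filters)
    ]


rng = random.Random(0)
DIM, N_FILTERS = 4, 2   # assumed embedding size and filters per branch
WIDTHS = (2, 3, 4)      # assumed kernel widths, one per branch

branches = [make_kernels(w, DIM, N_FILTERS, rng) for w in WIDTHS]

# Stand-in for an embedded intermediate representation: 10 tokens, DIM values each.
sequence = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(10)]

# Concatenate the pooled outputs of the three branches into one feature vector.
features = [conv1d_max(sequence, k) for kernels in branches for k in kernels]
print(len(features))
```

The resulting feature vector has one entry per filter per branch, and it is this kind of vector that would be passed to the downstream classifier.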

Highlights

The importance of cyberspace security has become more and more significant
In order to compare the effects of different classifiers at the stage of vulnerability detection, we used the learned high-level features to train six classifiers: Logistic Regression (LR), Naive Bayesian (NB), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Gradient Boosting Decision Tree (GBDT), Random Forest (RF)
Compared with VulDeePecker, our method lowers the False Positive Rate (FPR) by 1.4% and the False Negative Rate (FNR) by 8.4%, and raises precision (P) by 4.0%, recall (R) by 7.6%, and F1 by 6.4%
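For reference, the five reported metrics follow from the confusion matrix of a binary detector by their standard definitions. The sketch below uses made-up counts purely for illustration; they are not results from the paper, and the function name is hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute FPR, FNR, precision, recall, and F1 from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    fpr = ratio(fp, fp + tn)        # benign samples wrongly flagged
    fnr = ratio(fn, fn + tp)        # vulnerable samples missed
    precision = ratio(tp, tp + fp)  # flagged samples that are truly vulnerable
    recall = ratio(tp, tp + fn)     # vulnerable samples that are found
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"FPR": fpr, "FNR": fnr, "P": precision, "R": recall, "F1": f1}


# Illustrative counts only (not from the paper).
metrics = detection_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall and FNR sum to one, so the two always move in opposite directions, while a change in FPR affects precision but not recall.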

Summary

Introduction

The importance of cyberspace security has become more and more significant. Cyberspace is facing a serious threat of invasion. Intelligent vulnerability detection methods that operate on source code are one of the main research directions. They can be categorized into three types: using software engineering metrics, anomaly detection, and vulnerable pattern learning [11]. Due to the particularity of vulnerabilities, the lack of a general and authoritative vulnerability dataset for training and testing still limits the performance of intelligent methods. To overcome these challenges, we propose a framework that detects software vulnerabilities in four stages: pre-processing, pre-training, representation learning, and classifier training. In the pre-training stage, considering the lack of vulnerability samples, we conduct unsupervised learning on an extended corpus. The purpose of this process is to learn the common syntax features of the programming language and to alleviate the OoV issue through distributed embedding.
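One part of the OoV argument can be shown with a small sketch: a vocabulary built from a larger, unlabeled corpus covers more of the tokens seen at detection time than one built from the labeled samples alone. This is only a proxy, since it measures vocabulary coverage and does not train embeddings. The tokenizer, the C snippets, and all names are hypothetical.

```python
import re
from collections import Counter

# Rough lexer: identifiers, integer literals, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split a code snippet into a flat list of tokens."""
    return TOKEN.findall(code)


def build_vocab(corpus, min_count=1):
    """Collect every token that appears at least min_count times in the corpus."""
    counts = Counter(tok for snippet in corpus for tok in tokenize(snippet))
    return {tok for tok, count in counts.items() if count >= min_count}


def oov_rate(code, vocab):
    """Fraction of tokens in the snippet that the vocabulary does not contain."""
    tokens = tokenize(code)
    missing = sum(tok not in vocab for tok in tokens)
    return missing / len(tokens) if tokens else 0.0


# Hypothetical data: a tiny labeled set and a larger unlabeled corpus.
labeled = ["char buf[10]; strcpy(buf, src);"]
extended = labeled + [
    "int n = strlen(src); memcpy(dst, src, n);",
    "if (n < 10) { buf[n] = 0; }",
]
target = "memcpy(buf, src, n);"

small_vocab = build_vocab(labeled)
large_vocab = build_vocab(extended)
print(oov_rate(target, small_vocab), oov_rate(target, large_vocab))
```

In this toy case the tokens `memcpy` and `n` are unknown to the labeled-only vocabulary but are covered once the extended corpus is included.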

Intelligent Vulnerability Detection

Program Understanding Model

Motivating Examples

12. It is data-dependent on line 5 and control-dependent

Hypothesis

Proposed Approach

Pre-Processing

Pre-Training

High-Level

Building Models and Performing Vulnerability Detection

Evaluation

Comparison of Different Neural Networks

Effectiveness of Pre-Training

Method

Comparative Analysis

Conclusions

Journal: Applied sciences	Publication Date: Mar 2, 2020
Citations: 50	License type: CC BY 4.0


