Approach to Predict Software Vulnerability Based on Multiple-Level N-gram Feature Extraction and Heterogeneous Ensemble Learning

Bing Zhang, Jiadong Ren, Jingyi Wu, Qian Wang, Ning Wang, Yuan Gao

doi:10.1142/s0218194022500620

Abstract

Software vulnerabilities are one of the root causes of computer security problems. Traditional static and dynamic analysis methods based on software source code have several deficiencies, such as a high false positive rate, a high false negative rate, and insufficient capture of semantic information. Applying machine learning, natural language processing, and related technologies to software vulnerability prediction can effectively mitigate these issues. This paper proposes a vulnerability prediction method based on multiple-level N-gram feature extraction and heterogeneous ensemble learning. First, using an intermediate representation of the code and a multiple-level N-gram feature generation model, two kinds of N-gram semantic features with different window sizes and different granularities, at the word level and the character level, are extracted to retain the semantic and structural information of the code. Second, TF–IDF is used to construct the vector space model that serves as the input to the prediction model. Because a single classifier is prone to overfitting and poor generalization, the paper benchmarks five classical machine learning algorithms (NB, SVM, DT, LR, RF) and then combines the four with better performance (SVM, DT, LR, RF) as base classifiers in a stacking heterogeneous ensemble to build the vulnerability prediction model. Finally, the proposed method is verified on buffer overflow and resource management vulnerability datasets, where the lowest false positive rate and false negative rate reach 1.58% and 4.06%, respectively.
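The feature extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window sizes, the `w:`/`c:` feature prefixes, the whitespace tokenization, and the toy input strings are all assumptions made for illustration, and the weighting is the textbook form tf × log(N/df), since the abstract does not state which TF–IDF variant the paper uses.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Return contiguous word-level n-grams, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character-level n-grams of the raw string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def multi_level_features(code, word_sizes=(1, 2), char_sizes=(3,)):
    """Combine word- and char-level n-grams over several window sizes.

    The window sizes are hypothetical defaults, not values from the paper.
    Prefixes keep the two granularities apart in one feature space.
    """
    tokens = code.split()  # assumed tokenization: split on whitespace
    feats = []
    for n in word_sizes:
        feats += ["w:" + g for g in word_ngrams(tokens, n)]
    for n in char_sizes:
        feats += ["c:" + g for g in char_ngrams(code, n)]
    return feats


def tfidf(docs):
    """Weight each document's features with tf * log(N / df).

    docs is a list of feature lists; the result is one dict per document
    mapping feature -> weight.
    """
    n_docs = len(docs)
    df = Counter()
    for feats in docs:
        df.update(set(feats))  # count each feature once per document
    vectors = []
    for feats in docs:
        tf = Counter(feats)
        total = len(feats)
        vectors.append(
            {f: (c / total) * math.log(n_docs / df[f]) for f, c in tf.items()}
        )
    return vectors


# Toy stand-ins for code in an intermediate representation (made up).
samples = ["call strcpy buf src", "call strncpy buf src len"]
vectors = tfidf([multi_level_features(s) for s in samples])
```

Vectors of this kind would then be passed to the base classifiers of the stacking ensemble. In this weighting, a feature that occurs in every sample gets a weight of zero, so only features that distinguish samples contribute.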

Journal: International Journal of Software Engineering and Knowledge Engineering
Publication Date: Oct 1, 2022
Citations: 1

Similar Papers

Validation of Brief Screening Tools for Mental Disorders Among New Zealand Prisoners
C. Evans ... A. I. Simpson
Psychiatric Services | VOL. 61
01 Sep 2010

Prediction of software vulnerability based deep symbiotic genetic algorithms: Phenotyping of dominant-features
Canan Batur Şahin ... Özlem Batur Dinler
Applied Intelligence | VOL. 51
31 Mar 2021

Research on Key Data Structure Localization Technology of Buffer Overflow Vulnerability
Hui Guo ... Jian-Ping Hu
27 Apr 2018

A deep learning based static taint analysis approach for IoT software vulnerability location
Weina Niu ... Mohsen Guizani
Measurement | VOL. 152
16 Oct 2019
