An Extended Benchmark System of Word Embedding Methods for Vulnerability Detection

Hai Nguyen Ngoc, Hoang Nguyen Viet, Tetsutaro Uehara

doi:10.1145/3440749.3442661

Abstract

Security researchers have used Natural Language Processing (NLP) and Deep Learning techniques for program code analysis tasks such as automated bug detection and vulnerability prediction or classification. These studies mainly generate the input vectors for the deep learning models using NLP embedding methods. However, many embedding methods exist, and neural network structures are diverse and usually designed heuristically. This makes it difficult to select effective combinations of neural models and embedding techniques for training code vulnerability detectors. To address this challenge, we extended a benchmark system to analyze the compatibility of four popular word embedding techniques with four different neural networks: the standard Bidirectional Long Short-Term Memory (Bi-LSTM), the Bi-LSTM with an attention mechanism, the Convolutional Neural Network (CNN), and the classic Deep Neural Network (DNN). We trained and tested the models on two types of datasets of vulnerable functions written in C. Our results revealed that the Bi-LSTM model combined with the FastText embedding technique achieved the best detection performance on the real-world dataset but not on the artificially constructed one. Further comparisons with the other combinations are discussed in detail in our results.
