VulHunter: An Automated Vulnerability Detection System Based on Deep Learning and Bytecode

Ning Guo,Yali Gao,Hui Yin,Xiaoyong Li

doi:10.1007/978-3-030-41579-2_12

Abstract

The automatic detection of software vulnerability is undoubtedly an important research problem. However, existing solutions heavily rely on human experts to extract features and many security vulnerabilities may be missed (i.e., high false negative rate). In this paper, we propose a deep learning and bytecode based vulnerability detection system called Vulnerability Hunter (VulHunter) to relieve human experts from the tedious and subjective task of manually defining features. To the best of knowledge, we are the first to leverage bytecode features to represent vulnerabilities. VulHunter uses the bytecode, which is the intermediate representation output by the source code, as input to the neural networks and then calculate the similarity between the target program and vulnerability templates to determine whether it is vulnerable. We detect SQL injection and Cross Site Scripting (XSS) vulnerabilities in PHP software to evaluate the effectiveness of VulHunter. Experimental results show that VulHunter achieves more than 88% (SQL injection) and 95% (XSS) F1-measure when detecting a single type of vulnerability, as well as more than 90% F1-measure when detecting mixed types of vulnerabilities. In addition, VulHunter has lower false positive rate (FPR) and false negative rate (FNR) than existing approaches or tools. In practice, we apply VulHunter to three real PHP software (SEACMS, ZZCMS and CMS Made Simple) and detect five vulnerabilities in which three have not been disclosed before.

Full Text