Automatic Identification of Vulnerable Code: Investigations with an AST-Based Neural Network

Garrett Partenza, Suranjan Chakraborty, Josh Dehlinger, Trevor Amburgey, Lin Deng

doi:10.1109/compsac51774.2021.00219

Abstract

The increasing complexity of software applications and the need to minimize software vulnerabilities have given rise to machine learning techniques that identify vulnerabilities in source code. However, many of these techniques lack the accuracy needed for industrial practice. The contribution of this work is the novel use of an Abstract Syntax Tree Neural Network (ASTNN) to identify software vulnerabilities and classify them by Common Weakness Enumeration (CWE) type. We make two fundamental claims in this work. First, an ASTNN performs better than prior neural network architectures. Second, the benchmark data set commonly used for machine learning vulnerability classification is flawed for this use. To support these claims, we describe our ASTNN architecture and evaluate it with more than 44,000 test cases across 29 CWEs in the NIST Juliet Test Suite data set. Results show a minimum accuracy of 88% across all CWEs.

Full Text