Deep Forest and Pruned Syntax Tree-Based Classification Method for Java Code Vulnerability

Jiaman Ding, Weikang Fu, Lianyin Jia

doi:10.3390/math11020461


Open Access

https://doi.org/10.3390/math11020461


Journal: Mathematics
Publication Date: Jan 15, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: Kunming University of Science and Technology

Abstract

The rapid development of J2EE (Java 2 Platform, Enterprise Edition) has brought unprecedented challenges to vulnerability mining. Current abstract syntax tree (AST)-based methods for source code vulnerability classification do not eliminate irrelevant nodes when processing the AST, which leads to long training times and overfitting. In addition, when an AST is processed by depth-first traversal, different code structures can be translated into the same sequence of tree nodes, so semantic structure information is lost and model accuracy decreases. To address these two problems, we propose PSTDF, a deep forest and pruned syntax tree-based classification method for Java code vulnerability. First, a breadth-first traversal of the AST yields a sequence of statement trees. Next, the statement trees are pruned to remove irrelevant nodes. Then, a depth-first-based encoder converts the pruned trees into vectors. Finally, a deep forest classifier produces the classification results. Experiments on publicly accessible vulnerability datasets show that PSTDF reduces the loss of semantic structure information and effectively removes the impact of redundant information.
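
The depth-first ambiguity and the first two pipeline steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the label sets `STATEMENT_LABELS` and `IRRELEVANT_LABELS`, and the splitting rule (every statement node becomes the root of a statement tree) are all hypothetical simplifications chosen for illustration, and the encoder and deep forest stages are omitted.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node (hypothetical; not the paper's data structure)."""
    label: str
    children: list[Node] = field(default_factory=list)


# Assumed label sets, for illustration only.
STATEMENT_LABELS = {"If", "Call"}
IRRELEVANT_LABELS = {"Comment"}


def preorder(node: Node) -> list[str]:
    """Depth-first (preorder) sequence of node labels."""
    labels = [node.label]
    for child in node.children:
        labels.extend(preorder(child))
    return labels


def statement_roots_bfs(root: Node) -> list[Node]:
    """Collect statement nodes in breadth-first order."""
    roots, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node.label in STATEMENT_LABELS:
            roots.append(node)
        queue.extend(node.children)
    return roots


def prune(node: Node) -> Node:
    """Return a copy of the tree without subtrees rooted at irrelevant labels."""
    kept = [prune(c) for c in node.children if c.label not in IRRELEVANT_LABELS]
    return Node(node.label, kept)


# Two structurally different trees: a call nested inside an `if`,
# and a call that follows an `if`.
nested = Node("Block", [Node("If", [Node("Call")])])
flat = Node("Block", [Node("If"), Node("Call")])

# A plain depth-first traversal cannot tell them apart.
same_dfs = preorder(nested) == preorder(flat)

# Splitting into statement trees first keeps the structural difference.
nested_stmts = [preorder(s) for s in statement_roots_bfs(nested)]
flat_stmts = [preorder(s) for s in statement_roots_bfs(flat)]

# Pruning drops nodes assumed to be irrelevant before encoding.
pruned = preorder(prune(Node("If", [Node("Comment"), Node("Call")])))
```

In this sketch both trees flatten to the same label sequence under a single depth-first pass, while their statement-tree sequences differ, which is the kind of structural information the abstract says breadth-first splitting is meant to preserve.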

Full Text