GlareShell: Graph learning-based PHP webshell detection for web server of industrial internet

Pengbin Feng,Dawei Wei,Qiaoyang Li,Qin Wang,Youbing Hu,Ning Xi,Jianfeng Ma

doi:10.1016/j.comnet.2024.110406

Abstract

With the explosive growth of the Industrial Internet scale, cyberattacks targeting industrial control systems also increased. The management and operation of Industrial Internet are usually performed via web servers which retain a large attack surface. In the Industrial Internet, attackers usually exploit vulnerabilities to inject malicious codes for remotely executing commands, stealing confidential data, and invading web servers. Existing approaches capture statistical and contextual dependence information from Webshell using machine learning (ML) or deep learning (DL) algorithms. However, the semantic feature mining of program code within Webshell is not sufficient when entering new types of Webshell. In this paper, we propose a graph learning-based PHP Webshell detection framework, GlareShell, using the word embedding technique, a risk weight allocation mechanism, and the graph neural network (GNN). First, GlareShell leverages static analysis to extract interprocedural control flow graphs (ICFGs) from PHP script files and then prunes these ICFGs to remove noisy statements. Then, word embedding techniques are employed to generate semantic representations from PHP statements. Next, we design a risk weight allocation mechanism to derive the risk levels of statements and concatenate them with word embeddings as attributions. The identified risk levels could guide the passing of potential attack patterns inside GNN models. Finally, GlareShell builds a GNN classifier directly from the ICFG with corresponding node attributions to identify the malicious PHP scripts. Experiment results on collected datasets prove the promise of our graph learning framework in the Webshell detection domain.

Full Text