Detecting malicious JavaScript code based on semantic analysis

Yong Fang,Cheng Huang,Yu Su,Yaoyao Qiu

doi:10.1016/j.cose.2020.101764

Abstract

Web development technology has undergone tremendous evolution, the creation of JavaScript has greatly enriched the interactive capabilities of the client. However, attackers use the dynamics feature of JavaScript language to embed malicious code into web pages for the purpose of drive-by-download, redirection, etc. The traditional method based on static feature detection is difficult to detect the malicious code after obfuscation, and the method based on dynamic analysis has low efficiency. To overcome these challenges, this paper proposes a static detection model based on semantic analysis. The model firstly generates an abstract syntax tree from JavaScript source codes, then automatically converts them to syntactic unit sequences. FastText algorithm is introduced to training word vectors. The syntactic unit sequences are represented as word vectors which will be input into Bi-LSTM network with attention mechanism. The detection model with Bi-LSTM network with attention mechanism is the key to detect malicious JavaScript. We experimented with the dataset using a five-fold cross-validation method. Experiments showed that the model can effectively detect obfuscated malicious JavaScript code and improve the detection speed, with a precision of 0.977 and recall of 0.974.

Full Text