DDIML: Explainable detection model for drive-by-download attacks

Xiaole Liu, Cheng Huang, Yong Fang

doi:10.3233/jifs-212496

Abstract

A drive-by download is an attack in which hackers plant a Web Trojan that exploits browser vulnerabilities to execute malicious software. Because people access web pages with various browsers every day, drive-by downloads have become one of the most common threats in recent years. Most previous studies apply deep learning methods to the abstract syntax tree (AST) to detect such attacks; these approaches achieve high accuracy but are time-consuming and difficult to explain. Other methods rely on dynamic analysis, which requires a specific environment and is time-consuming and complex to operate. To solve these problems, this paper proposes DDIML, an explainable machine learning model based on novel features obtained through static analysis. These features are extracted from five aspects: code obfuscation, URL redirection, special behaviors, encoding characters, and CSS attributes. Random forest, a widely used machine learning algorithm, is applied to build the detection classifier. In addition, we use both local and global explanations to improve the model and to show that the proposed model can be trusted. Experimental results show that the proposed model detects drive-by downloads efficiently, with a precision of 0.983 and a recall of 0.980. The average total detection time per sample is only 16.07 ms.
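To make the idea of static feature extraction concrete, the following is a minimal sketch that counts simple textual indicators in raw page source, one group per aspect named in the abstract. The regular expressions, the feature names, and the `extract_features` helper are all hypothetical illustrations chosen for this example; they are not the feature set or the implementation used by DDIML.

```python
import re

# Hypothetical indicators, one pattern per aspect named in the abstract.
# These are illustrative stand-ins, NOT the features defined by DDIML.
PATTERNS = {
    # Code obfuscation: dynamic evaluation and string-decoding helpers.
    "obfuscation_calls": re.compile(
        r"\b(?:eval|unescape)\s*\(|String\.fromCharCode", re.I),
    # URL redirection: script-driven or meta-refresh redirects.
    "redirects": re.compile(
        r"window\.location|location\.href|http-equiv\s*=\s*\W?refresh", re.I),
    # Special behaviors: embedded frames, ActiveX use, dynamic writes.
    "special_behaviors": re.compile(
        r"<iframe\b|ActiveXObject|document\.write\s*\(", re.I),
    # Encoding characters: %uXXXX, \xXX and \uXXXX escape sequences.
    "encoded_chars": re.compile(
        r"%u[0-9a-f]{4}|\\x[0-9a-f]{2}|\\u[0-9a-f]{4}", re.I),
    # CSS attributes: hidden or zero-sized elements.
    "hidden_css": re.compile(
        r"display\s*:\s*none|visibility\s*:\s*hidden"
        r"|(?:width|height)\s*[:=]\s*\W?0\b", re.I),
}


def extract_features(page):
    """Count matches of each hypothetical pattern in raw page source."""
    return {name: len(pattern.findall(page))
            for name, pattern in PATTERNS.items()}


# A small made-up sample page containing several of the indicators.
sample = (
    '<iframe src="http://example.invalid/x" width="0" height="0" '
    'style="display:none"></iframe>'
    '<script>eval(unescape("%u9090%u9090"));</script>'
)
features = extract_features(sample)
```

Because the counts come from pattern matching on the source text alone, no browser or sandbox is needed, which is the practical appeal of static analysis. A feature vector of this kind could then be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`, assumed here as one possible choice) for training and prediction.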

Full Text