Detecting Web-Based Attacks with SHAP and Tree Ensemble Machine Learning Methods

Samuel Ndichu, Sangwook Kim, Tao Ban, Daisuke Inoue, Seiichi Ozawa, Takeshi Takahashi

doi:10.3390/app12010060

Open Access

https://doi.org/10.3390/app12010060

Abstract

Attacks using Uniform Resource Locators (URLs) and their JavaScript (JS) code content to perpetrate malicious activities on the Internet are rampant and continuously evolving. Methods such as blocklisting, client honeypots, domain reputation inspection, and heuristic and signature-based systems are used to detect these malicious activities. Recently, machine learning approaches have been proposed; however, challenges still exist. First, blocklist systems are easily evaded by new URLs and JS code content, obfuscation, fast-flux, cloaking, and URL shortening. Second, heuristic and signature-based systems do not generalize well to zero-day attacks. Third, the Domain Name System allows cybercriminals to easily migrate their malicious servers to hide their Internet protocol addresses behind domain names. Finally, crafting fully representative features is challenging, even for domain experts. This study proposes a feature selection and classification approach for malicious JS code content using Shapley additive explanations and tree ensemble methods. The JS code features are obtained from the Abstract Syntax Tree form of the JS code, sample JS attack codes, and association rule mining. The malicious and benign JS code datasets obtained from Hynek Petrak and the Majestic Million Service were used for performance evaluation. We compared the performance of the proposed method to those of other feature selection methods in the task of malicious JS code content detection. With a recall of 0.9989, the proposed approach outperformed the other feature selection methods in our experiments.
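To make the idea of ranking features by Shapley values concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function and keeps the two highest-ranked features. It is an illustration only, not the authors' pipeline or the SHAP library's tree explainer: the function `toy_score`, the three feature names, the input values, and the all-zero baseline are assumptions made up for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    The cost is exponential in the number of features, so this only
    suits toy-sized inputs.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical features: counts of three AST node types in one script.
FEATURES = ["CallExpression", "Literal", "WithStatement"]


def toy_score(v):
    """Made-up maliciousness score; stands in for a trained tree ensemble."""
    call, literal, with_stmt = v
    return 0.5 * call + 0.0 * literal + 2.0 * with_stmt + 1.0 * call * with_stmt


x = [2.0, 5.0, 1.0]          # assumed feature values for one script
baseline = [0.0, 0.0, 0.0]   # assumed reference input

phi = shapley_values(toy_score, x, baseline)

# Rank features by absolute contribution and keep the top two.
ranking = sorted(zip(FEATURES, phi), key=lambda pair: abs(pair[1]), reverse=True)
selected = [name for name, _ in ranking[:2]]
```

The contributions sum to the difference between the score at `x` and the score at the baseline, which is the additivity property that gives Shapley additive explanations their name. In practice one would average absolute contributions over many samples before selecting features; this sketch uses a single sample for brevity.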

Highlights

Websites are very popular; cybercriminals find these platforms to be perfect tools for launching their attacks
This study proposes AST-JS feature selection using Shapley additive explanations (SHAP) values and tree ensemble methods to detect these attacks
We investigated how often AST-JS nodes appear together in benign and malicious JS codes using association rule mining
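As a rough illustration of how co-occurrence of AST node types can be measured with association rules, the sketch below computes support, confidence, and lift for pairs of node types across a handful of scripts. The scripts, the node sets, the `pair_rules` helper, and the 0.5 support threshold are all hypothetical; the paper's actual mining procedure and parameters are not shown in this excerpt.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return rules lhs -> rhs over item pairs with support, confidence, and lift."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append({
                "lhs": lhs,
                "rhs": rhs,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            })
    return rules


# Hypothetical data: each set lists the AST node types seen in one script.
scripts = [
    {"CallExpression", "Literal", "WithStatement"},
    {"CallExpression", "WithStatement"},
    {"CallExpression", "Literal"},
    {"Literal", "VariableDeclaration"},
]

rules = pair_rules(scripts, min_support=0.5)
```

Running the same computation separately on benign and malicious samples would show which node pairs co-occur more often in one class than the other. A lift above 1 indicates that two node types appear together more often than independence would predict.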

Summary

Introduction

Websites are very popular; cybercriminals find these platforms to be perfect tools for launching their attacks. Attackers compromise Uniform Resource Locators (URLs) and their JavaScript (JS) content to perform malicious activities on the Internet. Such activities include phishing, URL redirection, spamming, social engineering, botnets, and drive-by-download exploits [1,2,3]. The attacks are delivered through emails, malware advertisements, texts, pop-ups, malicious scripts, and search results. Securing websites is vital for maintaining confidentiality, integrity, and availability, and an adaptive strategy is required to detect such attacks effectively.


Journal: Applied Sciences
Publication Date: Dec 22, 2021
Citations: 4
License type: CC BY 4.0

Similar Papers

Deobfuscation, unpacking, and decoding of obfuscated malicious JavaScript for machine learning models detection performance improvement
Samuel Ndichu ... Sangwook Kim
CAAI Transactions on Intelligence Technology | VOL. 5
17 Jul 2020

Detection of Malicious JavaScript Code in Web Pages
Dharmaraj R Patil ... J B Patil
Indian Journal of Science and Technology | VOL. 10
19 May 2017

A Machine Learning Approach to Malicious JavaScript Detection using Fixed Length Vector Representation
Samuel Ndichu ... Seiichi Ozawa
01 Jul 2018

Detection and analysis of drive-by-download attacks and malicious JavaScript code
Marco Cova ... Giovanni Vigna
26 Apr 2010
