A machine learning approach to detection of JavaScript-based attacks using AST features and paragraph vectors

Samuel Ndichu, Sangwook Kim, Seiichi Ozawa, Takeshi Misu, Kazuo Makishima

doi:10.1016/j.asoc.2019.105721

Abstract

Websites attract millions of visitors because of the convenient services they offer, which makes them attractive targets for cyber attackers. Most of these websites use JavaScript (JS) to create dynamic content. Exploiting vulnerabilities in servers, plugins, and other third-party systems enables attackers to insert malicious code into websites. These exploits use methods such as drive-by downloads, pop-up ads, and phishing attacks on news, porn, piracy, torrent, or free software websites, among others. Many recent cyber-attacks exploit JS vulnerabilities, in some cases employing obfuscation to hide their maliciousness and evade detection. It is therefore essential to develop an accurate detection system for malicious JS to protect users from such attacks. To address this issue, this study adopts the Abstract Syntax Tree (AST) for code structure representation and Doc2Vec, a machine learning approach, for feature learning. Doc2Vec is a neural network model that can learn context information from texts of variable length. This makes it a well-suited feature learning method for JS code, whose text content ranges from single-line sentences to paragraphs and full-length documents. In addition, the features learned with Doc2Vec are low-dimensional, which enables faster detection. A classifier model judges the maliciousness of JS code using the learned features. The performance of this approach is evaluated using the D3M dataset (Drive-by-Download Data by Marionette) for malicious JS code and the JSUNPACK and Alexa top 100 websites datasets for benign JS code. We then compare the performance of Doc2Vec on plain JS code (Plain-JS) and the AST form of JS code (AST-JS) with that of other feature learning methods. Our experimental results show that the proposed AST features combined with Doc2Vec feature learning provide better accuracy and faster classification in malicious JS detection than conventional approaches, and can flag malicious JS code previously identified as hard to detect.
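
As a rough illustration of the AST-JS idea, the sketch below flattens an AST into a sequence of node-type tokens, the kind of variable-length "document" a paragraph vector model could consume. It rests on several assumptions not stated in the abstract: the AST is taken to be in the ESTree-style JSON layout produced by common JS parsers, the helper name ast_tokens is hypothetical, the example tree is hand-written for the snippet eval(x); and the pre-order, type-only tokenization is one plausible choice rather than the authors' exact procedure.

```python
from typing import Any, Iterator


def ast_tokens(node: Any) -> Iterator[str]:
    """Yield the 'type' of every node in an ESTree-style AST, in pre-order.

    Hypothetical helper: the tokenization used in the paper may differ.
    """
    if isinstance(node, dict):
        node_type = node.get("type")
        if isinstance(node_type, str):
            yield node_type
        # Recurse into child fields; plain strings and numbers are skipped.
        for key, value in node.items():
            if key != "type":
                yield from ast_tokens(value)
    elif isinstance(node, list):
        for child in node:
            yield from ast_tokens(child)


# Hand-written ESTree-style AST for the JS snippet: eval(x);
example_ast = {
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "eval"},
                "arguments": [{"type": "Identifier", "name": "x"}],
            },
        }
    ],
}

tokens = list(ast_tokens(example_ast))
print(tokens)
# ['Program', 'ExpressionStatement', 'CallExpression', 'Identifier', 'Identifier']
```

In a full pipeline, each script's token list would be embedded as a fixed-length vector by a paragraph vector model, and those vectors would be passed to a classifier. Both steps are omitted here because they depend on third-party libraries and on settings the abstract does not give.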

Journal: Applied Soft Computing
Publication Date: Aug 22, 2019
Citations: 58
License type: cc-by-nc-nd

Similar Papers

A Machine Learning Approach to Malicious JavaScript Detection using Fixed Length Vector Representation
Samuel Ndichu ... Seiichi Ozawa
01 Jul 2018

Deobfuscation, unpacking, and decoding of obfuscated malicious JavaScript for machine learning models detection performance improvement
Samuel Ndichu ... Sangwook Kim
CAAI Transactions on Intelligence Technology | VOL. 5
17 Jul 2020

Detecting Web-Based Attacks with SHAP and Tree Ensemble Machine Learning Methods
Samuel Ndichu ... Takeshi Takahashi
Applied Sciences | VOL. 12
22 Dec 2021

Detection of Malicious JavaScript Code in Web Pages
Dharmaraj R Patil ... J B Patil
Indian Journal of Science and Technology | VOL. 10
19 May 2017
