Abstract

JavaScript is widely used on the Internet because of its powerful features, and almost all websites use it to provide dynamic functionality. However, this dynamic nature also carries risks. Authors of malicious scripts use JavaScript to launch various attacks, such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and drive-by downloads. Traditional malicious script detection relies on expert knowledge, but even for experts this is an error-prone task. To address this problem, many learning-based methods for malicious JavaScript detection are being explored. In this paper, we propose a novel deep learning-based method for malicious JavaScript detection. To extract semantic information from JavaScript programs, we construct the Program Dependency Graph (PDG) and generate semantic slices, which preserve rich semantic information and are easy to transform into vectors. We then propose a malicious JavaScript detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network. Experimental results show that, compared with five other methods, our model achieves the best performance, with an accuracy of 97.71% and an F1-score of 98.29%.
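For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and F1-score are conventionally computed from a binary confusion matrix, treating "malicious" as the positive class. The function name and the counts are hypothetical and chosen only to exercise the formulas; they are not taken from the paper's experiments.

```python
def accuracy_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, F1) for the positive (malicious) class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of scripts flagged malicious that really are malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of malicious scripts that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts (assumption, not the paper's data).
acc, f1 = accuracy_and_f1(tp=90, fp=10, tn=85, fn=15)
print(f"accuracy={acc:.4f}, f1={f1:.4f}")
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the dataset is imbalanced, which is one reason both figures are usually reported together.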

Highlights

  • JavaScript is a lightweight scripting language that is often included in web pages to provide additional dynamic functionality [1]

  • We propose a novel abstract code representation for malicious JavaScript detection, which preserves rich semantic information and is easy to transform into vectors

  • Whether a slice of code is malicious depends on its context, so neural networks applied to natural language processing are potentially suitable for malicious JavaScript detection

Summary

Introduction

JavaScript is a lightweight scripting language that is often included in web pages to provide additional dynamic functionality [1]. Even for experts, determining whether a JavaScript file is malicious is an error-prone and time-consuming task because of the complexity of the problem. To overcome these limitations, many new learning-based methods are being explored. The syntax of a program is more flexible, and the dependency relationship between statements is not determined by their distance, which means that traversing the abstract syntax tree or the token sequence cannot effectively capture the semantic information. Abstract representations that are more sensitive to program characteristics therefore make malicious JavaScript detection more accurate. A malicious code detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network is proposed. We implemented the BLSTM neural network and compared it with four other machine learning-based detection models and one traditional antivirus product.
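The point that dependencies are not determined by statement distance can be illustrated with a small backward-slicing sketch. This is not the paper's slicing procedure, which this summary does not specify. It is a generic transitive-dependency walk over a hand-written dependency map, and the example script, statement ids, and function name are all hypothetical.

```python
from collections import deque


def backward_slice(deps: dict[int, set[int]], criterion: int) -> list[int]:
    """Return the ids of all statements `criterion` transitively depends on.

    `deps` maps a statement id to the ids it depends on (data or control).
    The result includes `criterion` itself and is sorted in source order.
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in deps.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


# Hypothetical script, one statement per id (assumption for illustration):
#   1: var a = location.hash;
#   2: var b = "unrelated";
#   3: var c = unescape(a);
#   4: console.log(b);
#   5: eval(c);
deps = {3: {1}, 4: {2}, 5: {3}}

slice_ids = backward_slice(deps, criterion=5)
print(slice_ids)  # [1, 3, 5]
```

Statements 2 and 4 sit between the source of the data and the `eval` call in the token sequence, yet they are excluded from the slice, while statement 1 is kept even though it is the farthest from the criterion.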

Related Work
Dynamic Analysis of Malicious JavaScript
Static Analysis of Malicious JavaScript
Defining
Program Dependency Analysis
Program Slices Generation
Model Selection
Structure of BLSTM
The structure of a typical
Experiment and Result
Dataset Preprocessing
Measurement Metrics
Learning the BLSTM Neural Network
Detection Performance
Methods
Limitations
Findings
Conclusion
