Abstract
JavaScript has been widely used on the Internet because of its powerful features, and almost all websites use it to provide dynamic functionality. However, this dynamic nature also carries potential risks. Authors of malicious scripts have started using JavaScript to launch various attacks, such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and drive-by download attacks. Traditional malicious script detection relies on expert knowledge, but even for experts this is an error-prone task. To address this problem, many learning-based methods for malicious JavaScript detection are being explored. In this paper, we propose a novel deep learning-based method for malicious JavaScript detection. To extract semantic information from JavaScript programs, we construct the Program Dependency Graph (PDG) and generate semantic slices, which preserve rich semantic information and are easy to transform into vectors. Then, we propose a malicious JavaScript detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network. Experimental results show that, compared with five other methods, our model achieved the best performance, with an accuracy of 97.71% and an F1-score of 98.29%.
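The abstract reports its results as accuracy and F1-score. As a reminder of how the two metrics differ, the sketch below computes both from binary confusion-matrix counts. It assumes "malicious" is the positive class (the abstract does not say which class is positive), and the counts in the usage example are hypothetical, not the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, assuming 'malicious' is the positive class."""
    tp: int  # malicious scripts flagged as malicious
    fp: int  # benign scripts flagged as malicious
    tn: int  # benign scripts flagged as benign
    fn: int  # malicious scripts that were missed


def accuracy(c: Confusion) -> float:
    """Fraction of all scripts that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall on the positive class."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    c = Confusion(tp=90, fp=10, tn=80, fn=20)  # hypothetical counts
    print(f"accuracy={accuracy(c):.4f} f1={f1_score(c):.4f}")  # prints accuracy=0.8500 f1=0.8571
```

Because F1 ignores true negatives, it can sit above accuracy when the positive class dominates the test set, which is consistent with (though not proof of) the abstract's F1 being higher than its accuracy.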
Highlights
JavaScript is a lightweight scripting language that is often included in web pages to provide additional dynamic functionality [1]
We propose a novel abstract code representation for malicious JavaScript detection, which preserves rich semantic information and is easy to transform into vectors
Whether or not a slice of code is malicious depends on its context, so we consider that neural networks applied to natural language processing are potentially suitable for malicious JavaScript detection, as context is crucial in this domain
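Since the highlights argue that context on both sides of a statement matters, the following sketch shows what "bidirectional" means in a BLSTM: one LSTM reads the token vectors left to right, a second reads them right to left, and their hidden states are concatenated at each position. It is a pure-Python forward pass with random, untrained weights; the layer sizes, the mean pooling, the sigmoid output unit, and the helper names (`LSTMCell`, `bidirectional`, `classify`) are illustrative assumptions, not the architecture reported in the paper. A real model would be built and trained with a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


class LSTMCell:
    """One LSTM direction with small random, untrained weights (illustrative only)."""

    GATES = "ifoc"  # input, forget, output, candidate

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        self.w = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in self.GATES}
        self.b = {k: [0.0] * hidden_size for k in self.GATES}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # concatenate input vector and previous hidden state
        pre = {k: [s + b for s, b in zip(matvec(self.w[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fj * cp + ij * gj for fj, cp, ij, gj in zip(f, c_prev, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def run(self, xs):
        """Return the hidden state after each element of xs, in reading order."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bidirectional(fwd, bwd, xs):
    """Concatenate forward and backward hidden states at each position."""
    forward = fwd.run(xs)
    backward = list(reversed(bwd.run(list(reversed(xs)))))
    return [hf + hb for hf, hb in zip(forward, backward)]


def classify(xs, fwd, bwd, w_out, b_out):
    """Mean-pool the BLSTM states and squash to a 'malicious' probability."""
    states = bidirectional(fwd, bwd, xs)
    pooled = [sum(col) / len(states) for col in zip(*states)]
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)) + b_out)


if __name__ == "__main__":
    rng = random.Random(0)
    emb, hid = 4, 3  # hypothetical embedding and hidden sizes
    fwd, bwd = LSTMCell(emb, hid, rng), LSTMCell(emb, hid, rng)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hid)]
    tokens = [[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(5)]
    print(classify(tokens, fwd, bwd, w_out, 0.0))
```

When the backward LSTM reaches position 0 it has already read the whole sequence, so the state at the first token reflects tokens that appear later in the slice, which a left-to-right LSTM alone cannot provide.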
Summary
JavaScript is a lightweight scripting language that is often included in web pages to provide additional dynamic functionality [1]. Even for experts, determining whether a JavaScript file is malicious is an error-prone and time-consuming task because of the complexity of the problem. To overcome these limitations, many new learning-based methods are being explored. The syntax of JavaScript programs is flexible, and the dependency relationship between statements is not determined by their distance, which means that traversing the abstract syntax tree or the token sequence cannot effectively capture the semantic information. Abstract representations that are more sensitive to program characteristics should therefore make malicious JavaScript detection more accurate. A malicious code detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network is proposed. We implemented the BLSTM neural network and compared it with four other machine learning-based detection models and a traditional antivirus product.
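The point that dependence is not determined by distance can be made concrete with program slicing. The sketch below uses a hand-built, hypothetical PDG for a short JavaScript snippet (nodes are statements, edges are data or control dependences), computes a backward slice from a chosen slicing criterion, and then maps the slice's tokens to integer ids that an embedding layer could turn into vectors. The snippet, the criterion, the tokenizer, and the helper names `backward_slice` and `to_ids` are assumptions for illustration; the paper's own PDG construction and slicing rules may differ.

```python
import re
from collections import deque

# Hypothetical JavaScript snippet, one PDG node per statement (simplified).
STATEMENTS = {
    1: 'var a = "ev" + "al";',
    2: 'var n = 10;',
    3: 'var s = atob("YWxlcnQoMSk=");',
    4: 'for (var i = 0; i < n; i++) { console.log(i); }',
    5: 'if (document.cookie)',
    6: '  window[a](s);',
}

# DEPS[n] lists the statements that statement n depends on (data or control edges).
DEPS = {
    1: [], 2: [], 3: [],
    4: [2],        # data dependence: the loop reads n
    5: [],
    6: [1, 3, 5],  # data: reads a and s; control: guarded by statement 5
}


def backward_slice(deps, criterion):
    """Return, in program order, every statement the criterion transitively depends on."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


# Identifiers, numbers, double-quoted strings, or any other single non-space character.
TOKEN_RE = re.compile(r'[A-Za-z_$][\w$]*|\d+|"[^"]*"|\S')


def to_ids(code, vocab):
    """Map tokens to integer ids, growing vocab as new tokens appear."""
    ids = []
    for tok in TOKEN_RE.findall(code):
        if tok.startswith('"'):
            tok = "<STR>"  # normalise every string literal to one symbol
        ids.append(vocab.setdefault(tok, len(vocab)))
    return ids


if __name__ == "__main__":
    order = backward_slice(DEPS, 6)
    slice_code = "\n".join(STATEMENTS[n] for n in order)
    print(order)  # [1, 3, 5, 6]
    print(to_ids(slice_code, {}))
```

Statement 4 sits closer to the criterion than statement 1 does, yet it is excluded from the slice because statement 6 does not depend on it, while the distant statements 1 and 3 are kept.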