An Improved Method of Detecting Macro Malware on an Imbalanced Dataset

Mamoru Mimura

doi:10.1109/access.2020.3037330

Abstract

In spear-phishing attacks, macro malware written in VBA (Visual Basic for Applications) is often used to compromise target computers. Macro malware is frequently obfuscated in several ways to evade detection. To detect new macro malware, several methods based on machine learning techniques have been proposed. However, many of these methods were evaluated on an inadequate dataset or on a balanced dataset with the same number of benign and malicious samples, so their practical performance is still open to discussion. In reality, the population of VBA macros consists of a wide variety of samples. To evaluate practical performance, an imbalanced dataset that contains many benign samples is required. In this paper, we propose an improved method of detecting macro malware on an imbalanced dataset. Our method uses two language models, Doc2vec and Latent Semantic Indexing (LSI), and four popular classifiers. The language models are used to extract features and to mitigate the class imbalance problem by selecting important features. We create an imbalanced dataset with more than 30,000 samples and evaluate the practical performance. The experimental results demonstrate that our method mitigates the class imbalance problem and can detect completely new malware regardless of the family type. The results also reveal that LSI is more robust than Doc2vec to the class imbalance problem.

Highlights

Spear-phishing attacks are one of the main threats to organizations of all sizes and across every field
While many studies focus on Portable Document Format (PDF) document files [2]–[8] or their JavaScript [9]–[11], this study focuses on Microsoft (MS) document files
This paper proposes an improved method of detecting macro malware on an imbalanced dataset

Summary

Introduction

Spear-phishing attacks are one of the main threats to organizations of all sizes and across every field. To detect new macro malware, several methods based on machine learning models have been proposed [14]–[19]. These methods were evaluated on a balanced dataset with the same number of benign and malicious samples. To evaluate practical performance, an imbalanced dataset that contains many benign samples is required [24]. The experimental results demonstrate that our method mitigates the class imbalance problem and can detect new malware families.
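To illustrate why evaluation on an imbalanced dataset matters, the following is a minimal sketch rather than the paper's evaluation code. The label counts, the `evaluate` helper, and both sets of predictions are hypothetical values chosen for illustration; only the general point (benign samples far outnumber malicious ones) comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical imbalanced test set: 990 benign (0) and 10 malicious (1).
y_true = [0] * 990 + [1] * 10

# A trivial "detector" that labels every sample as benign.
always_benign = [0] * 1000

# A hypothetical detector: 5 false alarms, 8 of 10 malicious samples caught.
detector = [1] * 5 + [0] * 985 + [1] * 8 + [0] * 2

baseline = evaluate(y_true, always_benign)
detected = evaluate(y_true, detector)

print(f"always benign: accuracy={baseline.accuracy:.3f} "
      f"recall={baseline.recall:.3f} f1={baseline.f1:.3f}")
print(f"detector:      accuracy={detected.accuracy:.3f} "
      f"recall={detected.recall:.3f} f1={detected.f1:.3f}")
```

In this sketch the trivial baseline reaches high accuracy while detecting no malware at all, which is why recall and F1 on the malicious class are more informative than accuracy when benign samples dominate.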

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 43
License type: CC BY 4.0

Similar Papers

Towards Efficient Detection of Malicious VBA Macros with LSI
Mamoru Mimura ... Taro Ohminami
01 Jan 2019

Impact of benign sample size on binary classification accuracy
Mamoru Mimura
Expert Systems With Applications | VOL. 211
27 Aug 2022

Imbalance Learning and Its Application on Medical Datasets
Yachao Shao
21 Feb 2022

Noise-adaptive synthetic oversampling technique
Minh Thanh Vo ... Tuong Le
Applied Intelligence | VOL. 51
18 Mar 2021
