Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Mathew Ashik, P Vinod, A Jyothish, Fabio Martinelli, Francesco Mercaldo, S Anandaram, Antonella Santone

doi:10.3390/electronics10141694

Abstract

Malware is one of the most significant threats in today’s computing world, since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly necessary for computer systems connected to the Internet. Such software exploits a system’s vulnerabilities to steal valuable information without the user’s knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures to detect known malware. However, the signature-based method does not scale to detecting obfuscated and packed malware. Because the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, in this paper we investigate the relevance of these features in unpacked malicious and benign executables to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning classifiers such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, such as a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
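The abstract names ANOVA as one of the two feature selection methods. As a rough illustration of what an ANOVA-based ranking does, the sketch below computes the one-way ANOVA F-statistic for each feature and sorts features by it. This is a minimal sketch and not the authors' implementation: the function names (`anova_f_score`, `rank_features`) are hypothetical, the opcode frequencies are made-up toy values rather than data from the paper, and mRMR is not covered.

```python
from statistics import mean


def anova_f_score(groups):
    """One-way ANOVA F-statistic for one feature, given its values per class."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of classes
    n = len(all_values)   # total number of samples
    # Between-class variability: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-class variability: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_features(samples, labels, names):
    """Return (feature name, F-score) pairs sorted by decreasing F-score."""
    classes = sorted(set(labels))
    scores = {}
    for j, name in enumerate(names):
        groups = [
            [row[j] for row, y in zip(samples, labels) if y == c]
            for c in classes
        ]
        scores[name] = anova_f_score(groups)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up opcode frequencies (assumption: not taken from the paper).
FEATURES = ["mov", "xor", "call"]
X = [
    [0.40, 0.02, 0.10],  # benign
    [0.42, 0.03, 0.12],  # benign
    [0.41, 0.01, 0.09],  # benign
    [0.39, 0.20, 0.11],  # malware
    [0.43, 0.22, 0.10],  # malware
    [0.40, 0.25, 0.12],  # malware
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_features(X, y, FEATURES)
for name, score in ranking:
    print(f"{name}: F = {score:.2f}")
```

In this toy data only the `xor` frequency differs clearly between the two classes, so it is ranked first. A selection step would then keep the top-ranked features before training a classifier.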

Highlights

Malware or malicious code is harmful code injected into legitimate programs to perpetrate illicit intentions
Dataset-1 (VX-Dataset): A total of 2000 Portable Executables were collected, consisting of 1000 malware samples gathered from VxHeaven (650) [35], User Agency (250), and Offensive Computing (100), and 1000 benign samples collected from the Windows XP System32 folder (450), the Windows 7 System32 folder (100), the MikTex/Matlab Library (400), and Games (50)
We address the detection of malicious files using diverse datasets comprising real and synthetic malware samples

Summary

Introduction

Malware, or malicious code, is harmful code injected into legitimate programs to perpetrate illicit intentions. With the rapid growth of the Internet and of heterogeneous devices connected over the network, the attack landscape has expanded and has become a concern, affecting the privacy of users [1]. Freely downloadable software is a primary source of infection, allowing malicious programs to enter systems without users’ knowledge; it includes freeware such as games, web browsers, and free antivirus tools. Because financial transactions are performed over the Internet, these infections have caused huge financial losses for organizations and individuals. Malware writing has become a profit-making industry, attracting a large number of hackers. Current malware is broadly classified as polymorphic or metamorphic, and it remains undetected by signature-based detectors [2].
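To make the last point concrete, the sketch below contrasts an exact hash signature with a structural feature (a mnemonic histogram) on a code sequence and a variant with junk instructions inserted. It is an illustrative sketch under stated assumptions: the mnemonic sequences are invented, junk `nop` insertion stands in for real metamorphic rewriting, a SHA-256 hash stands in for a byte signature, and the helper names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def byte_signature(code: bytes) -> str:
    """Exact-match signature: a SHA-256 hash of the raw bytes."""
    return hashlib.sha256(code).hexdigest()


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two mnemonic-count histograms."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical mnemonic sequences; the variant only adds junk "nop" instructions.
original = ["push", "mov", "xor", "call", "ret"]
variant = ["push", "nop", "mov", "xor", "nop", "call", "ret"]

same_signature = (
    byte_signature(" ".join(original).encode())
    == byte_signature(" ".join(variant).encode())
)
similarity = cosine_similarity(Counter(original), Counter(variant))

print("signatures match:", same_signature)
print(f"mnemonic histogram similarity: {similarity:.2f}")
```

The hash changes completely after the small edit, while the mnemonic histogram stays close to the original. This is the motivation for classifying on structural features rather than on exact signatures.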



Journal: Electronics
Publication Date: Jul 15, 2021
Citations: 10
License type: CC BY 4.0
