Categorizing Malware via A Word2Vec-based Temporal Convolutional Network Scheme

Jiankun Sun, Xiong Luo, Honghao Gao, Yang Gao, Weiping Wang, Xi Yang

doi:10.1186/s13677-020-00200-y

Abstract

As the edge computing paradigm has gained great popularity in recent years, some technical challenges must still be addressed to guarantee smart device security in the Internet of Things (IoT) environment. Smart devices now transmit users' data across the IoT for various purposes, and malware that steals or damages these data causes losses and poses a serious threat to users. To improve malware detection performance on IoT smart devices, this article conducts a malware categorization analysis on the dataset of the Kaggle Microsoft Malware Classification Challenge (BIG 2015). Motivated by the temporal convolutional network (TCN) structure, we propose a malware categorization scheme built mainly on a Word2Vec pre-trained model. The popular one-hot encoding converts input names from malicious files into high-dimensional vectors, since each name occupies one dimension of the one-hot vector space. The Word2Vec pre-training strategy instead yields more compact vectors with fewer dimensions, which leads to fewer parameters and stronger malware feature representation. Moreover, compared with long short-term memory (LSTM), TCN demonstrates better performance in sequence modeling tasks, with longer effective memory and faster training. Experimental comparisons on this malware dataset show better categorization performance with less memory usage and training time. In particular, compared with the state-of-the-art Word2Vec-based LSTM approach, our scheme achieves approximately 1.3% higher prediction accuracy on this malware categorization task, while using about 90 thousand fewer parameters and more than 1 hour less model training time.
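The "longer effective memory" attributed to TCN comes from stacking causal convolutions with growing dilation. The following is a minimal sketch of that idea in plain Python, not the authors' network: the function names, the kernel values, and the dilation schedule are illustrative assumptions, and the sketch omits the nonlinearities, residual connections, and normalization a full TCN would use.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Compute y[t] = sum_i kernel[i] * x[t - i * dilation].

    Positions before the start of the sequence are treated as zero,
    so y[t] never depends on inputs later than t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out


def tcn_stack(x, kernel, dilations):
    """Apply one causal convolution per level, reusing the same kernel.

    Assumption for illustration only: a real TCN learns a separate
    kernel per level and wraps each level in a residual block.
    """
    for d in dilations:
        x = causal_dilated_conv1d(x, kernel, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of past time steps visible to one output position,
    assuming a single convolution per level."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    print(tcn_stack(impulse, [1.0, 1.0], [1, 2, 4]))
    print(receptive_field(2, [1, 2, 4]))
```

With a kernel of size 2 and dilations 1, 2, 4, three levels already cover 8 time steps. Doubling the dilation at each level makes the receptive field grow exponentially with depth while the parameter count grows only linearly.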

Highlights

Recent developments in the field of edge computing have led to extensive attention on smart device security in the Internet of Things (IoT) environment [1].
Current malware identification for edge devices mainly relies on the malware signature databases from software distributors, yet this approach cannot keep up with the growing volume of malware in the edge computing paradigm.
The results show that the weighted F-measure and the accuracy of our scheme are approximately 1.2% and 1.3% higher than those of the Word2Vec-based long short-term memory (LSTM), and the weighted false positive rate (FPR) of our scheme is approximately 0.3% lower.
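For readers unfamiliar with the reported metrics, the sketch below computes accuracy, weighted F-measure, and weighted FPR from a multi-class confusion matrix. It assumes "weighted" means averaging the per-class values by class support (the number of true samples in each class), which is a common convention but is not confirmed by this page; the function name and the example matrix are hypothetical and unrelated to the paper's results.

```python
def weighted_metrics(cm):
    """Return (accuracy, weighted F-measure, weighted FPR).

    cm[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Per-class values are computed
    one-vs-rest and averaged with weights proportional to support.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    weighted_f = 0.0
    weighted_fpr = 0.0
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        tn = total - tp - fn - fp

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        fpr = fp / (fp + tn) if fp + tn else 0.0

        weight = (tp + fn) / total                 # class support share
        weighted_f += weight * f
        weighted_fpr += weight * fpr

    return accuracy, weighted_f, weighted_fpr


if __name__ == "__main__":
    # Toy two-class matrix, purely illustrative.
    print(weighted_metrics([[8, 2], [1, 9]]))
```

Support weighting keeps rare malware families from dominating the average, at the cost of hiding poor performance on those same families, so per-class values are worth inspecting alongside the weighted ones.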

Summary

Introduction

Recent developments in the field of edge computing have led to extensive attention on smart device security in the Internet of Things (IoT) environment [1]. Malware detection and analysis have been discussed extensively, yet traditional approaches are not fully usable on edge devices in the IoT environment. Because smart devices put more emphasis on real-time interaction, malware identification on them requires a faster response than on traditional platforms. In addition to detection performance, memory footprint and response speed are of enormous importance for current IoT smart devices, which raises the requirements for edge malware analysis.

Note that in Word2Vec, word vectors are generally taken from the weights of the trained language model rather than being its direct training targets. Word2Vec includes two kinds of architectures for learning distributed representations, i.e., continuous bag-of-words (CBOW) and skip-gram (SG) [12,13,14]. A simple skip-gram model architecture is shown in Fig. 1 [10].
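The difference between the two Word2Vec architectures can be seen in the training examples each one draws from a token sequence. The sketch below is a minimal illustration under stated assumptions: the function names, the window size, and the opcode-like tokens are hypothetical and are not taken from the paper or from any Word2Vec library. Skip-gram predicts each context token from the center token, whereas CBOW predicts the center token from its surrounding context.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs: one pair per context token
    within `window` positions of each center token."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """Return (context_list, center) examples: the whole context
    window is the input and the center token is the target."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


if __name__ == "__main__":
    # Hypothetical token sequence standing in for names extracted
    # from a malicious file.
    sequence = ["push", "mov", "call", "ret"]
    print(skipgram_pairs(sequence, window=1))
    print(cbow_examples(sequence, window=1))
```

In either case the model is trained on these prediction tasks, and the learned input-layer weights are then read off as the dense vectors, which matches the remark above that the vectors are a by-product of training and not its target.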

Journal: Journal of Cloud Computing: Advances, Systems and Applications
Publication Date: Sep 23, 2020
Citations: 16
License type: open-access
