Malware Classification Using Probability Scoring and Machine Learning

Di Xue, Jingmei Li, Tu Lv, Weifei Wu, Jiaxiang Wang

doi:10.1109/access.2019.2927552

Abstract

Malware classification plays an important role in tracing the sources of attacks in computer security. Existing static analysis methods classify quickly, but they perform poorly on malware that uses packing and obfuscation techniques; dynamic analysis methods generalize better to packed and obfuscated malware, but they incur excessive classification cost. To overcome these shortcomings, we propose Malscore, a classification system based on probability scoring and machine learning, which uses a probability threshold to chain static analysis (called Phase 1) and dynamic analysis (called Phase 2). In Phase 1, convolutional neural networks (CNNs) with spatial pyramid pooling (SPP) analyze grayscale images (static features); in Phase 2, variable n-grams and machine learning analyze native API call sequences (dynamic features). By combining the two, Malscore not only accelerated the static analysis process by exploiting the strength of CNNs in image recognition but also appeared more resilient to obfuscation owing to dynamic analysis. Unlike other techniques that combine static and dynamic analysis, Malscore labels most malware through static analysis alone, so it reduces overhead by dynamically analyzing only the few samples whose static features are less distinctive or more easily confused. We performed experiments on 174,607 malware samples from 63 malware families, and Malscore achieved 98.82% classification accuracy. Furthermore, compared with a method that uses both static and dynamic analysis, Malscore reduced preprocessing time by 59.58% and test time by 61.70%.
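The static side of the abstract rests on two ideas: rendering a binary's bytes as a grayscale image, and using spatial pyramid pooling so that images of different sizes yield a fixed-length feature vector. The sketch below illustrates both in plain Python under stated assumptions: the function names are hypothetical, the fixed row width of 16 bytes and the pyramid levels (1, 2, 4) are illustrative choices rather than values from the paper, and SPP is applied directly to the image here, whereas in Malscore it follows the CNN's convolutional layers.

```python
import math
from typing import List, Sequence

Image = List[List[int]]  # rows of pixel intensities in 0..255


def bytes_to_grayscale(data: bytes, width: int = 16) -> Image:
    """Render raw bytes as a grayscale image: one byte becomes one pixel.

    Rows have a fixed width (an illustrative assumption); the final
    partial row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows: Image = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the last row
        rows.append(row)
    return rows


def spatial_pyramid_pool(fmap: Image, levels: Sequence[int] = (1, 2, 4)) -> List[int]:
    """Max-pool a 2-D map over an n x n grid of bins for each level n.

    The output length is sum(n * n for n in levels) whatever the input's
    height and width, which is why SPP lets a CNN accept images of
    varying size.
    """
    if not fmap or not fmap[0]:
        raise ValueError("feature map must be non-empty")
    h, w = len(fmap), len(fmap[0])
    pooled: List[int] = []
    for n in levels:
        for i in range(n):
            # Floor for the bin start, ceil for the bin end: bins cover the whole
            # map and are never empty; neighbours may share a row or column.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


small = bytes_to_grayscale(bytes(range(40)), width=8)         # 5 x 8 image
large = bytes_to_grayscale(bytes(range(256)) * 4, width=32)   # 32 x 32 image
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))  # 21 21
```

Both images produce 1 + 4 + 16 = 21 pooled values, so a classifier placed after the pooling layer can use a fixed input size even though the binaries differ in length.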

Highlights

The emergence of various automated tools has shown that malware mutates on the Internet far faster than people realized
Most malware can be classified by analyzing static features, but the proliferation of packing and obfuscation techniques makes it easy to create malware whose behavior stays consistent while its static features vary
We propose a malware classification system Malscore based on probability scoring and machine learning
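The probability scoring that links Phase 1 and Phase 2 can be sketched as a two-stage cascade: accept the static classifier S's prediction when its top family probability clears a threshold, and send the sample to the dynamic classifier D otherwise. This is a minimal sketch assuming the score is the highest per-family probability from S; the threshold of 0.9, the function names, and the toy probability tables are hypothetical stand-ins for the trained CNN and n-gram classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Probs = Dict[str, float]  # malware family -> predicted probability


@dataclass
class Decision:
    family: str
    phase: int    # 1 = static classifier S decided, 2 = dynamic classifier D decided
    score: float  # top probability reported by the deciding classifier


def top_family(probs: Probs) -> Tuple[str, float]:
    """Return the (family, probability) pair with the highest probability."""
    return max(probs.items(), key=lambda kv: kv[1])


def classify(sample: str,
             static_clf: Callable[[str], Probs],
             dynamic_clf: Callable[[str], Probs],
             threshold: float = 0.9) -> Decision:
    family, score = top_family(static_clf(sample))
    if score >= threshold:
        # Reliable static result: the costly dynamic analysis is skipped.
        return Decision(family, 1, score)
    # Unreliable static result: fall back to dynamic analysis.
    family, score = top_family(dynamic_clf(sample))
    return Decision(family, 2, score)


# Toy probability tables standing in for trained classifiers (hypothetical).
static_scores = {
    "a.exe": {"FamilyA": 0.97, "FamilyB": 0.03},
    "b.exe": {"FamilyA": 0.55, "FamilyB": 0.45},  # ambiguous static features
}
dynamic_scores = {"b.exe": {"FamilyA": 0.10, "FamilyB": 0.90}}

results = {name: classify(name, static_scores.__getitem__, dynamic_scores.__getitem__)
           for name in static_scores}
for name, d in results.items():
    print(name, d.family, "phase", d.phase)
# a.exe FamilyA phase 1
# b.exe FamilyB phase 2
```

The dynamic table deliberately has no entry for `a.exe`: the lookup would raise `KeyError` if the cascade ever sent it to Phase 2, which shows that confidently classified samples never pay the dynamic-analysis cost.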

Summary

INTRODUCTION

The emergence of various automated tools has shown that malware mutates on the Internet far faster than people realized. We use probability scoring to filter out the majority of malware that receive reliable classification results from classifier S, and pass only the unreliable samples to classifier D. This reduces the number of times dynamic analysis must run and therefore lowers the detection cost of Malscore. In Phase 1, a CNN with an SPP layer analyzes grayscale images (static features); in Phase 2, variable n-grams and machine learning analyze native API call sequences (dynamic features).

V. ANALYSIS OF NATIVE API CALL SEQUENCES USING VARIABLE N-GRAMS AND MACHINE LEARNING

Among the grayscale images generated in Section IV-A, some samples of the same family may have static features that are not very distinctive.

3: Traverse APISequence; APIConcall ← each single native API call or native API call subsequence that is called 4 or more times consecutively

11: Delete repetitive native API calls in APIConcall in the family
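Steps 3 and 11 of the algorithm excerpt detect a native API call, or a short subsequence of calls, repeated four or more times in a row, and then remove the repetitions. The sketch below is one possible reading of those steps, followed by n-gram counting over a range of lengths as a stand-in for the paper's variable n-grams, whose exact definition this summary does not show. The maximum unit length of 3, the n-gram range of 2 to 4, keeping one copy of each collapsed unit, and all function names are assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def collapse_repeats(seq: Sequence[str], min_repeat: int = 4,
                     max_unit: int = 3) -> Tuple[List[str], Set[Tuple[str, ...]]]:
    """Collapse any call or call subsequence (length <= max_unit) that repeats
    at least min_repeat times in a row down to a single copy.

    Returns the reduced sequence and the set of collapsed units
    (loosely corresponding to APIConcall).
    """
    out: List[str] = []
    repeated: Set[Tuple[str, ...]] = set()
    i = 0
    while i < len(seq):
        collapsed = False
        for k in range(1, max_unit + 1):  # try the shortest unit first
            unit = tuple(seq[i:i + k])
            if len(unit) < k:
                break  # too few calls left for a unit this long
            count = 1
            while tuple(seq[i + count * k:i + (count + 1) * k]) == unit:
                count += 1
            if count >= min_repeat:
                repeated.add(unit)
                out.extend(unit)   # keep one copy of the repeated unit
                i += count * k     # skip all consecutive repetitions
                collapsed = True
                break
        if not collapsed:
            out.append(seq[i])
            i += 1
    return out, repeated


def variable_ngrams(seq: Sequence[str], n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every n-gram of the sequence for each n in [n_min, n_max]."""
    grams: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            grams[tuple(seq[i:i + n])] += 1
    return grams


trace = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtClose"]
reduced, units = collapse_repeats(trace)
print(reduced)  # ['NtOpenFile', 'NtReadFile', 'NtClose']
print(units)    # {('NtReadFile',)}
print(variable_ngrams(reduced, 2, 3))
```

Collapsing runs before counting keeps a loop that calls `NtReadFile` thousands of times from flooding the n-gram counts with one repeated pattern, so the features reflect the order of distinct behaviors rather than loop lengths.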

EXPERIMENTS AND RESULTS

EVALUATION OF N-GRAMS AND MACHINE LEARNING

LIMITATIONS

VIII. CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 43	License type: CC BY 4.0

