Abstract
Malware classification plays an important role in tracing the sources of attacks in computer security. Existing static analysis methods classify quickly, but they are ineffective against malware that uses packing and obfuscation techniques. Dynamic analysis methods handle packing and obfuscation better, but they incur excessive classification cost. To overcome these shortcomings, we propose Malscore, a classification system based on probability scoring and machine learning, which uses a probability threshold to chain static analysis (Phase 1) and dynamic analysis (Phase 2). In Phase 1, a convolutional neural network (CNN) with spatial pyramid pooling (SPP) analyzes grayscale images (static features). In Phase 2, variable n-grams and machine learning analyze native API call sequences (dynamic features). By combining the two, Malscore accelerates static analysis by exploiting the strength of CNNs in image recognition, and the dynamic analysis makes it more resilient to obfuscation. Unlike other combined static and dynamic analysis techniques, Malscore labels most malware by static analysis alone, so overhead is reduced by dynamically analyzing only the few samples whose features are less distinctive or more ambiguous in static analysis. We performed experiments on 174607 malware samples from 63 malware families. Malscore achieved 98.82% accuracy for malware classification. Compared with a method that uses both static and dynamic analysis, Malscore reduced preprocessing time and test time by 59.58% and 61.70%, respectively.
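The probability-threshold hand-off between Phase 1 and Phase 2 can be sketched roughly as follows. This is only an illustration of the idea, not the authors' implementation: the function name `cascade_classify`, the callback `dynamic_classify`, and the default threshold of 0.9 are all hypothetical, and the abstract does not state the threshold value that Malscore uses.

```python
from typing import Callable, Sequence, Tuple


def cascade_classify(
    static_probs: Sequence[float],
    dynamic_classify: Callable[[], int],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[int, str]:
    """Return (family_index, phase_used) for one sample.

    static_probs: per-family probabilities from the static classifier.
    dynamic_classify: hypothetical callback that runs the costly dynamic
        analysis and returns a family index; it is called only on demand.
    """
    # Pick the family with the highest static probability.
    best = max(range(len(static_probs)), key=lambda i: static_probs[i])
    if static_probs[best] >= threshold:
        # Static result is considered reliable; skip dynamic analysis.
        return best, "static"
    # Static result is ambiguous; fall back to dynamic analysis.
    return dynamic_classify(), "dynamic"


# A confident static prediction never triggers the dynamic phase.
print(cascade_classify([0.05, 0.92, 0.03], lambda: 2))
# An ambiguous static prediction is routed to the dynamic phase.
print(cascade_classify([0.40, 0.35, 0.25], lambda: 2))
```

Because the dynamic classifier is passed as a callback, its cost is paid only for samples that fall below the threshold, which is the source of the time savings the abstract reports.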
Highlights
The emergence of various automated tools has shown that the speed with which malware mutates on the Internet is far faster than people realized
Most malware can be classified by analyzing static features, but the proliferation of the packing and obfuscation techniques facilitates the creation of malware with consistent behavior and inconsistent static features
We propose a malware classification system Malscore based on probability scoring and machine learning
Summary
The emergence of various automated tools has shown that malware mutates on the Internet far faster than people realized. We use probability scoring to filter out most malware that gets a reliable classification result from classifier S, and pass only the unreliable samples to classifier D. This reduces the number of dynamic analysis runs and therefore the detection cost of Malscore. The CNN with an SPP layer is used to analyze grayscale images (static features) in Phase 1, and variable n-grams and machine learning are used to analyze native API call sequences (dynamic features) in Phase 2.
V. ANALYSIS OF NATIVE API CALL SEQUENCES USING VARIABLE N-GRAMS AND MACHINE LEARNING
For the grayscale images generated in Section IV-A, there may be some samples of the same family whose static features are not very obvious.
3: Traverse APISequence; APIConcall ← a single native API call, or a subsequence of native API calls, that is called 4 or more times consecutively
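The algorithm step quoted in the summary identifies native API calls, or short subsequences of calls, that repeat 4 or more times in a row. The sketch below shows one plausible way to detect such runs and collapse each to a single occurrence. The collapsing itself is an assumption, since the excerpt does not say what is done with APIConcall afterwards; the function name `collapse_repeats` and the `max_len` limit on subsequence length are also hypothetical.

```python
from typing import List, Sequence


def collapse_repeats(
    calls: Sequence[str],
    min_repeats: int = 4,  # "4 times or more continuously" in the source
    max_len: int = 3,      # assumed cap on the repeated block length
) -> List[str]:
    """Collapse blocks of 1..max_len calls repeated >= min_repeats times."""
    calls = list(calls)
    out: List[str] = []
    i, n = 0, len(calls)
    while i < n:
        collapsed = False
        for size in range(1, max_len + 1):
            block = calls[i:i + size]
            if len(block) < size:
                break  # not enough calls left for a block of this size
            # Count how many times the block repeats back to back.
            reps = 1
            while calls[i + reps * size:i + (reps + 1) * size] == block:
                reps += 1
            if reps >= min_repeats:
                out.extend(block)   # keep a single copy of the block
                i += reps * size    # skip the whole repeated run
                collapsed = True
                break
        if not collapsed:
            out.append(calls[i])
            i += 1
    return out


trace = ["NtOpenFile"] + ["NtReadFile"] * 5 + ["NtClose"]
print(collapse_repeats(trace))
```

Shorter blocks are tried first, so a run of one repeated call is collapsed as a single call rather than as a longer subsequence. Runs shorter than `min_repeats` are left untouched.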