Abstract

The multifamily classification of Android malware aims to assign a malicious sample to one of a given set of malware families. This problem is believed to be much more significant than binary classification (simply identifying a sample as malicious or benign) because it can reveal the behaviour patterns of multiple malware families and bring deep insights into the working mechanism of malicious payloads. The main challenges of multifamily classification involve two aspects: recognizing the behaviour patterns of malware families, and addressing the code obfuscation and polymorphic variants that adversaries commonly use to evade rigorous detection. To address these challenges, in this article, we use regular expressions of callbacks to describe the behaviour patterns of malware families, and propose a two-step fuzzy processing strategy to resist potential polymorphic familial variants. The alphabet of these regular expressions consists only of security-sensitive API calls, which enables the regular expressions to resist various kinds of code obfuscation and metamorphism. The proposed fuzzy strategy, applied to the regular expressions, comprises two steps: the first step transforms an original regular expression into a fuzzy regular expression that has a broader meaning than the original one; the second step relaxes the precise plaintext match between two regular expressions to a fuzzy match by introducing the notion of similarity of regular expressions. Applying this strategy raises the abstraction level of a regular expression and makes the behaviour pattern it specifies more resilient to code obfuscation and polymorphic variants. Furthermore, selecting the fuzzy regular expressions as features, we use text mining techniques to train a multifamily 1-NN classifier over 3270 samples of 65 families. The experimental results show that our approach outperforms most of the state-of-the-art approaches and tools, confirming its effectiveness.

Highlights

  • The Android system, as one of the most popular mobile platforms, still faces serious security challenges due to its open-source nature, the imperfect design of its permission system, and the absence of full certification for application publication.

  • Malware or malicious payload exploits the vulnerabilities within the Android system to implement a variety of attacks, such as privilege escalation, remote control, financial charges and personal information stealing, severely

  • Approach: To address the above challenges, in this article, we propose an approach to the multifamily classification of malware using text mining and machine learning techniques.

Summary

INTRODUCTION

The Android system, as one of the most popular mobile platforms, still faces serious security challenges due to its open-source nature, the imperfect design of its permission system, and the absence of full certification for application publication.

THE TWO-STEP FUZZY PROCESSING STRATEGY FOR THE REGULAR EXPRESSIONS

A malware family usually derives numerous polymorphic variants, that is, variants whose behaviour patterns are similar but not exactly the same. If we used a precise match to compare two regular expressions and calculate cFBeh(Fi) and dFBeh(Fi), the comparison might be too stringent to reveal a meaningful behaviour pattern for a family. To tackle this problem, we propose a two-step fuzzy processing strategy to deal with the possible variants of a family. For convenience, in what follows, we still use CB to denote the fuzzy version of the callback behaviour; unless this would cause confusion, all regular expressions mentioned hereafter refer to the fuzzy ones that have been processed through the substitution operation.
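To make the two steps concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a regular expression is represented as a list of tokens (API-call names and operators). The `FUZZY_CATEGORY` mapping, the use of `difflib.SequenceMatcher` as the similarity measure, the `threshold` value, and the family labels are all hypothetical stand-ins chosen for illustration; the paper's actual substitution rules and similarity definition are not reproduced in this summary.

```python
from difflib import SequenceMatcher

# Hypothetical substitution table: concrete security-sensitive API calls are
# mapped to a broader category. The real substitution rules are an assumption.
FUZZY_CATEGORY = {
    "sendTextMessage": "SMS",
    "sendMultipartTextMessage": "SMS",
    "getDeviceId": "ID",
    "getSubscriberId": "ID",
}


def fuzzify(tokens):
    """Step 1: replace each token with its broader category, if it has one."""
    return [FUZZY_CATEGORY.get(token, token) for token in tokens]


def similarity(a, b):
    """Step 2: score two token sequences in [0, 1] instead of testing equality.

    SequenceMatcher is a stand-in measure, not the paper's definition.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def fuzzy_match(a, b, threshold=0.75):
    """Two expressions match if their fuzzy versions are similar enough."""
    return similarity(fuzzify(a), fuzzify(b)) >= threshold


def nearest_family(sample, labelled):
    """1-NN: return the family label of the most similar labelled expression."""
    fuzzy_sample = fuzzify(sample)
    best = max(labelled, key=lambda item: similarity(fuzzy_sample, fuzzify(item[0])))
    return best[1]


if __name__ == "__main__":
    variant_a = ["getDeviceId", "sendTextMessage"]
    variant_b = ["getSubscriberId", "sendMultipartTextMessage"]
    print(variant_a == variant_b)             # precise match fails
    print(fuzzy_match(variant_a, variant_b))  # fuzzy match succeeds

    labelled = [
        (variant_a, "FamilyA"),
        (["getLastKnownLocation", "openConnection"], "FamilyB"),
    ]
    print(nearest_family(variant_b, labelled))
```

In this sketch the two variants differ under a precise comparison but collapse to the same fuzzy expression after substitution, which is the kind of polymorphic variation the strategy is meant to absorb. The similarity threshold then tolerates small insertions or deletions that substitution alone cannot hide.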

THE SIMILARITY OF REGULAR EXPRESSIONS
TRAINING DATASETS
Findings
VIII. CONCLUSION