Multifamily Classification of Android Malware With a Fuzzy Strategy to Resist Polymorphic Familial Variants

Xiaojian Liu,Xi Du,Qian Lei,Kehong Liu

doi:10.1109/access.2020.3019282

Open Access

https://doi.org/10.1109/access.2020.3019282

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 8	License type: CC BY 4.0

Affiliation: Xi'an University of Science and Technology

Abstract

Multifamily classification of Android malware aims to identify a malicious sample as a member of one of the given malware families. This problem is believed to be much more significant than binary classification (simply identifying a sample as malicious or benign) because it can reveal the behaviour patterns of multiple malware families and bring deep insights into the working mechanisms of malicious payloads. The main challenges of multifamily classification involve two aspects: recognizing the behaviour patterns of malware families, and addressing the code obfuscation and polymorphic variants that adversaries commonly use to evade rigorous detection. To address these challenges, in this article, we use regular expressions of callbacks to describe the behaviour patterns of malware families, and propose a two-step fuzzy processing strategy to resist potential polymorphic familial variants. The alphabet of such regular expressions consists only of security-sensitive API calls, which enables the regular expressions to resist various kinds of code obfuscation and metamorphism. The proposed fuzzy strategy, applied to the regular expressions, comprises two steps: the first step transforms an original regular expression into a fuzzy regular expression that has a broader meaning than the original one; the second step further relaxes the precise plaintext match between two regular expressions to a fuzzy match by introducing the notion of similarity of regular expressions. Applying this strategy raises the abstraction level of a regular expression and makes the behaviour pattern specified by the regular expression more resilient to code obfuscation and polymorphic variants. Furthermore, selecting the fuzzy regular expressions as features, we use text mining techniques to train a multifamily 1-NN classifier over 3270 samples of 65 families. The experimental results show that our approach outperforms most of the state-of-the-art approaches and tools, confirming the effectiveness of our approach.
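
The second fuzzy step and the 1-NN classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure here is Python's `difflib.SequenceMatcher` ratio, swapped in as a stand-in because this page does not define the paper's notion of similarity. The family names, the token names, and the helper functions `similarity`, `sample_similarity`, and `classify_1nn` are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(expr_a, expr_b):
    """Similarity in [0, 1] between two token sequences.

    Stand-in measure (assumption): the paper's own definition of
    regular-expression similarity is not given on this page.
    """
    return SequenceMatcher(None, expr_a, expr_b).ratio()


def sample_similarity(sample_a, sample_b):
    """Average, over the expressions of sample_a, of the best match in sample_b."""
    if not sample_a or not sample_b:
        return 0.0
    total = sum(max(similarity(a, b) for b in sample_b) for a in sample_a)
    return total / len(sample_a)


def classify_1nn(sample, training):
    """Return the family label of the most similar training sample (1-NN)."""
    best_label, best_score = None, -1.0
    for label, train_sample in training:
        score = sample_similarity(sample, train_sample)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: each sample is a list of fuzzy expressions,
# and each expression is a tuple of coarse API-category tokens.
training = [
    ("FamilyA", [("SMS_SEND", "NET_CONNECT")]),
    ("FamilyB", [("LOCATION_READ", "FILE_WRITE")]),
]

# A variant of FamilyA with one extra call still lands closest to FamilyA.
unknown = [("SMS_SEND", "NET_CONNECT", "FILE_WRITE")]
predicted = classify_1nn(unknown, training)
```

Because the comparison is a graded score rather than an exact string match, a variant that inserts or drops a call can still be assigned to its family as long as it remains closer to that family's samples than to any other.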

Highlights

The Android system, as one of the most popular mobile platforms, still faces serious security challenges due to its open-source nature, the imperfect design of its permission system, and the absence of full certification for published applications
Malware or malicious payload exploits the vulnerabilities within the Android system to implement a variety of attacks, such as privilege escalation, remote control, financial charges and personal information stealing, severely
To address the above challenges, in this article, we propose an approach to the multifamily classification of malware by using text mining and machine learning techniques

Summary

INTRODUCTION

The Android system, as one of the most popular mobile platforms, still faces serious security challenges due to its open-source nature, the imperfect design of its permission system, and the absence of full certification for published applications.

THE TWO-STEP FUZZY PROCESSING STRATEGY FOR THE REGULAR EXPRESSIONS

A malware family usually derives numerous polymorphic variants, that is, variants whose behaviour patterns are similar but not exactly the same. If we used precise matching to compare two regular expressions and calculate cFBeh(Fi) and dFBeh(Fi), it might be too stringent to find a meaningful behaviour pattern for a family. To tackle this problem, we propose a two-step fuzzy processing strategy to deal with the possible variants of a family. For convenience, in what follows, we still use CB to denote the fuzzy version of the callback behaviour; unless this causes confusion, all the regular expressions mentioned hereafter refer to the fuzzy ones that have been processed through the substitution operation.
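
To make the substitution operation mentioned above concrete, here is a minimal sketch under stated assumptions. This page does not say what the substitution replaces, so the sketch assumes it maps each concrete security-sensitive API call to a coarser category; the mapping `API_CATEGORY`, the category names, the expression notation, and the function `fuzzify` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical mapping from concrete API calls to coarse categories
# (assumption: the paper's actual substitution table is not shown here).
API_CATEGORY = {
    "SmsManager.sendTextMessage": "SMS",
    "SmsManager.sendMultipartTextMessage": "SMS",
    "TelephonyManager.getDeviceId": "DEVICE_ID",
    "TelephonyManager.getSubscriberId": "DEVICE_ID",
    "HttpURLConnection.connect": "NETWORK",
}

# A symbol is an identifier that may contain dots, e.g. Class.method.
SYMBOL = re.compile(r"[A-Za-z_][\w.]*")


def fuzzify(expression):
    """Replace each known API symbol with its category.

    Operators such as parentheses and '*' are left untouched, and
    symbols missing from the mapping are kept as they are.
    """
    return SYMBOL.sub(
        lambda m: API_CATEGORY.get(m.group(0), m.group(0)), expression
    )


original = "(TelephonyManager.getDeviceId SmsManager.sendTextMessage)* HttpURLConnection.connect"
variant = "(TelephonyManager.getSubscriberId SmsManager.sendMultipartTextMessage)* HttpURLConnection.connect"

fuzzy_original = fuzzify(original)
fuzzy_variant = fuzzify(variant)
```

In this sketch the two expressions differ as plain text, but both reduce to the same fuzzy expression, so a variant that swaps one API call for a functionally similar one no longer escapes a family's behaviour pattern.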

THE SIMILARITY OF REGULAR EXPRESSIONS

TRAINING DATASETS

Findings

VIII. CONCLUSION
