Identifying Android Malware Families Using Android-Oriented Metrics

William Blanc,M J Hussain Almohri,Karim O Elish,Lina G Hashem

doi:10.1109/bigdata47090.2019.9005669

Abstract

Android malware (malicious apps) families share common attributes and behavior through sharing core malicious code. However, as the number of new malware increases, the task of identifying the correct family becomes more challenging. Two prominent approaches tackle this problem, either using dynamic analysis that captures the runtime behavior of the malware or using static analysis methods that can reveal malicious behavior by analyzing the underlying logic and code patterns. A third emerging way is to use the various sources of identification features to analyze the architectural and external attributes of a malicious app. For example, two malicious apps can have different behavioral patterns but share common attributes. We hypothesize that this malware can belong to the same family but attempt to mislead dynamic and code-level static analysis tools by randomizing their behavior. In this work, we utilize a promising set of Android-oriented code metrics that guide a supervised classification learning process for identifying malware families in Android. Our empirical results on 2,869 malware apps, across 35 different malware families, show that these metrics are very effective to identify malware families. In particular, we achieve low false positive rate (1.2%) and AUC score of 0.984 for family identification by using Random Forest (RF) classifier.

Full Text