Abstract

The threat from Android malware has reached an alarming scale, with millions of new malware samples pouring into application markets every year. In this paper, we present a new method that efficiently detects malware and attributes it to the corresponding malware family with high accuracy. A multi-level fingerprint is first extracted from the application using n-gram analysis and feature hashing. Each of its sub-fingerprints is then fed to a dedicated online classifier. Based on the confidence scores from the classifiers and a combination function that we devised, the final decision is made on whether the application is benign or malicious or, in the family attribution scenario, which malware family it belongs to. To the best of our knowledge, this is the first method based on the combination of n-gram analysis and online classifiers. The incremental learning enabled by the online classifiers allows our method to scale to a very large number of applications and to adapt easily to new characteristics in newly seen applications. The parallelized design, with one classifier per sub-fingerprint, not only magnifies the impact of the distinguishing features in each sub-fingerprint but also makes the method extensible, since additional application features can be added as extra sub-fingerprints. In extensive experiments, our method achieved a malware detection accuracy of 99.2% on a benchmark dataset with more than 10,000 samples and 86.2% on a dataset with more than 70,000 in-the-wild samples. For malware family attribution, our method achieved an accuracy of 98.8% on the top 23 malware families of the Drebin dataset.
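The pipeline described above (n-gram extraction, feature hashing, one online classifier per sub-fingerprint, and a combination of their confidence scores) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration rather than a detail taken from the paper: the byte-level bigrams, the CRC32 hash, the vector size DIM, the perceptron used as a stand-in online learner, the plain sum used as a placeholder combination function, and all function and class names.

```python
import zlib

DIM = 256  # hypothetical sub-fingerprint length (assumption, not from the paper)


def hashed_ngrams(data, n=2, dim=DIM):
    """Hash every byte n-gram of `data` into a fixed-length, L1-normalised vector."""
    vec = [0.0] * dim
    for i in range(len(data) - n + 1):
        # CRC32 is used only because it is deterministic and in the stdlib.
        vec[zlib.crc32(data[i:i + n]) % dim] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


class OnlinePerceptron:
    """Stand-in online linear classifier; labels are +1 (malicious), -1 (benign)."""

    def __init__(self, dim=DIM):
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, x):
        # Signed confidence: positive leans malicious, negative leans benign.
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        # Incremental learning: adjust the weights only on a mistake.
        if y * self.score(x) <= 0:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y


def fingerprint(parts, n=2, dim=DIM):
    """Build one hashed sub-fingerprint per application part (name -> bytes)."""
    return {name: hashed_ngrams(blob, n, dim) for name, blob in parts.items()}


def combined_score(models, fp):
    """Placeholder combination function: sum of per-classifier confidence scores."""
    return sum(models[name].score(vec) for name, vec in fp.items())


def train_one(models, fp, label):
    """Feed each sub-fingerprint to its dedicated classifier (one online step)."""
    for name, vec in fp.items():
        models[name].update(vec, label)
```

In this sketch an application would be passed in as a dictionary such as {"code": ..., "manifest": ...} of raw bytes, trained one sample at a time with train_one, and judged malicious when combined_score is positive. Adding a new kind of feature amounts to adding another key and another classifier, which mirrors the extensibility argument made in the abstract.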
