Abstract

The explosive spread of Android malware is a serious concern for Android application security. One way to detect malicious payloads hidden in an application is to treat detection as a binary classification problem, which can be tackled effectively with traditional machine learning techniques. The key factors in detecting Android malware with machine learning are feature selection and feature generation. Most existing approaches select and generate features without fully examining program structure, so the important semantic information associated with these features is lost, which results in low detection accuracy. To address this issue, we propose a new feature generation approach for Android applications that takes components and program structures into consideration and extracts features in a graph-based, semantics-rich style. The approach has two distinguishing aspects: context-based feature selection and graph-based feature generation. We abstract an Android application as a collection of reduced iCFGs (interprocedural control flow graphs) and extract original features from these graphs. By combining the original features with their contexts, we generate new features that hold richer semantic information than the original ones. By embedding the features into a feature vector space, we can use machine learning techniques to train a malware detector. The experimental results show that this approach achieves an accuracy of 95.4% and a recall of 96.5%, which demonstrates its effectiveness and advantages.
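The pipeline described in the abstract (raw features taken from graphs, combined with their contexts, then embedded into a vector space) can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not define "context" precisely, so the sketch assumes the context is simply the type of the enclosing component, and it assumes that pairs of adjacent graph nodes are used as bigram features. The toy graphs, the API names used as node labels, and the functions `generate_features` and `embed` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy input: one reduced graph per component.
# Each graph has a component type (assumed here to be the "context"),
# node labels (raw features), and directed edges between node ids.
toy_app = [
    {
        "component": "Service",
        "nodes": {0: "getDeviceId", 1: "openConnection", 2: "sendTextMessage"},
        "edges": [(0, 1), (1, 2)],
    },
    {
        "component": "Activity",
        "nodes": {0: "getDeviceId"},
        "edges": [],
    },
]


def generate_features(app):
    """Return a Counter of context-tagged unigram and bigram features."""
    features = Counter()
    for graph in app:
        ctx = graph["component"]
        nodes = graph["nodes"]
        # Unigrams: each raw feature tagged with its assumed context.
        for label in nodes.values():
            features[f"{ctx}:{label}"] += 1
        # Bigrams: each edge tagged with the same context.
        for src, dst in graph["edges"]:
            features[f"{ctx}:{nodes[src]}->{nodes[dst]}"] += 1
    return features


def embed(features, vocabulary):
    """Map a feature Counter to a fixed-length binary vector."""
    return [1 if name in features else 0 for name in vocabulary]


features = generate_features(toy_app)
vocabulary = sorted(features)  # in practice, built from the whole training set
vector = embed(features, vocabulary)
```

Note that the same raw feature (`getDeviceId`) yields two distinct features depending on whether it appears in a `Service` or an `Activity`, which is the sense in which a context-tagged feature carries more information than the raw one. The resulting vectors could be passed to any standard binary classifier.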

Highlights

  • The Android system, as one of the most popular mobile platforms, faces serious security challenges due to its open-source nature, imperfect permission mechanisms, and the absence of full certification of applications at publication.

  • Machine learning techniques have been widely used in malware detection [1,2,3,4,5,6,7,8]. This kind of approach treats malware detection as a binary classification problem, which can be tackled with traditional techniques from pattern recognition or machine learning.

  • We combine the raw features with their contexts to form new features, which hold richer semantic information than individual raw features; using this feature set to classify an application can achieve better detection performance.

Summary

Introduction

The Android system, as one of the most popular mobile platforms, faces serious security challenges due to its open-source nature, imperfect permission mechanisms, and the absence of full certification of applications at publication. Machine learning techniques have been widely used in malware detection [1,2,3,4,5,6,7,8]. This kind of approach treats malware detection as a binary classification problem (i.e., classifying an application as malicious or benign), which can be tackled with traditional techniques from pattern recognition or machine learning. Since this approach does not fully investigate the semantics and all details of programs, it outperforms traditional dynamic approaches [1, 2, 9] and static approaches [4,5,6, 10,11,12,13,14,15] in terms of scalability and time consumption. (1) We propose a context-based feature selection approach, which combines the three kinds of raw features with their contexts to serve as newly generated features. Since these features hold rich semantic information about program behaviors, we achieve a better result than traditional machine learning-based approaches.
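Because detection is framed as binary classification, the accuracy and recall figures quoted in the abstract follow the standard definitions: accuracy = (TP + TN) / (TP + TN + FP + FN) and recall = TP / (TP + FN), where "positive" means malicious. The sketch below shows the arithmetic on invented labels. The data is purely illustrative and is not taken from the paper's experiments, and the function name `accuracy_and_recall` is hypothetical.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for binary labels (1 = malicious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign correctly passed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Invented example labels, not experimental data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

Recall is the more security-relevant of the two figures, since it measures the fraction of malicious applications the detector actually catches.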

Feature Selection
Feature Generation
Feature Transformation
Feature Function for Bigrams
Related Works
90 Accuracy
Conclusion