Abstract

Android app clone detection has been extensively studied in our community, and a number of effective approaches and frameworks have been proposed and released. However, one open challenge has not been well addressed in previous work: authorship attribution for the detected app clones. Although state-of-the-art approaches can accurately identify repackaged apps in one way or another, no convincing method has been proposed to identify the original app and the authentic author within a repackaged app pair, which greatly limits the usage scenarios of app clone detection techniques. For example, app market maintainers have to manually confirm the identified repackaged app pairs, and in most cases it is challenging for them to make an accurate decision. In this paper, we propose AppAuth, a novel learning-based approach to predict the authorship of app clones. Specifically, for a given Android app clone pair (or a group of identified repackaged apps), AppAuth infers the original author of the plagiarized apps. Our approach is motivated by traditional authorship attribution studies on binary files. AppAuth first extracts a number of coding-style-related features from the executable .apk files, and then relies on machine learning techniques to train a classification model. We have conducted extensive experiments to evaluate the effectiveness of AppAuth. The experimental results suggest that we are able to infer the authorship of Android app clones with high precision. Our work is the first to tackle the problem systematically, and we believe our efforts could positively contribute to the research community and boost research on app repackaging detection and authorship attribution.
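The two-stage idea in the abstract (style features per app, then a trained classifier that picks the most likely author among the candidates of a clone group) can be illustrated with a minimal sketch. This is not AppAuth's implementation: the feature vectors, the developer names, and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free, and the abstract does not say which features or which model the authors use.

```python
from math import sqrt

def train_centroids(samples):
    """Average the feature vectors of each known developer.

    samples: list of (author, feature_vector) pairs. The vectors stand in
    for coding-style features already extracted from .apk files
    (hypothetical values, not the paper's feature set).
    """
    sums, counts = {}, {}
    for author, vec in samples:
        if author not in sums:
            sums[author] = [0.0] * len(vec)
            counts[author] = 0
        for i, value in enumerate(vec):
            sums[author][i] += value
        counts[author] += 1
    return {a: [s / counts[a] for s in total] for a, total in sums.items()}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_author(centroids, vec, candidates=None):
    """Return the known developer whose centroid is closest to vec.

    candidates optionally restricts the choice to the developers who
    published the apps in one clone pair or clone group.
    """
    pool = list(centroids) if candidates is None else [
        a for a in candidates if a in centroids
    ]
    if not pool:
        raise ValueError("no candidate author has training data")
    return min(pool, key=lambda a: euclidean(centroids[a], vec))

# Hypothetical training data: two apps per known developer.
training = [
    ("dev_a", [0.9, 0.1, 0.2]),
    ("dev_a", [0.8, 0.2, 0.1]),
    ("dev_b", [0.1, 0.9, 0.7]),
    ("dev_b", [0.2, 0.8, 0.9]),
]
centroids = train_centroids(training)

# A clone pair was published under dev_a and dev_b; the shared code's
# style features decide which of the two is the more likely author.
clone_features = [0.8, 0.1, 0.2]
likely_author = predict_author(centroids, clone_features, ["dev_a", "dev_b"])
```

The sketch also makes the dependency on prior knowledge visible: a developer with no previously collected apps has no centroid and can never be selected.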

Highlights

  • Due to the open nature of the Android ecosystem, Android apps are easy to crack and modify.

  • During evaluation, we seek to answer the following research questions (RQs). RQ1: How effective is AppAuth in identifying the authorship of a given app? As our main goal in this paper is to identify the original author of repackaged apps, it is important to evaluate the effectiveness of our approach.

  • RQ3: Is the classification result affected by limited existing knowledge of the known app developers? Our approach is based on the assumption that we have obtained a number of apps released by each developer, so that we can explore the development-related features of each app and train the model to flag the most likely authentic author.


Summary

Introduction

Due to the open nature of the Android ecosystem, Android apps are easy to crack and modify. App repackaging (or app cloning) has become a severe threat to the Android ecosystem [1], [2]: plagiarists clone apps from other developers, e.g., to redirect advertisement revenue [3], [4] or to insert malicious payloads into popular apps to distribute malware [5]. These actions cause the original authors to lose potential revenue, and may also introduce a number of [...]. Even when looking at the developer signature information, it is usually hard to make an accurate decision about which app in a repackaged pair is the original.

