Abstract

Every day, developers add new applications (apps) to the Google Play Store that make users' lives easier and entertain them. This rapid development is only possible because software libraries provide functionality that can be integrated directly into an app instead of being implemented from scratch. For instance, some libraries offer extended options for displaying content or additional support for specific networking capabilities. These libraries are bundled with the main application code into a single binary to avoid delays caused by loading external functionality. The wide distribution of Android apps and the high turnover in this market attract criminal actors (attackers). These attackers decompile apps, integrate additional ad libraries or malware, and republish them, exploiting the original apps' popularity to trick users into downloading the repackaged versions. To counter such malicious practices, app developers and library producers have begun to obfuscate their code to make decompilation more challenging. However, obfuscation is applied not only by developers but also by attackers, who use it to make copyright infringement harder to detect or to hide their malicious intent. Analysts try to protect developers' copyright as well as the privacy and security of app users, but code obfuscation hinders them in identifying libraries and repackaged apps and in detecting and recovering obfuscated names and strings. Obfuscated code may contain not only malware but also vulnerabilities or code that accesses private data without authorization. This dissertation introduces several approaches that support these analyses using static analysis, dynamic analysis, and machine learning. Since the obfuscation of repackaged apps makes it difficult to distinguish library code from app code, we present approaches for detecting libraries, separating app code, and mapping library code.
We evaluated the effectiveness of these approaches under different obfuscation techniques. Furthermore, we present an approach for identifying repackaged apps that uses our library identification to measure the similarity between repackaged and original apps without library code, which would otherwise distort the measurement. Further contributions support the recovery of names of obfuscated entities in library code. Finally, we present an approach for identifying and recovering obfuscated strings that supports data-flow analyses. In our evaluations, our approaches outperformed all state-of-the-art competitors. In total, we analyzed over 100,000 apps for obfuscated names, obfuscated libraries, obfuscated strings, and repackaged apps.
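The claim that shared library code distorts app similarity can be illustrated with a small example. The Python snippet below is a minimal, hypothetical sketch of that effect, not the dissertation's method. It assumes that each app is reduced to a set of fully qualified method names, that library code can be recognized by a hypothetical package prefix (`com.example.adlib.`), and that the Jaccard index serves as a stand-in similarity measure. The helper names `jaccard` and `strip_libraries` are illustrative only.

```python
"""Hypothetical sketch: app similarity with and without bundled library code.

Assumptions (illustrative, not from the dissertation): an app is modelled as a
set of fully qualified method names, and library code is identified by a known
package prefix. Real library detection must also cope with renamed packages.
"""


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def strip_libraries(methods: set, library_prefixes: tuple) -> set:
    """Drop every method whose name starts with a known library package prefix."""
    return {m for m in methods if not m.startswith(library_prefixes)}


# Hypothetical ad library bundled into both apps (8 methods).
LIB = {f"com.example.adlib.Ad{i}.show" for i in range(8)}
LIB_PREFIXES = ("com.example.adlib.",)

# Two otherwise unrelated apps that both bundle the same library.
app_a = LIB | {"org.alpha.Main.onCreate", "org.alpha.Game.tick"}
app_b = LIB | {"net.beta.Main.onCreate", "net.beta.Notes.save"}

raw = jaccard(app_a, app_b)
filtered = jaccard(strip_libraries(app_a, LIB_PREFIXES),
                   strip_libraries(app_b, LIB_PREFIXES))

print(f"with library code:    {raw:.2f}")       # with library code:    0.67
print(f"without library code: {filtered:.2f}")  # without library code: 0.00
```

Here two unrelated apps appear about 67% similar while they share the bundled library, but 0% similar once it is filtered out. Prefix matching is only a placeholder: renaming obfuscation can change package names, which is why obfuscation-resilient library detection is needed before such a comparison.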
