Abstract
Google Play and other application marketplaces have various Android applications and metadata. Among these, description information and privacy policy help explain the application's functionality. They also describe the permission of the application, especially those related to sensitive information. Detecting the inconsistency between the description of the application and privacy information and the permission extracted in the application's source code helps users decide whether to install and use the application. In this research, we propose a new method based on a pre-trained language model to detect inconsistencies between the permission extracted from the description application and privacy policy and the permission extracted from the application's source code (file APK). Related works focus on models of large-scale datasets, especially for resource-rich languages such as English. However, a language with low resources, like Vietnamese, needs more datasets for the task. To solve this problem, we propose the ViDPApp dataset (Description and Privacy Policy of Applications on Vietnamese domains), a high-quality dataset that humans manually annotate with 12,000+ sentences with an inter-annotator agreement (IAA) of over 85%. In addition, we proposed XLMR4MD, a new framework using large language models, outperforming powerful machine models (LSTM, Bi-GRU-LSTM-CNN, WikiBERT, DistilBERT, mBERT, and PhoBERT) and achieving the best with 84.04% F1 score in detecting inconsistencies between Android application permission and description. This framework can be fine-tuned for 100 languages, which benefits low-resource languages like Vietnamese. The dataset is available for research purposes.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.