XLMR4MD: New Vietnamese dataset and framework for detecting the consistency of description and permission in Android applications using large language models

Qui Ngoc Nguyen, Nguyen Tan Cam, Kiet Van Nguyen

doi:10.1016/j.cose.2024.103814

Abstract

Google Play and other application marketplaces host a wide range of Android applications together with their metadata. Within this metadata, the description and the privacy policy explain an application's functionality and describe the permissions it uses, especially those related to sensitive information. Detecting inconsistencies between the permissions implied by the description and privacy policy and the permissions extracted from the application's source code helps users decide whether to install and use the application. In this research, we propose a new method based on a pre-trained language model to detect inconsistencies between the permissions extracted from the application's description and privacy policy and the permissions extracted from the application's source code (the APK file). Related works focus on models trained on large-scale datasets, especially for resource-rich languages such as English. However, low-resource languages such as Vietnamese lack datasets for this task. To address this problem, we propose the ViDPApp dataset (Description and Privacy Policy of Applications on Vietnamese domains), a high-quality, manually annotated dataset of 12,000+ sentences with an inter-annotator agreement (IAA) of over 85%. In addition, we propose XLMR4MD, a new framework using large language models that outperforms strong baseline models (LSTM, Bi-GRU-LSTM-CNN, WikiBERT, DistilBERT, mBERT, and PhoBERT) and achieves the best F1 score of 84.04% in detecting inconsistencies between Android application permissions and descriptions. This framework can be fine-tuned for 100 languages, which benefits low-resource languages like Vietnamese. The dataset is available for research purposes.
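The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that a text classifier has already mapped description and privacy-policy sentences to permission labels and that the declared permissions have already been read from the APK; neither step is shown here. The function name `find_inconsistencies`, the result keys, and the sample permission sets are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def find_inconsistencies(described: Set[str], declared: Set[str]) -> Dict[str, Set[str]]:
    """Compare permissions inferred from text with permissions declared in the APK.

    described: permission labels predicted from description / privacy-policy
               sentences (assumed to be the output of some classifier).
    declared:  permission names extracted from the APK (assumed to be given).
    """
    return {
        # Requested by the app but never mentioned in its text.
        "undisclosed": declared - described,
        # Mentioned in the text but not requested by the app.
        "not_requested": described - declared,
    }


# Hypothetical example data, for illustration only.
described = {"CAMERA", "ACCESS_FINE_LOCATION"}
declared = {"CAMERA", "READ_CONTACTS"}

report = find_inconsistencies(described, declared)
is_consistent = not report["undisclosed"] and not report["not_requested"]
```

In this sketch an application counts as consistent only when both sets are empty; a real system might weight undisclosed sensitive permissions more heavily, which is a design choice the abstract does not specify.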


More From: Computers & Security
