Challenges in Classifying Privacy Policies by Machine Learning with Word-based Features

Keishiro Fukushima, Toru Nakamura, Daisuke Ikeda, Shinsaku Kiyomoto

doi:10.1145/3199478.3199486

Abstract

In this paper, we discuss the challenges of automatically classifying privacy policies using machine learning with words as features. Because privacy policies are difficult for the general public to understand, users need support in reading them. We believe machine learning is a promising approach because users can grasp the meaning of a policy through the outputs of a machine learning algorithm. Our final goal is to develop a system that automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify sentences in privacy policies with category labels using popular machine learning algorithms, such as a naive Bayes classifier. We choose these algorithms because the trained classifiers can be used to evaluate which keywords are appropriate for privacy labels; for this reason, we adopt words as the features. Experimental results show an accuracy of about 85%, and we think much higher accuracy is necessary to achieve our final goal. By changing the learning settings, we identified one reason for the low accuracy: privacy policies include many sentences that do not directly describe information about the categories. Such sentences seem redundant, but they may be essential in legal documents to prevent misinterpretation. Thus, it is important for machine learning algorithms to handle these redundant sentences appropriately.
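The sentence-level setup described in the abstract can be illustrated with a minimal sketch, assuming a multinomial naive Bayes classifier with add-one smoothing over word counts. The category names (`collection`, `sharing`, `retention`), the training sentences, and the `top_keywords` ranking are hypothetical choices made for illustration; they are not the paper's dataset, label set, or exact method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def word_log_prob(self, word, label):
        """Smoothed log P(word | label)."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + len(self.vocab)))

    def predict(self, sentence):
        """Return the label with the highest log prior plus word log-likelihood."""
        n = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / n)
            for word in tokenize(sentence):
                if word in self.vocab:  # words unseen in training are skipped
                    score += self.word_log_prob(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_keywords(self, label, k=3):
        """Rank words by how much more likely they are under `label`
        than under the best competing label (an illustrative heuristic)."""
        others = [other for other in self.label_counts if other != label]

        def margin(word):
            best_other = max(self.word_log_prob(word, o) for o in others)
            return self.word_log_prob(word, label) - best_other

        return sorted(self.vocab, key=margin, reverse=True)[:k]


# Hypothetical toy data: two sentences per made-up category.
train_sentences = [
    "We collect your email address and phone number.",
    "We collect location data from your device.",
    "We share your information with advertising partners.",
    "We share data with third party service providers.",
    "We retain your data for two years.",
    "We retain records until you delete your account.",
]
train_labels = ["collection", "collection", "sharing",
                "sharing", "retention", "retention"]

model = NaiveBayes().fit(train_sentences, train_labels)
print(model.predict("We may share your email with partners"))
print(model.top_keywords("sharing", k=2))
```

In this sketch, a sentence with no category-specific words (for example, a general legal disclaimer) is still forced into one of the categories, which is one way the redundant sentences mentioned in the abstract could lower accuracy for a word-based classifier.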
