Abstract

Personal data collection and processing are now widely used in digital services within mobile sensing networks to support their operation, personalization, and improvement. Personal data are any data that identifiably describe a person. Legislative and regulatory documents adopted in recent years define the key requirements for processing personal data; they are based on the principles of lawfulness, fairness, and transparency. Privacy policies are the only legitimate way to inform users of services and devices how their personal data are collected, processed, and stored. Making privacy policies clear and transparent is therefore extremely important, because it would allow end users to comprehend the risks associated with personal data processing. A number of approaches for analyzing privacy policies written in natural language have been proposed, and most of them require a large training dataset of privacy policies. In this paper, we examine the existing corpora of privacy policies available for training, discuss their features, and conclude that a new dataset of privacy policies for devices and services of the Internet of Things (IoT), as a part of mobile sensing networks, is needed. We develop a new technique for collecting and cleaning such privacy policies. The proposed technique differs from existing ones in that it uses e-commerce platforms as the starting point for document search, which enables more targeted collection of URLs to IoT device manufacturers’ privacy policies. The software tool implementing this technique was used to collect a new corpus of English-language documents containing 592 unique privacy policies. The collected corpus consists mainly of privacy policies that were developed for the Internet of Things and reflect the latest legislative requirements. The paper also presents the results of a statistical and semantic analysis of the collected privacy policies. Researchers could use these results when developing techniques for analyzing privacy policies written in natural language, with the aim of making them more transparent to the end user.
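
One step of the collection technique, locating a privacy policy link on a manufacturer's web page, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the keyword list, the class `PolicyLinkFinder`, and the function `find_policy_links` are hypothetical names introduced here, and a real crawler would also need page fetching, error handling, and language filtering.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; the source does not specify which phrases are matched.
POLICY_KEYWORDS = ("privacy policy", "privacy notice", "privacy statement")


class PolicyLinkFinder(HTMLParser):
    """Collects <a href> targets whose link text or URL mentions privacy."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # absolute URLs of candidate policy pages
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace in the link label before matching.
            label = " ".join("".join(self._text).split()).lower()
            if any(k in label for k in POLICY_KEYWORDS) or "privacy" in self._href.lower():
                url = urljoin(self.base_url, self._href)
                if url not in self.links:
                    self.links.append(url)
            self._href = None


def find_policy_links(html, base_url):
    """Return absolute URLs of links that look like privacy policy pages."""
    finder = PolicyLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Matching on both the link label and the URL is a design choice made for this sketch: footer links are often labeled inconsistently, while their paths frequently contain the word "privacy".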

Highlights

  • Currently, IoT devices are actively used in critical industries and in everyday life, creating a comfortable and safe living environment

  • This paper presents a technique for the generation of a dataset of privacy policies for

  • Though the datasets [17,18] have annotations that describe different aspects of personal data usage, they consist of privacy policies that were created before the adoption of the GDPR, and these privacy policies do not consider the requirements of this regulatory document


Summary

Related Works and Their Comparative Analysis

After the adoption of the GDPR [1], the problem of analyzing privacy policies written in natural language has been actively researched. Though the datasets [17,18] have annotations that describe different aspects of personal data usage, they consist of privacy policies that were created before the adoption of the GDPR, and these privacy policies do not consider the requirements of this regulatory document. The authors ran a series of experiments with this dataset to determine the similarity between documents, conducted policy readability tests, and extracted aspects of personal data usage scenarios using key phrases and words. They also analyzed the dataset using topic modeling methods. The authors propose an approach that combines analysis of a privacy policy written in natural language with the generation and automatic processing of an ontology in order to calculate the privacy risks associated with that policy.
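
The kinds of measurements mentioned above, document similarity and readability, can be illustrated with a small sketch. The source does not say which measures were used; cosine similarity over word counts and the Flesch Reading Ease score are assumptions chosen for illustration, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between bag-of-words count vectors of two documents."""
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def count_syllables(word):
    """Approximate syllables as runs of vowels (assumption: a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)
```

For example, `cosine_similarity(policy_a, policy_b)` could be used to flag near-duplicate policies in a corpus, and `flesch_reading_ease(policy_text)` to compare how demanding different policies are to read; both `policy_a`, `policy_b`, and `policy_text` are placeholder variable names.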

The Technique for Privacy Policy Corpus Collection
Collecting Hyperlinks to IoT Devices and Generating a List of Manufacturers
Searching Websites of Smart Device Manufacturers
Searching Links to Privacy Policies
Privacy Policy Sanitization
Description of the Generated Corpus of Privacy Policies for IoT Devices
Statistical Characteristics of the Privacy Policy Corpus
Semantic Analysis of the Generated Corpus of Privacy Policies
Conclusions
