GPT vs human legal texts annotations: A comparative study with privacy policies
- Preprint Article
- 10.21203/rs.3.rs-5799153/v1
- Jan 10, 2025
High-quality corpora of annotated privacy policies are scarce, yet essential for training, testing, and evaluating accurate machine learning models. However, elaborating new corpora remains an error-prone and resource-intensive task, heavily reliant on highly specialized and hard-to-find human annotators. Recent advances in Generative Pre-trained Transformers (GPTs) open the possibility of using them to annotate privacy policies with performance comparable to that of human annotators, thereby streamlining the process while reducing human resource demands. This paper presents a novel method for annotating privacy policies based on a codebook, a well-designed prompt, and the analysis of the logarithmic probabilities (logprobs) of a GPT's output tokens during the annotation process. We validated our method using the GPT-4o model and the well-known, open, multi-class, multi-label OPP-115 corpus, achieving performance comparable to that of 80% of human annotators in segment-level annotation and matching 90% of human annotators in full-text-level annotation. Furthermore, incorporating logprobs analysis allowed the method to match the performance of all human annotators in full-text-level annotation, suggesting that this added context enhances the task. These findings demonstrate the potential of our method to automate annotation with performance similar to that of human annotators while significantly reducing resource demands.
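The logprobs analysis described above can be sketched as follows: a token's log-probability converts to a probability via exp, and only labels whose probability clears a threshold are kept as confident annotations. The label names, logprob values, and threshold below are illustrative assumptions, not figures from the paper.

```python
import math

# Hypothetical logprobs for the label tokens a GPT model emitted when
# annotating one privacy-policy segment (labels follow OPP-115 category
# names; the values and threshold are invented for illustration).
label_logprobs = {
    "First Party Collection/Use": -0.05,
    "Third Party Sharing/Collection": -2.30,
    "Data Retention": -4.61,
}

def confident_labels(logprobs, threshold=0.5):
    """Keep labels whose token probability exp(logprob) clears the threshold."""
    return {label: math.exp(lp) for label, lp in logprobs.items()
            if math.exp(lp) >= threshold}

print(confident_labels(label_logprobs))
```

With these toy values only the first label survives, since exp(-0.05) ≈ 0.95 while the others fall well below the cutoff.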
- Conference Article
- 10.1109/relaw.2015.7330207
- Aug 25, 2015
Privacy policies serve to inform consumers about a company's data practices, and to protect the company from legal risk due to undisclosed uses of consumer data. In addition, US and EU regulators require companies to accurately describe their practices in these policies, and some laws prescribe how companies should write these policies. Despite these aims, privacy policies are frequently criticized for being vague and uninformative. To support and improve the analysis of privacy policies, we report results from constructing an information type lexicon from manual, human annotations and an entity extractor based on part-of-speech tagging. The lexicon was constructed from 3,850 annotations obtained from crowd workers analyzing 15 privacy policies. An entity extractor was designed to extract entities from these annotations. The extractor succeeds at finding entities in 92% of annotations and the lexicon consists of 725 unique entities. Finally, we measured the terminological reuse across all 15 policies and observed the lexicon has a 31-78% chance of containing a word from any previously seen policy.
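The terminological-reuse measurement reported above (a 31-78% chance that the lexicon contains a word from a previously seen policy) can be sketched as an incremental overlap computation over per-policy term sets. The toy term sets below are invented for illustration, not drawn from the 15-policy corpus.

```python
# Illustrative sketch: for each policy in sequence, what fraction of its
# entity terms already appear in the lexicon built from previously seen
# policies? (Toy data; the study's lexicon has 725 unique entities.)
policies = [
    {"email address", "device id", "location"},
    {"email address", "location", "browsing history"},
    {"location", "cookie", "email address"},
]

def reuse_rates(policies):
    lexicon, rates = set(), []
    for terms in policies:
        if lexicon:  # the first policy has no previously seen terms
            rates.append(len(terms & lexicon) / len(terms))
        lexicon |= terms
    return rates

print(reuse_rates(policies))  # per-policy fractions of previously seen terms
```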
- Conference Article
- 10.13016/m2qt7r-unmb
- Sep 30, 2021
A growing number of web and cloud-based products and services rely on data sharing between consumers, service providers, and their subsidiaries and third parties. There is a growing concern around the security and privacy of data in such large-scale shared architectures. Most organizations have a human-written privacy policy that discloses all the ways that data is shared, stored, and used. The organizational privacy policies must also be compliant with government and administrative regulations. This raises a major challenge for providers as they try to launch new services. Thus, providers are moving towards a system of automatic policy maintenance and regulatory compliance. This requires extracting policy from text documents and representing it in a semi-structured, machine-processable framework. The most popular method to this end is extracting policy information into a Knowledge Graph (KG). There exists a significant body of work that converts text descriptions of regulations into policies expressed in languages such as OWL and XACML, grounded in a control-based schema, by using NLP approaches. In this paper, we show that NLP-based approaches to extract knowledge from written policy documents and represent it in enforceable Knowledge Graphs fail when the text policies are ambiguous. Ambiguity can arise from lack of clarity, misuse of syntax, and/or the use of complex language. We describe a system to extract features from a policy document that affect its ambiguity and classify documents based on the level of ambiguity present. We validate this approach using human annotators. We show that a large number of documents in a popular privacy policy corpus (OPP-115) are ambiguous. This affects the ability to automatically monitor privacy policies. We show that for policies that are more ambiguous according to our proposed measure, NLP-based text segment classifiers are less accurate.
- Research Article
- 10.1145/2907942
- May 24, 2016
- ACM Transactions on Software Engineering and Methodology
Privacy policies describe high-level goals for corporate data practices; regulators require industries to make available conspicuous, accurate privacy policies to their customers. Consequently, software requirements must conform to those privacy policies. To help stakeholders extract privacy goals from policies, we introduce a semiautomated framework that combines crowdworker annotations, natural language typed dependency parses, and a reusable lexicon to improve goal-extraction coverage, precision, and recall. The framework evaluation consists of a five-policy corpus governing web and mobile information systems, yielding an average precision of 0.73 and recall of 0.83. The results show that no single framework element alone is sufficient to extract goals; however, the overall framework compensates for elemental limitations. Human annotators are highly adaptive at discovering annotations in new texts, but those annotations can be inconsistent and incomplete; dependency parsers lack sophisticated, tacit knowledge, but they can perform exhaustive text search for prospective requirements indicators; and while the lexicon may never completely saturate, the lexicon terms can be reliably used to improve recall. Lexical reuse reduces false negatives by 41%, increasing the average recall to 0.85. Last, crowd workers were able to identify and remove false positives by around 80%, which improves average precision to 0.93.
- Research Article
- 10.1109/access.2022.3199882
- Jan 1, 2022
- IEEE Access
Privacy laws and app stores (e.g., Google Play Store) require mobile apps to have transparent privacy policies to disclose sensitive actions and data collection, such as accessing the phonebook, camera, storage, GPS, and microphone. However, many mobile apps do not accurately disclose their sensitive data access that requires sensitive (’dangerous’) permissions. Thus, analyzing discrepancies between apps’ permissions and privacy policies facilitates the identification of compliance issues upon which privacy regulators and marketplace operators can act. This paper proposes <i>PermPress</i> – an automated machine-learning system to evaluate an Android app’s permission-completeness, i.e., whether its privacy policy matches its dangerous permissions. <i>PermPress</i> combines machine learning techniques with human annotation of privacy policies to establish whether app policies contain permission-relevant information. <i>PermPress</i> leverages MPP-270, an annotated policy corpus, for establishing a gold standard dataset of permission completeness. This corpus shows that only 31% of apps disclose all dangerous permissions in privacy policies. By leveraging the annotated dataset and machine learning techniques, <i>PermPress</i> achieves an AUC score of 0.92 in predicting the permission-completeness of apps. A large-scale evaluation of 164,156 Android apps shows that, on average, 7% of apps do not disclose more than half of their declared dangerous permissions in privacy policies, whereas 60% of apps omit to disclose at least one dangerous permission-related data collection in privacy policies. This paper’s investigation uncovers the non-transparent state of app privacy policies and highlights the need to standardize app privacy policies’ compliance and completeness checking process.
- Research Article
- 10.47989/ir30iconf47581
- Mar 11, 2025
- Information Research an international electronic journal
Introduction. Privacy policies inform users about data practices but are often complex and difficult to interpret. Human annotation plays a key role in understanding privacy policies, yet annotation disagreements highlight the complexity of these texts. Traditional machine learning models prioritize consensus, overlooking annotation variability and its impact on accuracy. Method. This study examines how annotation disagreements affect machine learning performance using the OPP-115 corpus. It compares majority vote and union methods with alternative strategies to assess their impact on policy classification. Analysis. The study evaluates whether increasing annotator consensus improves model effectiveness and if disagreement-aware approaches yield more reliable results. Results. Higher agreement levels improve model performance across most categories. Complete agreement yields the best F1-scores, especially for First Party Collection/Use and Third-Party Sharing/Collection. Annotation disagreements significantly impact classification outcomes, underscoring the need for understanding annotation disagreements. Conclusions. Ignoring annotation disagreements can misrepresent model accuracy. This study proposes new evaluation strategies that account for annotation variability, offering a more realistic approach to privacy policy analysis. Future work should explore the causes of annotation disagreements to improve machine learning transparency and reliability.
- Book Chapter
- 10.3233/faia210486
- Jan 14, 2022
Privacy is a fundamental human right and is widely and extensively protected in the western industrialized world. Recent advances in technology, especially applications developed and designed for mobile devices, have led to a rise in privacy abuse on one hand and a higher awareness of the importance of privacy on the other. Legal texts protecting privacy have attempted to rectify some of the problems, but the ecosystem giants and mobile app developers have adapted. In this paper, we analyze which data mobile app developers are collecting, focusing on a sample of apps in the medical and health field. The research was done using collocation analysis: a relationship between a base word and its collocative partners was sought. The initial visual results led us to more detailed studies that unveiled some worrying patterns. Namely, applications collect data not only about the users and their activities, but also about their family members, medical diagnoses, treatments, and the like, going well beyond the “need to function” / functionality threshold.
- Conference Article
- 10.1145/3544549.3585698
- Apr 19, 2023
While fitness tracker users consent to the processing of their sensitive data based on privacy policies, previous research has demonstrated that legal texts often remain unread or incomprehensible. This questions whether the given consent is indeed informed. While past research concentrated on improving privacy comprehension, our research aims to better understand user requirements for interactive and transparent privacy information and control systems. We mainly focus on users’ assessment of contextual and functional aspects. Findings from an online survey with fitness tracker users and non-users (N = 204) reveal that such systems need to support users and potential users throughout the usage life cycle, illustrating a dynamic change in requirements and their prioritization of information transparency and privacy control. Design recommendations derived from our results support the development of interactive and comprehensible privacy systems that enable more knowledgeable decisions on sharing and processing fitness tracker data.
- Research Article
- 10.1007/s10506-025-09450-0
- May 6, 2025
- Artificial Intelligence and Law
Open-texture (e.g., vague, ambiguous, under-specified, or abstract terms) in regulatory documents leads to inconsistent interpretation, and is an obstacle to the automatic processing of regulation by computers. Identifying which parts of a legal text fall under open-texture is therefore a necessary requirement for making progress in automating the law. In this paper, we propose that large language models (LLMs) might provide an effective way to automatically detect open-texture in legal texts. We first investigate the obstacles by situating open-texture in the broader literature, and we test the hypothesis using two different LLMs (the proprietary gpt-3.5-turbo and the open-source llama-2-70b-chat) for the task of identifying open-texture in the General Data Protection Regulation. We evaluate their performance by asking 12 annotators to assess their output. We find, overall, that gpt-3.5-turbo outperforms llama-2-70b-chat on F1-scores (0.84 vs. 0.67), and its high F1-score could make it a suitable alternative, or complement, to using human annotators. We also test the sensitivity of the findings against four further LLMs combined with six different prompts, and replicate the finding that there is low agreement between annotators when it comes to identifying open-texture. We conclude the article by discussing the subjectivity of open-texture, the lessons to draw when testing for open-texture, and the consequences of using LLMs in the legal domain.
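The F1-scores quoted above combine precision and recall in the usual way. A minimal sketch, with invented true-positive, false-positive, and false-negative counts (the paper reports only the final scores, not the underlying counts):

```python
# F1 = harmonic mean of precision and recall.
def f1(tp, fp, fn):
    precision = tp / (tp + fp)  # of flagged spans, how many were open-textured
    recall = tp / (tp + fn)     # of open-textured spans, how many were flagged
    return 2 * precision * recall / (precision + recall)

# Invented counts chosen so precision = recall = 0.84, giving F1 = 0.84.
print(round(f1(tp=42, fp=8, fn=8), 2))
```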
- Research Article
- 10.24294/jipd8194
- Dec 12, 2024
- Journal of Infrastructure, Policy and Development
Technological infrastructure is the basis for the successful implementation and operation of information systems in small and medium enterprises. The study aimed to demonstrate the impact of cybersecurity on entrepreneurship strategies in small and medium enterprises through technological infrastructure in Balqa Governorate. The study population consisted of small and medium enterprises in Balqa Governorate in Jordan. The study followed the descriptive analytical approach and relied on a questionnaire to collect data. The sample consisted of 360 randomly selected individuals. The Statistical Package for the Social Sciences (SPSS) was used to analyze the data. The study reached a set of results, including that the management of small and medium enterprises is committed to continuous supervision and control of customer information; dealing with reliable parties to ensure the confidentiality of information; following strict standards, based on legal texts, for the disclosure and circulation of customer data and information; and maintaining the privacy of customers’ financial data, in addition to supporting the successes of individuals based on the personal efforts of employees, providing a suitable work environment for employees, sustaining excellence and achievement, and working to increase employees’ awareness of the importance of innovation and creativity in their work. The study recommended that customer data confidentiality be considered a top priority for small and medium enterprises, that data be stored in more than one place at the same time, that project websites follow a privacy policy, and that customers’ identities be verified before their data and documents are submitted, by involving employees of small and medium enterprises in specialized courses and workshops that demonstrate the importance of data and information confidentiality.
- Book Chapter
- 10.1007/978-3-030-83527-9_24
- Jan 1, 2021
We present a method for building text corpora for the supervised learning of text-to-text anonymization while maintaining a strict privacy policy. In our solution, personal data entities are detected, classified, and anonymized. We use available machine-learning methods, like named-entity recognition, and improve their performance by grouping multiple entities into larger units based on the theory of tabular data anonymization. Experimental results on annotated Czech Facebook Messenger conversations reveal that our solution has recall comparable to human annotators. On the other hand, precision is much lower because of the low efficiency of the named entity recognition in the domain of social messaging conversations. The resulting anonymized text is of high utility because of the replacement methods that produce natural text.
- Book Chapter
- 10.1007/978-3-642-35731-2_17
- Jan 1, 2012
In this work we illustrate a novel approach for solving an information extraction problem on legal texts. It is based on Natural Language Processing techniques and on the adoption of a formalization that allows coupling domain knowledge and syntactic information. The proposed approach is applied to extend an existing system that assists human annotators in handling normative modificatory provisions (that is, changes to other normative texts). This law-‘versioning’ problem is a hard and relevant one. We provide a linguistic and legal analysis of a particular case of modificatory provision (the efficacy suspension), show how such knowledge can be formalized in a linguistic resource such as FrameNet, and show how it is used by the semantic interpreter.
- Book Chapter
- 10.1007/978-3-030-00794-2_16
- Jan 1, 2018
Legal court judgements have multiple participants (e.g. judge, complainant, petitioner, lawyer, etc.). They may be referred to in multiple ways, e.g., the same person may be referred as lawyer, counsel, learned counsel, advocate, as well as his/her proper name. For any analysis of legal texts, it is important to resolve such multiple mentions which are coreferences of the same participant. In this paper, we propose a supervised approach to this challenging task. To avoid human annotation efforts for Legal domain data, we exploit ACE 2005 dataset by mapping its entities to participants in Legal domain. We use basic Transfer Learning paradigm by training classification models on general purpose text (news in ACE 2005 data) and applying them to Legal domain text. We evaluate our approach on a sample annotated test dataset in Legal domain and demonstrate that it outperforms state-of-the-art baselines.
- Book Chapter
- 10.3233/faia230961
- Dec 7, 2023
Manual annotation is just as burdensome as it is necessary for some legal text analytic tasks. Given the promising performance of Generative Pretrained Transformers (GPT) on a number of different tasks in the legal domain, it is natural to ask if they can help with text annotation. Here we report a series of experiments using GPT-4 and GPT-3.5 as pre-annotation tools to determine whether a sentence in a legal opinion describes a legal factor. These GPT models assign labels that human annotators subsequently confirm or reject. To assess the utility of pre-annotating sentences at scale, we examine the agreement among gold-standard annotations, GPT’s pre-annotations, and law students’ annotations. The agreement among these groups supports using GPT-4 as a pre-annotation tool and a useful starting point for large-scale annotation of factors.