Abstract

With the Big Data revolution, the education sector is being reshaped. The current data-driven education system provides many opportunities to utilize the enormous amount of collected data about students' activities and performance for personalized education, adapting teaching methods, and decision making. On the other hand, such benefits come at a cost to privacy, for example, the identification of a student's poor performance across multiple courses. While several works have been conducted on quantifying the re-identification risks of individuals in released datasets, they assume an adversary's prior knowledge about target individuals, and most of them do not utilize all the available information in the datasets, for example, event-level information that associates multiple records with the same individual, and correlation between attributes. In this work, we propose a method using a Markov Model (MM) to quantify re-identification risks using all available information in the data, under a more realistic threat model that assumes different levels of adversary knowledge about the target individual, ranging from any one of the attributes to all given attributes. Moreover, we propose a workflow for efficiently calculating MM risk that is highly scalable to a large number of attributes. Experimental results on real education datasets show the efficacy of our model for quantifying re-identification risk.

Practitioner notes

What is already known about this topic?
- A number of works have been conducted on privacy risk quantification in datasets and on the Web.
- The majority of them make strong assumptions about the adversary's prior knowledge of the target individual(s).
- Most of them do not utilize all the available information in the datasets, e.g., event-level (duplicate) records and correlation between attributes.

What this paper adds?
- This paper proposes a new re-identification risk quantification model using Markov models. Our model addresses the shortcomings of existing works, e.g., strong assumptions about the adversary's knowledge, unexplainable models, and underutilization of the available information in the datasets. Specifically, our proposed model not only focuses on the uniqueness of data points in the datasets (as most other existing methods do), but also takes into account the uniformity and correlation characteristics of these data points.
- Re-identification risk quantification is computationally expensive and does not scale to large datasets with an increasing number of attributes. This paper introduces a workflow that data custodians can use to efficiently evaluate the worst-case re-identification risk in their datasets before release.
- It presents extensive experimental evaluation of the proposed model for quantifying re-identification risks on several real education datasets.

Implications for practice and/or policy?
- Empirical results on real education datasets validate the significance and efficacy of the proposed model for re-identification risk quantification compared to existing approaches.
- Our model can be used by data custodians as a tool to evaluate the worst-case risk of a dataset. It empowers data custodians to make informed decisions on appropriate actions to mitigate these risks (e.g., data perturbation) before sharing or releasing their datasets to third parties.
A typical use case would be one where the data custodian is an online course/program provider that collects data about students' engagement with its courses and would like to share this data with third parties so they can run learning analytics that provide value-added benefits back to the data custodian. We specifically study privacy risk quantification for education data; however, our model is applicable to any tabular data release.
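The abstract does not spell out how the Markov Model is constructed, but the general idea of scoring record rarity with a chain over attribute values can be illustrated with a minimal sketch. The sketch below assumes a first-order Markov chain over a fixed attribute ordering and treats low-probability records as higher re-identification risk; the function names, the toy data, and the smoothing constant are all hypothetical choices for illustration, not the paper's actual formulation.

```python
# Minimal sketch (not the paper's exact method): score how rare a record
# is under a first-order Markov model over its attribute sequence.
# Rare (low-probability) records are plausibly easier to re-identify.
import math
from collections import Counter, defaultdict

def fit_markov(records):
    """Estimate initial-value and transition counts over attribute values."""
    starts = Counter()
    trans = defaultdict(Counter)
    for rec in records:
        starts[rec[0]] += 1
        for prev, curr in zip(rec, rec[1:]):
            trans[prev][curr] += 1
    return starts, trans

def record_log_risk(rec, starts, trans, alpha=1e-6):
    """Negative log-probability of a record; higher means rarer/riskier."""
    total = sum(starts.values())
    logp = math.log((starts[rec[0]] + alpha) / (total + alpha * len(starts)))
    for prev, curr in zip(rec, rec[1:]):
        row = trans[prev]
        denom = sum(row.values())
        logp += math.log((row[curr] + alpha) / (denom + alpha * max(len(row), 1)))
    return -logp

# Toy example: each record is a tuple of attribute values
# (course, grade, region). Entirely made-up data.
data = [("math", "A", "north"), ("math", "B", "north"),
        ("art", "A", "south"), ("math", "A", "north")]
starts, trans = fit_markov(data)
ranked = sorted(data, key=lambda r: record_log_risk(r, starts, trans), reverse=True)
print(ranked[0])  # the rarest (highest-risk) record under this toy model
```

A data custodian could use such a ranking to flag the most re-identifiable records before release; the paper's model additionally incorporates uniformity and attribute-correlation characteristics beyond this simple chain.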
