Abstract

Personal data can be described as any information relating to an identifiable individual that is stored, kept accurate, and kept confidential to prevent abuse by irresponsible parties. In this study, we analyzed the types of sensitive personal data published on the information system websites of higher education institutions, along with their distribution. We also performed a data mining clustering analysis using the Expectation-Maximization method. Of the 189,358 pieces of sensitive personal data obtained from 72,522 instances analyzed against NIST criteria and Indonesian Regulation Number 23 of 2006, around 87.72% were Critical personally identifiable information (PII), while the remaining 12.28% were Potential PII. The types of Critical PII published included place of birth, date of birth, home address, telephone number, email address, face photo, religion, and employee identification number. The Potential PII included position, work location/unit, district of residence, and age. The average cluster accuracy obtained using Expectation-Maximization was 98.53%, with 1,682 incorrectly clustered instances. In future work, the criteria used to determine sensitive personal data could refer to local regulations, so that the results and recommendations are better suited to the territory where the study is conducted.
