Abstract

In Turkey, the Personal Data Protection Rule (PDPR) No. 6698, in force since 2016, provides citizens with legal protection for their personal data. Although the law provides excellent guidance, companies still struggle to comply with its requirements for storing, sharing, and monitoring personal data. Because no purpose-built software with wide industrial adoption is available on the market, almost all companies have no choice but to carry out costly, error-prone compliance operations manually. In this paper, we present an automated solution that facilitates and accelerates PDPR compliance. In a structured or unstructured document, words or phrases that may contain personal data entities are tagged by a Bi-LSTM-based named entity recognition (NER) model together with a rule-based matching component that performs contextual analysis. To find associations among personal data and obtain individual personal profiles, these entities are grouped into categories according to their confidence levels. Personal profiles are then constructed with a clustering-like approach: each personal data category with a high identification level is treated as a separate cluster, and related personal data entities found in its left and/or right context are attached to it. We evaluated the system on a data set of 70 documents covering different document types and personal data entities, obtaining a micro-averaged F1-measure of 91.76% for personal data entity classification and an accuracy of 72.46% for profile extraction. We also conducted experiments on the performance of the NER component and on the time complexity of the overall system using a data set of 33K documents.
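The profile-construction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the category names in `HIGH_ID`, the 10-token window, and the nearest-seed assignment rule (ties go to the leftmost seed) are all hypothetical choices, and the entities are assumed to have already been tagged by the NER and rule-based components.

```python
from dataclasses import dataclass, field

# Hypothetical high-identification categories; the paper's actual
# category set and confidence levels are not given in the abstract.
HIGH_ID = {"NATIONAL_ID", "PERSON", "EMAIL"}


@dataclass(frozen=True)
class Entity:
    token_index: int  # position of the entity's first token in the document
    category: str
    text: str


@dataclass
class Profile:
    seed: Entity  # the high-identification entity that anchors this cluster
    attributes: list = field(default_factory=list)


def build_profiles(entities, window=10):
    """Group tagged entities into per-person profiles (assumed rule).

    Each high-identification entity seeds its own profile (cluster).
    Every other entity joins the nearest seed to its left or right,
    provided that seed lies within `window` tokens; otherwise it is
    returned as unassigned.
    """
    ordered = sorted(entities, key=lambda e: e.token_index)
    profiles = [Profile(seed=e) for e in ordered if e.category in HIGH_ID]
    unassigned = []
    for ent in ordered:
        if ent.category in HIGH_ID:
            continue
        best, best_dist = None, window + 1
        for p in profiles:
            dist = abs(ent.token_index - p.seed.token_index)
            # Strict "<" keeps the leftmost seed when two are equally close.
            if dist < best_dist:
                best, best_dist = p, dist
        if best is None:
            unassigned.append(ent)
        else:
            best.attributes.append(ent)
    return profiles, unassigned


# Illustrative input with hypothetical token positions.
ents = [
    Entity(0, "PERSON", "Ayse Yilmaz"),
    Entity(3, "PHONE", "0532 000 00 00"),
    Entity(8, "ADDRESS", "Kadikoy, Istanbul"),
    Entity(40, "PERSON", "Mehmet Demir"),
    Entity(44, "BIRTH_DATE", "01.01.1980"),
    Entity(70, "PLATE", "34 ABC 123"),
]
profiles, unassigned = build_profiles(ents)
for p in profiles:
    print(p.seed.text, "->", [a.text for a in p.attributes])
print("unassigned:", [e.text for e in unassigned])
# Ayse Yilmaz -> ['0532 000 00 00', 'Kadikoy, Istanbul']
# Mehmet Demir -> ['01.01.1980']
# unassigned: ['34 ABC 123']
```

Here the two PERSON entities seed two profiles: PHONE and ADDRESS attach to the first, BIRTH_DATE attaches to the second, and PLATE stays unassigned because no seed lies within 10 tokens of it. The sketch does not merge seeds that refer to the same person (for example, a name immediately followed by its national ID number), which a fuller implementation would likely need.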
