Abstract

The era of big data has driven rapid growth in many industries and unlocked the potential of holistic data-driven analysis, yet it has also been accompanied by a steady stream of data breaches. In recent years, especially in China, data security laws and regulations have been promulgated in quick succession, and many of them set out explicit requirements for data classification. As a foundation of data security initiatives, data classification has attracted wide attention across sectors. The issued regulations contain a great deal of valuable information, which has already been well exploited in research on privacy policy compliance verification, whereas few scholars have drawn on it to guide data classification for security and compliance. As a step in this direction, we define two information types: “regulated data”, which is mentioned in external laws and regulations, and “non-regulated data”, which is internal business data produced within an organization. We then develop a novel generalization-enhanced decision tree classification algorithm, called Gen-DT, to classify data. In this way, data covered by the relevant data security regulatory mandates can be quickly identified and handled in compliance with them. Furthermore, we evaluate the proposed compliance-driven data classification scheme on datasets collected from two well-known universities in China and find that our approach achieves better performance than existing popular machine learning techniques.
