Privacy in Big Data Through Variable t-Closeness for MSN Attributes

Zakariae El Ouazzani, Hanan El Bakkali

doi:10.1007/978-3-319-97719-5_9

Abstract

With the growing and extensive use of online data, the notion of big data has been widely studied in the recent literature. High-dimensional databases can contain a large quantity of sensitive personal information, and this data needs to be sanitized before publishing. In this context, many approaches have been proposed to ensure privacy in big data, including pseudonymization, cryptographic, and anonymization techniques. T-closeness has been studied with great interest as an anonymization technique that ensures privacy in big data when dealing with sensitive attributes. Although t-closeness can be applied to quasi-identifier attributes, it is better suited to sensitive attributes. Despite the fact that many algorithms for t-closeness have been proposed, many of them assume that the threshold t of t-closeness is set to a fixed value. In this chapter, a method using t-closeness for multiple sensitive numerical (MSN) attributes is presented. The method can be applied to both single and multiple sensitive numerical attributes. When the data set contains highly correlated attributes, the method is applied to only one numerical attribute. In addition, a new algorithm called variable t-closeness for multiple sensitive numerical attributes was implemented. The algorithm was experimentally evaluated on a test table and gives good results in terms of data anonymization. Furthermore, all the steps of the proposed algorithm are highlighted with detailed comments.
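For readers unfamiliar with the fixed-threshold baseline that the abstract contrasts against, the following minimal sketch shows the standard t-closeness check for a single numerical sensitive attribute. It assumes the ordered-distance form of the Earth Mover's Distance commonly used for numerical attributes; the function names `ordered_emd` and `satisfies_t_closeness` and the sample salary values are illustrative assumptions, not taken from the chapter, and the chapter's variable-threshold algorithm is not reproduced here.

```python
from collections import Counter
from itertools import accumulate


def ordered_emd(class_values, table_values):
    """Ordered-distance EMD between the distribution of a numerical
    sensitive attribute in one equivalence class and in the whole table.

    Hypothetical helper for illustration; not the chapter's algorithm.
    """
    domain = sorted(set(table_values))  # ordered attribute domain
    m = len(domain)
    if m < 2:
        return 0.0  # a single-valued domain cannot differ
    p = Counter(class_values)
    q = Counter(table_values)
    n_p, n_q = len(class_values), len(table_values)
    # Per-value probability differences p_i - q_i, in domain order.
    diffs = [p[v] / n_p - q[v] / n_q for v in domain]
    # Sum of absolute cumulative differences, normalised by (m - 1).
    return sum(abs(c) for c in accumulate(diffs)) / (m - 1)


def satisfies_t_closeness(classes, t):
    """True if every equivalence class is within distance t of the table.

    `classes` is a list of lists of sensitive values; `t` is one fixed
    threshold (the fixed-t assumption the abstract describes).
    """
    table = [v for c in classes for v in c]
    return all(ordered_emd(c, table) <= t for c in classes)


# Illustrative data: nine salary values (in thousands), three classes.
classes = [[3, 4, 5], [6, 8, 11], [7, 9, 10]]
table = [v for c in classes for v in c]
d_first = ordered_emd(classes[0], table)   # class clustered at the low end
d_second = ordered_emd(classes[1], table)  # class spread over the domain
```

A class whose values cluster at one end of the domain (`[3, 4, 5]`) sits farther from the table-wide distribution than a class spread across it (`[6, 8, 11]`), so the same fixed `t` can accept one class and reject the other.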

Full Text