Preserving the Privacy of Sensitive Data Using Bit-Coded-Sensitive Algorithm (BCSA)
Organizations now deal with massive amounts of data, collected from various points such as hospitals, credit card companies, and search engines. After collection, this voluminous data is published and shared for research. The collected data may contain sensitive information that could identify an individual and consequently lead to privacy violations when published. To address this challenge, privacy-preserving data publishing (PPDP) seeks to remove threats to privacy while ensuring that the information needed for data mining is released. Various techniques have been proposed to solve the problems associated with sensitive information; one such technique is k-anonymity, which is widely used and efficient. However, it also leads to loss of information, reduces data utility, and works well only with static tables. In this paper, we propose a technique that addresses the challenges of k-anonymity, known as the Bit-Coded-Sensitive Algorithm (BCSA). This algorithm is more efficient and effective: it preserves individual privacy by avoiding disclosure and linkage attacks while ensuring high data quality and utility. BCSA first identifies the source of the data and, based on that, uses bits to code sensitive data with a key.
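The abstract describes the coding step only at a high level. As a loose sketch of the general idea it names, keyed bit-level coding of a sensitive value, something like the following could work; the XOR scheme and the `bit_encode`/`bit_decode` helper names are illustrative assumptions, not the authors' actual BCSA specification:

```python
# Minimal sketch of keyed bit-coding of a sensitive value (illustrative only;
# the actual BCSA coding scheme is not specified in the abstract).
import secrets

def bit_encode(value: str, key: bytes) -> bytes:
    """XOR each byte of the sensitive value with the repeating key."""
    data = value.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def bit_decode(coded: bytes, key: bytes) -> str:
    """Reverse the XOR coding using the same key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(coded)).decode("utf-8")

key = secrets.token_bytes(16)              # key retained by the data holder
coded = bit_encode("HIV positive", key)    # published in place of the raw value
assert bit_decode(coded, key) == "HIV positive"
```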
- Research Article
6
- 10.1002/cpe.6808
- Jan 4, 2022
- Concurrency and Computation: Practice and Experience
Most data in various forms contain sensitive information about individuals, so publishing such data might violate privacy. Privacy preserving data publishing (PPDP) is essential for publishing useful data while preserving privacy. Anonymization, a utility-based privacy preserving approach, helps hide the identities of data subjects while providing data utility. Since data utility affects the accuracy of the analysis model, new anonymization algorithms that improve data utility are always required. Mondrian is a near-optimal anonymization model that offers high data utility and is frequently used for PPDP. However, the upper bound problem of Mondrian causes a decrease in potential data utility. This article focuses on this problem and proposes a new utility-aware anonymization model (u-Mondrian). Experimental results show that u-Mondrian presents an acceptable solution to the upper bound problem, increases total data utility, and achieves higher data utility than Mondrian across different partitioning strategies and datasets.
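Mondrian itself is well documented: it greedily bisects the data at the median of a quasi-identifier until no cut leaves both halves with at least k records. A minimal Python sketch of that partitioning step (numerical quasi-identifiers only, raw rather than normalized ranges, no value generalization) might look like this:

```python
# Minimal sketch of Mondrian-style multidimensional partitioning for
# k-anonymity (numerical quasi-identifiers only; classic Mondrian
# normalizes ranges before choosing the split dimension).
def mondrian(records, qi_indices, k):
    """Recursively split records at the median of the widest QI dimension;
    a cut is allowed only if both halves still contain at least k records."""
    for dim in sorted(qi_indices,
                      key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
                      reverse=True):
        values = sorted(r[dim] for r in records)
        median = values[len(values) // 2]
        left = [r for r in records if r[dim] < median]
        right = [r for r in records if r[dim] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, qi_indices, k) + mondrian(right, qi_indices, k)
    return [records]  # no allowable cut: this partition becomes one equivalence class

partitions = mondrian([(25, 47906), (27, 47906), (35, 47302), (38, 47302)], [0, 1], k=2)
```

As we read the abstract, the upper bound problem concerns exactly these terminal partitions: when no allowable median cut exists, a partition holding many more than k records must be published as one equivalence class, and u-Mondrian tries to recover that lost utility.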
- Research Article
21
- 10.1186/s12911-016-0293-4
- Jul 1, 2016
- BMC Medical Informatics and Decision Making
Background: To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) of ADR events have been established world-wide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, this raises the issue of privacy preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data, and none of the existing anonymization methods is well suited to SRS datasets, because they exhibit characteristics such as rare events, multiple records per individual, and multi-valued sensitive attributes. Methods: We propose a new privacy model called MS(k, θ*)-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., θ*, for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy clustering strategy to group the records into clusters, conforming to an innovative anonymization metric that aims to minimize the privacy risk as well as maintain the data utility for ADR detection. An empirical study was conducted using the FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ*: uniform, level-wise, and frequency-based. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. Results: With all three threshold settings for sensitive values, our method can successfully prevent the disclosure of sensitive values (nearly all observed DRs are zero) without sacrificing too much data utility. With non-uniform threshold settings, level-wise or frequency-based, our MS(k, θ*)-bounding exhibits the best data utility and the least privacy risk among all the models. The experiments conducted on selected ADR signals from MedWatch show that only very small differences in signal strength (PRR or ROR) were observed. The results show that our method can effectively prevent the disclosure of patients' sensitive information without sacrificing data utility for ADR signal detection. Conclusions: We propose a new privacy model for protecting SRS data, which possess characteristics overlooked by contemporary models, and an anonymization algorithm to sanitize SRS data in accordance with the proposed model. Empirical evaluation on the real SRS dataset, i.e., FAERS, shows that our method can effectively solve the privacy problem in SRS data without influencing ADR signal strength.
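The formal definition of MS(k, θ*)-bounding is in the paper; the sketch below is only our reading of the abstract: each published group must cover at least k individuals, and each sensitive value v may occupy at most a θ*(v) fraction of its group. The names, thresholds, and data shapes are illustrative assumptions.

```python
# Minimal sketch of checking an MS(k, θ*)-bounding-style constraint on
# anonymized groups (the exact definition here is an assumption inferred
# from the abstract, not the authors' formal model).
from collections import Counter

def satisfies_ms_k_theta(groups, k, theta):
    """groups: list of lists of (individual_id, sensitive_value) pairs.
    theta: dict mapping a sensitive value v to its threshold θ*(v)."""
    for group in groups:
        individuals = {pid for pid, _ in group}
        if len(individuals) < k:          # k-anonymity-style size requirement
            return False
        counts = Counter(v for _, v in group)
        for v, c in counts.items():       # per-value disclosure bound θ*(v)
            if c / len(group) > theta.get(v, 1.0):
                return False
    return True

groups = [[(1, "rash"), (2, "rash"), (3, "liver failure")],
          [(4, "rash"), (5, "headache"), (6, "headache")]]
ok = satisfies_ms_k_theta(groups, k=3, theta={"liver failure": 0.4, "rash": 0.8})
```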
- Research Article
32
- 10.1002/sec.1527
- Jul 7, 2016
- Security and Communication Networks
Privacy is an important concern in society, and it is a fundamental issue when analyzing and publishing data involving individuals' sensitive information. Recently, the slicing method has been widely used for privacy preservation in data publishing because of its potential to preserve more data utility than approaches such as generalization and bucketization. However, in this paper, we show that the slicing method carries disclosure risks for some absolute facts, which can help an adversary find invalid records in the sliced microdata table, resulting in a breach of individual privacy. To increase the privacy of published data in sliced tables, a new method called value swapping is proposed in this work, aimed at decreasing the attribute disclosure risk for absolute facts and ensuring l-diverse slicing. With value swapping, the published table contains no invalid information, so the adversary cannot breach individual privacy. Experimental results also show that the new method retains more data utility than existing slicing methods in a published microdata table. Copyright © 2016 John Wiley & Sons, Ltd.
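For readers unfamiliar with slicing, the mechanics are: attributes are split into column groups, rows into buckets, and the column-group values inside each bucket are permuted to break row linkage. The sketch below shows those mechanics plus a swap step; note that the paper selects swap targets deliberately to eliminate invalid tuples and keep l-diversity, whereas this sketch swaps randomly purely for illustration.

```python
# Minimal sketch of slicing with a value-swapping step inside one bucket
# (illustrative; the paper's swap-selection criteria are more involved).
import random

def slice_and_swap(bucket, qi_cols, sa_cols):
    """Publish QI and SA columns as separately permuted column groups,
    so original row linkages inside the bucket are broken."""
    qi_part = [tuple(row[c] for c in qi_cols) for row in bucket]
    sa_part = [tuple(row[c] for c in sa_cols) for row in bucket]
    random.shuffle(sa_part)                            # permutation breaks linkage
    i, j = random.sample(range(len(sa_part)), 2)
    sa_part[i], sa_part[j] = sa_part[j], sa_part[i]    # explicit value swap
    return list(zip(qi_part, sa_part))

bucket = [{"age": 25, "zip": 47906, "disease": "flu"},
          {"age": 31, "zip": 47302, "disease": "ulcer"},
          {"age": 44, "zip": 47905, "disease": "flu"}]
published = slice_and_swap(bucket, ["age", "zip"], ["disease"])
```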
- Research Article
2
- 10.1007/s41870-020-00531-8
- Oct 20, 2020
- International Journal of Information Technology
The data provided by individuals and various organizations while using internet applications and mobile devices are very useful for generating solutions and creating new opportunities. Shared data needs to be precise to yield quality results, yet data that may contain an individual's sensitive information cannot be revealed to the world without applying some privacy preserving technique. Privacy preserving data mining (PPDM) and privacy preserving data publishing (PPDP) are techniques that can be utilized to preserve privacy. Every technique has its positives and negatives; the cons frequently include loss of data, reduced data utility, compromised data diversity, and weakened security. In this paper, the authors propose a new technique called Remodeling, which works in conjunction with the k-anonymity and K-means algorithms to ensure minimum data loss and better privacy preservation while maintaining the diversity of data. Network data security is also handled by the proposed model. In this research paper, we show theoretically that the proposed technique addresses all the above-mentioned cons, and we discuss its merits and demerits.
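The abstract does not detail how the two algorithms are combined; one plausible arrangement, clustering records with K-means and then generalizing each cluster's quasi-identifiers to satisfy a minimum size k, is sketched below. The use of scikit-learn's KMeans and the range-generalization step are our assumptions, not the Remodeling technique itself.

```python
# Minimal sketch of clustering records and then generalizing each cluster's
# quasi-identifiers, in the spirit of combining K-means with k-anonymity
# (illustrative; not the paper's Remodeling algorithm).
from sklearn.cluster import KMeans

records = [[25, 47906], [26, 47906], [41, 47302], [43, 47302]]
labels = KMeans(n_clusters=2, n_init=10).fit(records).labels_

clusters = {}
for row, lab in zip(records, labels):
    clusters.setdefault(lab, []).append(row)

k = 2
for lab, rows in clusters.items():
    assert len(rows) >= k, "clusters smaller than k would need merging before release"
    # generalize each QI to its [min, max] range inside the cluster
    generalized = [(min(col), max(col)) for col in zip(*rows)]
    print(lab, generalized)
```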
- Single Report
- 10.2172/963952
- Aug 1, 2009
The Department of Energy and its contractors store and process massive quantities of sensitive information to accomplish national security, energy, science, and environmental missions. Sensitive unclassified data, such as personally identifiable information (PII), official use only information, and unclassified controlled nuclear information, require special handling and protection to prevent misuse of the information for inappropriate purposes. Industry experts have reported that more than 203 million personal privacy records have been lost or stolen over the past three years, including information maintained by corporations, educational institutions, and Federal agencies. The loss of personal and other sensitive information can result in substantial financial harm, embarrassment, and inconvenience to individuals and organizations. Therefore, strong protective measures, including data encryption, help protect against the unauthorized disclosure of sensitive information. Prior reports involving the loss of sensitive information have highlighted weaknesses in the Department's ability to protect sensitive data. Our report on Security Over Personally Identifiable Information (DOE/IG-0771, July 2007) disclosed that the Department had not fully implemented all measures recommended by the Office of Management and Budget (OMB) and required by the National Institute of Standards and Technology (NIST) to protect PII, including failures to identify and encrypt PII maintained on information systems. Similarly, the Government Accountability Office recently reported that the Department had not yet installed encryption technology to protect sensitive data on the vast majority of laptop computers and handheld devices. Because of the potential for harm, we initiated this audit to determine whether the Department and its contractors adequately safeguarded sensitive electronic information. The Department had taken a number of steps to improve protection of PII. Our review, however, identified opportunities to strengthen the protection of all types of sensitive unclassified electronic information and reduce the risk that such data could fall into the hands of individuals with malicious intent. In particular, for the seven sites we reviewed: (1) Four sites had either not ensured that sensitive information maintained on mobile devices was encrypted or had improperly permitted sensitive unclassified information to be transmitted unencrypted through email or to offsite backup storage facilities; (2) One site had not ensured that laptops taken on foreign travel, including travel to sensitive countries, were protected against security threats; and (3) Although required by OMB since 2003, Privacy Impact Assessments - analyses designed to examine the risks and ramifications of using information systems to collect, maintain, and disseminate personal information - were still being completed by programs and sites. Our testing revealed that the weaknesses identified were attributable, at least in part, to Headquarters programs and field sites that had not implemented existing policies and procedures requiring protection of sensitive electronic information. In addition, a lack of performance monitoring contributed to the inability of the Department and the National Nuclear Security Administration (NNSA) to ensure that measures were in place to fully protect sensitive information.
As demonstrated by previous computer intrusion-related data losses throughout the Department, without improvements, the risk or vulnerability for future losses remains unacceptably high. In conducting this audit, we recognized that data encryption and related techniques do not provide absolute assurance that sensitive data is fully protected. For example, encryption will not necessarily protect data in circumstances where organizational access controls are weak or are circumvented through phishing or other malicious techniques. However, as noted by NIST, when used appropriately, encryption is an effective tool that can, as part of an overall risk-management strategy, enhance security over critical personal and other sensitive information. The audit disclosed that Sandia National Laboratories had instituted a comprehensive program to protect laptops taken on foreign travel. In addition, the Department issued policy after our field work was completed that should standardize the Privacy Impact Assessment process, and, in so doing, provide increased accountability. While these actions are positive steps, additional effort is needed to help ensure that the privacy of individuals is adequately protected and that sensitive operational data is not compromised. To that end, our report contains several recommendations to implement a risk-based protection scheme for the protection of sensitive electronic information.
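The report repeatedly recommends encryption of sensitive data at rest and in transit. As a generic illustration of the kind of control it describes, the sketch below encrypts a local file with symmetric authenticated encryption using the `cryptography` package; the filename is a placeholder, and this is one common approach rather than the Department's mandated tooling.

```python
# Minimal sketch of encrypting a sensitive file at rest with symmetric
# authenticated encryption (illustrative; real deployments also need key
# management, access controls, and policy enforcement).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # must be stored separately from the data
fernet = Fernet(key)

with open("pii_export.csv", "rb") as f:          # hypothetical input file
    ciphertext = fernet.encrypt(f.read())
with open("pii_export.csv.enc", "wb") as f:
    f.write(ciphertext)

# decryption fails loudly if the ciphertext was tampered with
plaintext = fernet.decrypt(ciphertext)
```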
- Conference Article
4
- 10.1109/sws.2009.5271800
- Aug 1, 2009
With growing interest in data publishing and analysis, privacy preserving data publication has become increasingly important. When a table containing sensitive information is published, the privacy of each individual should be protected. On the other hand, the data holder also wants to minimize the information loss for analysis as long as privacy is preserved. The k-anonymity and l-diversity models were proposed some years ago to protect privacy; however, these solutions are limited to static data release. More recently, the m-invariance model has been proposed to support publication in dynamic environments, but its generalization technique causes high information loss. In this paper, we propose a simple and safe anonymization technique that avoids generalization while assuring high data utility in dynamic environments.
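The abstract does not describe the technique itself, but generalization-free anonymization is commonly achieved in the anatomy style: exact quasi-identifier values and sensitive values are released in two tables linked only by a group id. The sketch below shows that style as background; it is not this paper's dynamic-release algorithm.

```python
# Minimal sketch of generalization-free anonymization in the anatomy style
# (illustrative background, not the paper's exact method).
def anatomize(records, group_size):
    qi_table, sa_table = [], []
    for gid in range(0, len(records), group_size):
        group = records[gid:gid + group_size]
        for qi, sa in group:
            qi_table.append((gid, *qi))   # exact QI values, no generalization
            sa_table.append((gid, sa))    # sensitive values listed per group
    return qi_table, sa_table

records = [((25, 47906), "flu"), ((31, 47302), "ulcer"),
           ((44, 47905), "flu"), ((52, 47301), "cancer")]
qi_table, sa_table = anatomize(records, group_size=2)
```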
- Conference Article
10
- 10.1109/fskd.2013.6816364
- Jul 1, 2013
As a serious concern in data publishing and analysis, the privacy preservation of individuals has received much attention. Anonymity models based on generalization can protect individual privacy but often lead to excessive information loss. Privacy preserving data publishing therefore needs a careful balance between privacy protection and data utility, and the challenge is how to lessen the information loss during anonymization. This paper presents a (k, l, θ)-diversity model based on clustering that minimizes information loss while assuring data quality. The model takes into account the cluster size, the number of distinct sensitive attribute values, and the privacy preserving degree; we also analyze the complexity of our algorithm. Extensive experimental evaluation shows that our techniques clearly outperform existing approaches in terms of execution time and data utility.
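Reading the three parameters directly off the abstract, a cluster plausibly qualifies when it has at least k records, at least l distinct sensitive values, and no sensitive value exceeding a privacy degree θ; the exact formal definitions are in the paper, so treat this checker as an inferred sketch.

```python
# Minimal sketch of validating one cluster against the three constraints the
# (k, l, θ)-diversity model combines (definitions inferred from the abstract).
from collections import Counter

def valid_cluster(sensitive_values, k, l, theta):
    counts = Counter(sensitive_values)
    return (len(sensitive_values) >= k and        # cluster size
            len(counts) >= l and                  # distinct sensitive values
            max(counts.values()) / len(sensitive_values) <= theta)  # privacy degree

ok = valid_cluster(["flu", "flu", "ulcer", "cancer"], k=4, l=3, theta=0.5)
```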
- Conference Article
3
- 10.1109/ikt.2015.7288743
- May 1, 2015
The main problem in releasing sensitive data to public networks and domains is how to publish the data while protecting privacy and permitting useful analysis. This problem is known as privacy-preserving data publishing. In the well-known (α,β)-privacy model, only two types of risk are considered: presence leakage (membership disclosure), by which adversaries may explicitly identify individuals in (or not in) the published dataset, and association leakage (attribute disclosure), by which they may unambiguously correlate individuals with their sensitive information. Ambiguity is an anonymization technique for this privacy model that publishes the attributes of tuples in separate tables; the lossy join of these tables produces false tuples, which inject uncertainty. In this paper, we improve this anonymization technique (Ambiguity+) by publishing the frequency of each distinct value in order to better preserve data utility under the (α,β)-privacy model. Experimental results demonstrate that our work preserves data utility at a satisfactory level while considerably decreasing information loss.
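The lossy-join mechanism that Ambiguity relies on is easy to see concretely: joining the separately published attribute tables back together yields both real and false tuples, so the adversary cannot tell which associations are genuine. A minimal sketch (the Ambiguity+ frequency publication is omitted here):

```python
# Minimal sketch of the lossy join behind Ambiguity: attributes published in
# separate tables rejoin into more candidate tuples than real ones.
t_qi = [(1, 25), (1, 31)]              # (bucket id, age)
t_sa = [(1, "flu"), (1, "ulcer")]      # (bucket id, disease)

lossy_join = [(age, disease)
              for b1, age in t_qi
              for b2, disease in t_sa
              if b1 == b2]
# four candidate tuples, only two of which are real
print(lossy_join)  # [(25, 'flu'), (25, 'ulcer'), (31, 'flu'), (31, 'ulcer')]
```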
- Conference Article
1
- 10.1109/iccmc.2018.8487483
- Feb 1, 2018
Advances in digital information applications facilitate the collection of huge amounts of data about governments, healthcare, other organizations, and individuals. To make this data available to researchers, businesses, and other users, it needs to be released, which in turn increases the demand for exchanging and publishing the collected data. However, data in its original form typically contains sensitive information about individuals and/or organizations, and publishing such data will violate individual or organizational privacy. Hence, Privacy Preserving Data Publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Before data is published to the concerned users, it is altered using various anonymization techniques to maintain privacy without compromising data utility. Real-world datasets contain multiple sensitive attributes (MSAs), which may be numerical or categorical, and anonymization designed for a single sensitive attribute is not suitable for functional usage. It is therefore important to maintain the association between these MSAs and to preserve the privacy of mixed (numerical and categorical) MSAs efficiently while working with high-dimensional data. The main focus of this paper is to analyse the different schemes proposed in the literature for PPDP of MSAs.
- Research Article
- 10.1002/bult.49
- Feb 1, 1997
- Bulletin of the American Society for Information Science and Technology
Privacy Survival Guide: How to Take Control of Your Personal Information
- Research Article
- 10.14569/ijacsa.2020.0111220
- Jan 1, 2020
- International Journal of Advanced Computer Science and Applications
Feature selection is vital for data mining, as every organization gathers a colossal amount of high-dimensional microdata. Among the important criteria for feature selection algorithms, feature selection stability, along with accuracy, is currently considered significant. Privacy preserving data publishing methods for multiple sensitive attributes are analyzed to lessen the likelihood that adversaries can infer the sensitive values. In general, protecting sensitive values is accomplished by anonymizing data using generalization and suppression, which may result in information loss; strategies other than generalization and suppression are therefore investigated to diminish information loss. Privacy preserving data publishing with the overlapped slicing technique tackles the issues in microdata with multiple sensitive attributes. Feature selection stability is a vital criterion for data mining because of the ever increasing dimensionality of microdata generated by everyday activities on the World Wide Web, and it is directly correlated with data utility. Because feature selection stability is data-driven, modifications of a dataset for privacy preservation affect feature selection stability along with data utility. This paper therefore investigates the impact of privacy preserving data publishing based on overlapped slicing on feature selection stability and accuracy.
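The abstract does not name a stability measure; one widely used choice, average pairwise Jaccard similarity between the feature subsets selected from different (e.g., anonymized) versions of a dataset, is sketched below as an assumption rather than the paper's metric.

```python
# Minimal sketch of measuring feature selection stability as average pairwise
# Jaccard similarity between selected feature subsets (one common measure;
# the paper's exact metric may differ).
from itertools import combinations

def stability(selected_subsets):
    pairs = list(combinations(selected_subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# feature subsets selected from the raw, sliced, and overlapped-sliced data
runs = [{"age", "zip", "dose"}, {"age", "dose", "weight"}, {"age", "dose"}]
print(stability(runs))  # 1.0 would mean identical subsets across all runs
```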
- Book Chapter
4
- 10.1007/11775300_13
- Jan 1, 2006
Recent efforts have been made to address the problem of privacy preservation in data publishing, but they mainly focus on preserving data privacy. In this paper, we address another aspect of privacy preservation in data publishing, where some of the knowledge implied by a dataset is regarded as private or sensitive information. In particular, we consider data stored in a transaction database, with the knowledge represented in the form of patterns. We present a data sanitization algorithm, called SanDB, for effectively protecting a set of sensitive patterns while attempting to minimize the impact of sanitization on the non-sensitive patterns. The experimental results show that SanDB achieves significant improvement over the best approach presented in the literature.
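The common mechanism in this line of work is support lowering: delete items of a sensitive itemset from supporting transactions until its support falls below the disclosure threshold. The sketch below shows that mechanism with naive victim selection; SanDB's heuristics for choosing which transactions and items to modify, so that non-sensitive patterns suffer least, are the paper's actual contribution.

```python
# Minimal sketch of support-lowering sanitization (illustrative; SanDB's
# victim-selection heuristics are more refined than this naive choice).
def sanitize(transactions, sensitive_itemset, max_support):
    """Lower the support of a sensitive itemset to max_support or below by
    deleting one of its items from supporting transactions."""
    sensitive = set(sensitive_itemset)
    while True:
        supporting = [t for t in transactions if sensitive <= t]
        if len(supporting) <= max_support:
            return transactions
        # naive victim selection: drop an arbitrary sensitive item from the
        # first supporting transaction
        supporting[0].discard(next(iter(sensitive)))

db = [{"bread", "beer"}, {"bread", "beer", "milk"}, {"beer", "milk"}]
sanitized = sanitize(db, {"bread", "beer"}, max_support=1)
```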
- Research Article
- 10.14288/1.0051391
- Jan 1, 2007
The last several decades have witnessed a phenomenal growth in the networking infrastructure connecting computers all over the world. The Web has now become a ubiquitous channel for information sharing and dissemination. More and more data is being exchanged and published on the Web. This growth has created a whole new set of research challenges, while giving a new spin to some existing ones. For example, XML (eXtensible Markup Language), a self-describing and semi-structured data format, has emerged as the standard for representing and exchanging data between applications across the Web. An important issue of data publishing is the protection of sensitive and private information. However, security/privacy-enhancing techniques bring disadvantages: security-enhancing techniques may incur overhead for query answering, while privacy-enhancing techniques may ruin data utility. In this thesis, we study how to overcome such overhead. Specifically, we address the following two problems in this thesis: (a) efficient and secure query evaluation over published XML databases, and (b) publishing relational databases while protecting privacy and preserving utility. The first part of this thesis focuses on efficiency and security issues of query evaluation over XML databases. To protect sensitive information in the published database, security policies must be defined and enforced, which will result in unavoidable overhead. Due to the security overhead and the complex structure of XML databases, query evaluation may become inefficient. In this thesis, we study how to securely and efficiently evaluate queries over XML databases. First, we consider the access-controlled database. We focus on a security model by which every XML element either is locally assigned a security level or inherits the security level from one of its ancestors. Security checks in this model can cause considerable overhead for query evaluation. We investigate how to reduce the security overhead by analyzing the subtle interactions between inheritance of security levels and the structure of the XML database. We design a sound and complete set of rules and develop efficient, polynomial-time algorithms for optimizing security checks on queries. Second, we consider an encrypted XML database in a database-as-service model, in which the private database is hosted by an untrusted server. Since the untrusted server has no decryption key, its power of query processing is very limited, which results in inefficient query evaluation. We study how to support secure and efficient query evaluation in this model. We design the metadata that will be hosted on the server side with the encrypted database. We show that the presence of the metadata not only facilitates query processing but also guarantees data security. We prove that by observing a series of queries from the client and its own responses, the server's knowledge about the sensitive information in the database always stays below a given security threshold. The second part of this thesis studies the problem of preserving both privacy and utility when publishing relational databases. To preserve utility, the published data will not be perturbed. Instead, the base table in the original database will be decomposed into several view tables. First, we define a general framework to measure the likelihood of privacy breach of a published view. We propose two attack models, unrestricted and restricted models, and derive formulas to quantify the privacy breach for each model.
Second, we take utility into consideration. Specifically, we study the problem of how to design the scheme of published views, so that data privacy is protected while maximum utility is guaranteed. Given a database and its scheme, there are exponentially many candidates for published views that satisfy both privacy and utility constraints. We prove that finding the globally optimal safe and faithful view, i.e., the view that does not violate any privacy constraints and provides the maximum utility, is NP-hard. We propose the locally optimal safe and faithful view as the heuristic, and show how we can efficiently find a locally optimal safe and faithful view in polynomial time.
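To make the view-decomposition idea concrete, the sketch below estimates a privacy breach as the chance of linking one individual to a sensitive value through the join of two published views, using a uniform guess over join candidates. This estimate is our simplification; the thesis' unrestricted and restricted attack models are formally richer.

```python
# Minimal sketch of quantifying the privacy breach of a two-view
# decomposition via a uniform-guess linking probability (a simplification
# of the thesis' formal attack models).
def breach_probability(view_qi, view_sa, individual_key):
    """view_qi: (key, join_attr) rows; view_sa: (join_attr, sensitive) rows."""
    join_attrs = {j for k, j in view_qi if k == individual_key}
    candidates = [s for j, s in view_sa if j in join_attrs]
    return 1 / len(candidates) if candidates else 0.0

view_qi = [("Alice", "dept7"), ("Bob", "dept7")]
view_sa = [("dept7", "flu"), ("dept7", "ulcer")]
print(breach_probability(view_qi, view_sa, "Alice"))  # 0.5
```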
- Research Article
1514
- 10.1145/1749603.1749605
- Jun 1, 2010
- ACM Computing Surveys
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published and on agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
- Video Transcripts
- 10.48448/tbmv-vs12
- Aug 1, 2021
This position paper investigates the problem of automated text anonymisation, which is a prerequisite for secure sharing of documents containing sensitive information about individuals. We summarise the key concepts behind text anonymisation and provide a review of current approaches. Anonymisation methods have so far been developed in two fields with little mutual interaction, namely natural language processing and privacy-preserving data publishing. Based on a case study, we outline the benefits and limitations of these approaches and discuss a number of open challenges, such as (1) how to account for multiple types of semantic inferences, (2) how to strike a balance between disclosure risk and data utility and (3) how to evaluate the quality of the resulting anonymisation. We lay out a case for moving beyond sequence labelling models and incorporating explicit measures of disclosure risk into the text anonymisation process.
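As a baseline for what the paper argues is insufficient, the sketch below masks direct identifiers in free text with regular expressions; the patterns and labels are illustrative, and the paper's point is precisely that sequence-labelling NER plus explicit disclosure-risk measures are needed beyond such surface rules.

```python
# Minimal sketch of rule-based masking of direct identifiers in free text
# (an illustrative baseline, not the paper's proposed approach).
import re

PATTERNS = {
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
    "PHONE": r"\b\d{3}[- ]\d{3}[- ]\d{4}\b",
    "EMAIL": r"\b[\w.]+@[\w.]+\.\w+\b",
}

def mask(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = re.sub(pattern, f"[{label}]", text)
    return text

print(mask("Contact john.doe@acme.org on 2021-08-01 or 555-123-4567."))
# -> "Contact [EMAIL] on [DATE] or [PHONE]."
```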