Primary Keys Research Articles

Most watermarking techniques designed to protect relational data use the Primary Key (PK) of a relation to synchronize the watermark. Although this gives high confidence in watermark detection, these approaches become useless if the PK can be erased or updated. A typical example is an attacker who wishes to use a stolen relation detached from the rest of the database. In that case, the original PK values lose relevance, since they are no longer needed to check referential integrity. The attacker can then erase or replace the PK, compromising watermark detection without modifying the rest of the data at all. To avoid the problems caused by this PK dependency, some schemes generate Virtual Primary Keys (VPKs) to use instead. Nevertheless, the quality of a watermark synchronized using VPKs is compromised by duplicate values in the set of VPKs and by the fragility of VPK schemes against attribute deletion. In this paper, we introduce metrics that precisely measure the quality of the VPKs generated by any scheme without having to perform the watermark embedding. This avoids wasting time on embeddings that would yield low-quality detection. We also analyze the main aspects of designing an ideal VPK scheme, seeking to generate high-quality VPK sets while adding robustness to the process. Finally, we present a new scheme, along with experiments that validate it and compare its results with those of the other schemes proposed in the literature.
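The abstract does not define its metrics, so the following is only a rough Python illustration of the duplicate problem it describes. It assumes a naive VPK built by hashing a chosen subset of attribute values with a secret key, and a simple duplicate ratio as the quality measure. The function names `naive_vpk` and `duplicate_ratio`, the sample rows, and the key are all hypothetical and are not taken from the paper.

```python
import hashlib
from collections import Counter


def naive_vpk(row, attrs, secret_key):
    """Hypothetical VPK: hash of a secret key plus the chosen attribute values."""
    material = secret_key + "|" + "|".join(str(row[a]) for a in attrs)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def duplicate_ratio(vpks):
    """Fraction of tuples whose VPK is shared with at least one other tuple."""
    if not vpks:
        return 0.0
    counts = Counter(vpks)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(vpks)


# Made-up sample relation with no primary key column.
rows = [
    {"age": 30, "city": "Lyon", "salary": 4200},
    {"age": 30, "city": "Lyon", "salary": 5100},
    {"age": 41, "city": "Oslo", "salary": 4200},
    {"age": 25, "city": "Rome", "salary": 3900},
]

# Two attributes: the first two tuples collide, so half the VPKs are duplicated.
vpks_two = [naive_vpk(r, ["age", "city"], "k") for r in rows]

# Three attributes: every tuple is distinct, so no VPK is duplicated.
vpks_three = [naive_vpk(r, ["age", "city", "salary"], "k") for r in rows]

print(duplicate_ratio(vpks_two), duplicate_ratio(vpks_three))
```

The sketch also hints at the fragility the abstract mentions: the three-attribute VPK has no duplicates, but deleting the `salary` attribute forces a fallback to the two-attribute version, which both changes every VPK and reintroduces collisions.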


Primary keys (PKs) and foreign keys (FKs) are important elements of relational schemata in various applications, such as query optimization and data integration. However, in many cases, these constraints are unknown or not documented. Detecting them manually is time-consuming and even infeasible in large-scale datasets. We study the problem of discovering primary keys and foreign keys automatically and propose an algorithm to detect both, namely Holistic Primary Key and Foreign Key Detection (HoPF). PKs and FKs are subsets of the sets of unique column combinations (UCCs) and inclusion dependencies (INDs), respectively, for which efficient discovery algorithms are known. Using score functions, our approach is able to effectively extract the true PKs and FKs from the vast sets of valid UCCs and INDs. Several pruning rules are employed to speed up the procedure. We evaluate precision and recall on three benchmarks and two real-world datasets. The results show that our method is able to retrieve on average 88% of all primary keys, and 91% of all foreign keys. We compare the performance of HoPF with two baseline approaches that both assume the existence of primary keys.
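To make the two building blocks concrete, here is a brute-force Python sketch of what a unique column combination (UCC) and a unary inclusion dependency (IND) are. It is not the HoPF algorithm: the score functions and pruning rules are not described in the abstract and are not reproduced here, and the efficient discovery algorithms it refers to work very differently. The helper names and the tiny `customers` and `orders` tables are hypothetical.

```python
from itertools import combinations


def is_unique(rows, cols):
    """True if no two rows share the same values on the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uccs(rows, columns):
    """Enumerate minimal UCCs by brute force (exponential; illustration only)."""
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Skip supersets of an already found UCC: they are not minimal.
            if any(set(u) <= set(combo) for u in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


def holds_ind(dep_rows, dep_col, ref_rows, ref_col):
    """True if every value of dep_col also appears in ref_col (unary IND)."""
    ref_values = {r[ref_col] for r in ref_rows}
    return all(r[dep_col] in ref_values for r in dep_rows)


# Made-up example tables.
customers = [
    {"id": 1, "email": "a@x.org", "country": "FR"},
    {"id": 2, "email": "b@x.org", "country": "FR"},
    {"id": 3, "email": "c@x.org", "country": "NO"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]

print(minimal_uccs(customers, ["id", "email", "country"]))
print(holds_ind(orders, "customer_id", customers, "id"))
```

Even on this toy data, both `id` and `email` are valid UCCs, yet only one of them can be the primary key. That gap between valid candidates and true keys is why a ranking step, such as the score functions the abstract mentions, is needed.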


Articles published on Primary Keys

HQR-Scheme: A High Quality and resilient virtual primary key generation approach for watermarking relational data

Holistic primary key and foreign key detection