Most watermarking techniques designed to protect relational data use the Primary Key (PK) of a relation to synchronize the watermark. Although this gives high confidence in watermark detection, such approaches become useless if the PK can be erased or updated. A typical example is an attacker who wishes to use a stolen relation in isolation from the rest of the database. In that case, the original PK values lose relevance, since they are no longer needed to check referential integrity. The attacker can then erase or replace the PK, compromising watermark detection without making the slightest modification to the rest of the data. To avoid the problems caused by this PK dependency, some schemes generate Virtual Primary Keys (VPKs) to be used in place of the PK. Nevertheless, the quality of a watermark synchronized using VPKs is compromised by duplicate values in the VPK set and by the fragility of VPK schemes against attribute deletion. In this paper, we introduce metrics that precisely measure the quality of the VPKs generated by any scheme, without requiring the watermark to be embedded. This avoids wasting time on embedding when detection quality would be low. We also analyze the main aspects of designing an ideal VPK scheme, one that generates high-quality VPK sets and adds robustness to the process. Finally, we present a new scheme, along with experiments carried out to validate it and compare its results with those of the other schemes proposed in the literature.