Abstract

Document similarity detection is widely used across diverse communities, including institutions and corporations. However, currently available detection systems do not account for the private nature of documents that have been outsourced to remote servers. Moreover, none of the existing solutions is lightweight enough to suit lightweight client implementations, which limits the effectiveness of these systems. For instance, detecting similarity between papers submitted to two conferences or journals must preserve the privacy of the submitted papers in a lightweight manner, so that the security and application requirements of resource-limited devices are met. This paper considers the problem of lightweight similarity detection between document sets while preserving the privacy of the material. The proposed solution permits documents to be compared without disclosing their content to untrusted servers. The fingerprint set of each document is computed efficiently, and an inverted index is built over the whole set of fingerprints. Before being uploaded to the untrusted server, this index is secured with the Paillier cryptosystem. The study thus develops a secure yet efficient method for scalable encrypted document comparison. To evaluate the computational performance of this method, the paper carries out several comparative assessments against other major approaches.
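
The abstract does not spell out how the encrypted index is evaluated on the server. As a rough illustration only, the sketch below shows the additive homomorphism of the Paillier cryptosystem, the property that lets an untrusted party combine encrypted values without decrypting them. The function names, the generator choice g = n + 1, and the tiny primes are assumptions made for readability; they are insecure and are not taken from the paper.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key generation from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With the generator g = n + 1, mu is the inverse of lambda modulo n.
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with fresh randomness."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no exponentiation is needed for g^m.
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """Recover the plaintext from ciphertext c using the private key sk."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return c1 * c2 % (n * n)
```

For example, `decrypt(n, sk, add_encrypted(n, encrypt(n, 3), encrypt(n, 4)))` recovers the sum of the two plaintexts, although whoever multiplied the ciphertexts never saw either value. Because each call to `encrypt` draws fresh randomness, encrypting the same value twice will almost always give different ciphertexts.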

Highlights

  • Because fingerprints are generated deterministically, the server must not be able to learn whether the same query has been presented before; such leakage is known as the query pattern.

Summary

Introduction

Given the sheer quantity of content accessible via the World Wide Web [1], it is not difficult to appropriate another’s theories or ideas as one’s own without acknowledging the original author. This paper demonstrates how documents can be compared in the encrypted domain in a lightweight manner, without compromising privacy or revealing the plaintext data. It develops an efficient and secure solution for computing the fingerprint terms shared between a submitted document and the entire stored collection. It then describes how the fingerprint approach can be used to generate a secure inverted index, upon which a secure and lightweight privacy-preserving document similarity detection (DSD) scheme can be constructed. This significantly simplified process results in a low client-side response time.

The remainder of this paper is organized as follows: Section ‘Related works’ summarises the most important achievements of related research; Section ‘Document fingerprinting’ illustrates the document fingerprinting technique; Section ‘Scheme overview’ introduces the problem definition and security requirements; Section ‘Proposed scheme details’ presents the proposed approach in terms of its initialization phase and its private similarity computation and search query phase; Section ‘Security analysis’ verifies the protection of data privacy; Section ‘System evaluation’ analyses the performance of the approach; and Section ‘Conclusion’ summarises the findings of the paper.
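
The idea of counting shared fingerprint terms through an inverted index can be sketched in plaintext as follows. This is a minimal illustration, not the scheme described in the paper: the character k-gram fingerprinting, the use of SHA-256, and the names `fingerprints`, `build_inverted_index` and `common_fingerprint_counts` are all assumptions, and the encryption layer is left out entirely.

```python
import hashlib
from collections import defaultdict

def fingerprints(text, k=5):
    """Return the set of hashed character k-grams of `text` (assumed scheme)."""
    # Normalise: keep letters and digits only, lower-cased.
    normalized = "".join(ch.lower() for ch in text if ch.isalnum())
    grams = {normalized[i:i + k] for i in range(len(normalized) - k + 1)}
    # Deterministic hashing: the same k-gram always maps to the same fingerprint.
    return {
        int.from_bytes(hashlib.sha256(g.encode("utf-8")).digest()[:8], "big")
        for g in grams
    }

def build_inverted_index(docs):
    """Map each fingerprint to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for fp in fingerprints(text):
            index[fp].add(doc_id)
    return index

def common_fingerprint_counts(index, query_text):
    """Count, per stored document, how many query fingerprints it shares."""
    counts = defaultdict(int)
    for fp in fingerprints(query_text):
        for doc_id in index.get(fp, ()):
            counts[doc_id] += 1
    return dict(counts)
```

Calling `common_fingerprint_counts(build_inverted_index(docs), query)` on a dictionary of document texts returns a per-document count that can be used to rank stored documents by overlap with the query. The deterministic hashing in this sketch is also why an unprotected index would let a server recognise repeated queries.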

Related Works
Document Fingerprinting
Problem Definition
Security Requirements
Proposed Scheme Details
Initialization Stage
Private Similarity Computation and Search Queries Stage
Security of Encryption
Search Anonymity
Setup of Experiment
Retrieval Evaluation
Effectiveness
Index Building
Ranking Time
Fingerprint Security
Comparative Performance Analysis
Findings
Conclusions