Deduplication of encrypted data is important for both the privacy of stored data and efficient storage management. Several deduplication techniques have been designed to improve security or efficiency. In this paper, we focus on client-side deduplication, which has advantages over server-side deduplication, particularly in communication overhead, because data is transmitted only conditionally. From a security perspective, poison, dictionary, and identification attacks are considered threats against client-side deduplication. Unlike the other attacks, however, identification attacks and their countermeasures have not been studied in depth. In an identification attack, an adversary tries to determine whether a specific file exists in the server’s storage. Such attacks must be countered because adversaries can use them to violate the privacy of the data owner. So far, only some counter-based countermeasures have been proposed in the literature as temporary remedies. We analyze the security of deduplication techniques against identification attacks and show that their weakness can be eliminated by adding uncertainty to the conditional responses in the deduplication protocol, which depend on the existence of files. We also present a concrete countermeasure, called the time-locked deduplication technique, which adds this uncertainty by withholding the deduplication functionality until a predefined time. The additional cost of locking is incurred only when the file to be stored does not already exist in the server’s storage. Therefore, our technique improves the security of client-side deduplication against identification attacks at almost the same cost as existing techniques, except when a file is uploaded for the first time.
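The idea of withholding deduplication until a predefined time can be illustrated with a toy model. The sketch below is not the paper's construction, which the abstract does not specify. It assumes a hypothetical server class `TimeLockedDedupServer`, a fixed lock window `lock_seconds`, a SHA-256 digest as the file tag, and an explicit `now` argument in place of a real clock; encryption is not modelled. It only shows that, during the lock window, the server's response for a stored file is the same as for an unknown file.

```python
import hashlib


class TimeLockedDedupServer:
    """Toy model: deduplication of a file is withheld until its unlock time."""

    def __init__(self, lock_seconds):
        self.lock_seconds = lock_seconds  # hypothetical length of the lock window
        self.unlock_at = {}               # file tag -> time when dedup becomes active

    def needs_upload(self, tag, now):
        """Return True if the client must send the full file."""
        unlock = self.unlock_at.get(tag)
        # An unknown file and a stored-but-locked file give the same answer,
        # so the response alone does not reveal whether the file exists.
        return unlock is None or now < unlock

    def store(self, tag, now):
        # Only a first-time upload sets a lock; later uploads change nothing.
        if tag not in self.unlock_at:
            self.unlock_at[tag] = now + self.lock_seconds


def file_tag(data: bytes) -> str:
    # Assumption: the tag is a plain SHA-256 digest of the file contents.
    return hashlib.sha256(data).hexdigest()


server = TimeLockedDedupServer(lock_seconds=60)
tag = file_tag(b"secret report")

before = server.needs_upload(tag, now=0)   # file not stored yet
server.store(tag, now=0)                   # first upload starts the lock window
during = server.needs_upload(tag, now=30)  # stored, but still locked
after = server.needs_upload(tag, now=61)   # lock expired, dedup is active
```

In this model, a query made at `now=30` cannot distinguish the stored file from one that was never uploaded, and the extra bookkeeping happens only in `store` for a file seen for the first time.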