Abstract

Key-value stores in the log-structured merge-tree (LSM-tree) family are becoming the most in-demand databases for big data systems. They provide an easy-to-implement interface, and they perform garbage collection automatically by applying a compaction procedure over the multilevel structure. CaseDB, an implementation of the LSM-tree structure, considerably reduces write amplification by using a metadata compaction technique. However, it suffers from space amplification in update-intensive workloads: CaseDB does not perform deletes immediately but defers them to the compaction process, so the amount of obsolete data keeps growing. This paper proposes a deduplication-extended compaction method for CaseDB that scans for duplicated keys during compaction and removes the old values. Experimental results show that different deduplication threshold values yield different balances between space amplification and write amplification.
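
The idea of removing old values for duplicated keys during compaction can be illustrated with a minimal sketch. It is not CaseDB's implementation: the record layout, the function names `duplicate_ratio` and `compact`, and the parameter `dedup_threshold` are all hypothetical. In particular, the abstract does not define the threshold, so the sketch assumes it is the fraction of stale records above which deduplication is triggered.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (key, sequence_number, value).
# A higher sequence number means a newer write for that key.
Record = Tuple[str, int, str]


def duplicate_ratio(records: List[Record]) -> float:
    """Fraction of records shadowed by another version of the same key."""
    if not records:
        return 0.0
    unique_keys = len({key for key, _, _ in records})
    return 1.0 - unique_keys / len(records)


def compact(records: List[Record], dedup_threshold: float) -> List[Record]:
    """Merge records into key order, dropping stale versions when the
    duplicate ratio reaches dedup_threshold (an assumed trigger rule)."""
    if duplicate_ratio(records) < dedup_threshold:
        # Below the threshold: keep every version, newest first per key.
        return sorted(records, key=lambda r: (r[0], -r[1]))
    newest: Dict[str, Record] = {}
    for record in records:
        key, seq, _ = record
        # Keep only the version with the highest sequence number.
        if key not in newest or seq > newest[key][1]:
            newest[key] = record
    return [newest[key] for key in sorted(newest)]


if __name__ == "__main__":
    data = [("a", 1, "old"), ("b", 2, "x"), ("a", 3, "new")]
    print(compact(data, dedup_threshold=0.3))  # stale ("a", 1, "old") is dropped
    print(compact(data, dedup_threshold=0.5))  # all versions are kept
```

Under this assumed rule, a lower threshold removes stale values more eagerly and so reclaims space sooner, while a higher threshold skips the deduplication pass more often. How that maps onto the space and write amplification measured in the paper is not something the sketch can show.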
