DOMe: A deduplication optimization method for the NewSQL database backups.

Longxiang Wang, Zhengdong Zhu, Xiaoshe Dong, Yinfeng Wang, Xingjun Zhang

doi:10.1371/journal.pone.0185189

Open Access

Abstract

Reducing duplicated data in database backups is an important application scenario for data deduplication technology. NewSQL is an emerging class of database system that is being used more and more widely. NewSQL systems improve data reliability by periodically backing up in-memory data, which produces a large amount of duplicated data. Traditional deduplication methods are not optimized for NewSQL server systems and cannot take full advantage of hardware resources to optimize deduplication performance. Recent research has pointed out that future NewSQL servers will have thousands of CPU cores, large DRAM and huge NVRAM. How to utilize these hardware resources to optimize the performance of data deduplication is therefore an important issue. To solve this problem, we propose a deduplication optimization method (DOMe) for NewSQL system backup. To take advantage of the large number of CPU cores in the NewSQL server, DOMe parallelizes the deduplication method based on the fork-join framework. The fingerprint index, which is the key data structure in the deduplication process, is implemented as a pure in-memory hash table. This makes full use of the large DRAM in the NewSQL system and eliminates the fingerprint index performance bottleneck found in traditional deduplication methods. H-Store is used as a typical NewSQL database system in which to implement DOMe. DOMe is evaluated experimentally on two representative sets of backup data. The experimental results show that: 1) DOMe can reduce duplicated NewSQL backup data; 2) DOMe significantly improves deduplication performance by parallelizing content-defined chunking (CDC) algorithms: when the theoretical speedup ratio of the server is 20.8, DOMe achieves a speedup ratio of up to 18; 3) DOMe improves deduplication throughput by 1.5 times through the pure in-memory index optimization method.
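The two ideas in the abstract, parallel chunking and a pure in-memory fingerprint index, can be illustrated with a minimal sketch. This is not the authors' implementation: the rolling hash, the chunk-size constants, the fixed segment size and every function name below are assumptions chosen for illustration, and a Python thread pool stands in for the fork-join framework. Splitting the input into fixed-size segments before chunking is also a simplification, since it forces a chunk boundary at every segment edge.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# All constants are illustrative assumptions, not values from the paper.
WINDOW = 16           # bytes covered by the rolling hash window
MASK = (1 << 6) - 1   # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 32        # never cut before this many bytes
MAX_CHUNK = 256       # always cut at this many bytes


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using an additive rolling hash."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            rolling -= data[i - WINDOW]  # slide the window forward
        length = i - start + 1
        if (length >= MIN_CHUNK and (rolling & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fingerprint_segment(segment: bytes) -> list[tuple[str, bytes]]:
    """Chunk one segment and compute a SHA-1 fingerprint for each chunk."""
    return [(hashlib.sha1(c).hexdigest(), c) for c in cdc_chunks(segment)]


def deduplicate(data: bytes, index: dict[str, bytes],
                segment_size: int = 4096, workers: int = 4) -> list[str]:
    """Fork: chunk segments in parallel. Join: merge into the in-memory index."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fingerprint_segment, segments))  # order preserved
    recipe = []
    for segment_result in results:
        for fp, chunk in segment_result:
            index.setdefault(fp, chunk)  # store each unique chunk only once
            recipe.append(fp)
    return recipe


def restore(recipe: list[str], index: dict[str, bytes]) -> bytes:
    """Rebuild the original data from its fingerprint recipe."""
    return b"".join(index[fp] for fp in recipe)


if __name__ == "__main__":
    index: dict[str, bytes] = {}
    backup = bytes(range(256)) * 64
    recipe = deduplicate(backup, index)
    assert restore(recipe, index) == backup
    print(len(recipe), "chunks referenced,", len(index), "chunks stored")
```

Here the fingerprint index is an ordinary `dict`, so every lookup stays in memory, which mirrors the pure in-memory hash table described above. Because of Python's global interpreter lock, the thread pool only shows the fork-join structure; a real speedup on CPU-bound chunking would need worker processes or a runtime with true parallel threads.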

Highlights

Deduplication is an efficient data reduction technology, and it is used to mitigate the problem of huge data volume in storage systems
NewSQL directly writes the backup data into the non-volatile random-access memory (NVRAM) buffer, and DOMe carries out the deduplication process when the system is idle
We compared the key features of our SSD with the Intel® Optane™ [21] below, which uses 3D XPoint technology and is currently the only available NVRAM device

Summary

Introduction

Deduplication is an efficient data reduction technology used to mitigate the problem of huge data volumes in storage systems. It is widely used in storage systems [1, 2], especially in backup systems [3, 4, 5]. Database systems are an important part of IT infrastructure and are ubiquitous nowadays. Previous studies have investigated the effect of data deduplication on database data [6, 7].

Journal: PloS one
Publication Date: Oct 19, 2017
Citations: 4
License type: CC BY 4.0