Abstract

Many distributed database systems that guarantee high concurrency and scalability adopt a read-write separation architecture. These systems must also store massive amounts of data every day, which calls for different mechanisms for storing and accessing data, such as hot and cold data access strategies. Unlike a distributed storage system, a distributed database splits a table into sub-tables or shards, and the request frequency of each sub-table differs within a given time window. It is therefore necessary to design not only hot-to-cold approaches that reduce storage overhead, but also cold-to-hot methods that preserve the high concurrency of these systems. We present a new redundancy strategy named CBase-EC, which uses erasure codes to trade off transaction-processing performance against storage efficiency in the CBase database system developed for financial scenarios of the Bank. Two algorithms are proposed: the hot-cold tablet (shard) recognition algorithm and the hot-cold dynamic conversion algorithm. We then adopt two optimization approaches to improve CBase-EC performance. In the experiments, we compare CBase-EC with the three-replica strategy in CBase. The results show that transaction-processing performance declined by no more than 6%, while storage efficiency increased by 18.4%.

Highlights

  • With the increasing complexity of the Internet business model, various Distributed Database Management System (DDBMS) architectures are emerging and developing

  • The experimental results show that the strategy presented in this paper reduces transaction throughput by at most about 6%, while storage efficiency improves by 18.4%

  • CBase-EC (erasure code) can dynamically recognize and convert hot and cold tablets and maintain load balance, using erasure codes to trade off transaction-processing throughput against storage efficiency

Summary

Introduction

With the increasing complexity of the Internet business model, various Distributed Database Management System (DDBMS) architectures are emerging and developing. The design of erasure coding technology has great practical significance. Existing studies cover the following aspects: trading off storage efficiency against repair bandwidth overhead, improving recovery rates, selecting the optimal storage location for data blocks, and optimizing the utilization of CPU resources. Transaction processing in DDBMSs requires frequent data access. In a read-write separation architecture, a multi-replica mechanism can improve system throughput. CBase adopts a distributed architecture with read-write separation based on OceanBase 0.4.2 (OceanBase Homepage: https://oceanbase.alipay.com/), and its redundancy strategy is three replicas. It divides the data into baseline data and incremental data and periodically merges the two.
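To make the storage trade-off between replication and erasure coding concrete, the following is a minimal sketch of the raw storage needed per byte of user data under each scheme. The (k, m) parameters, the hot-data fraction, and all function names are illustrative assumptions; the text above does not state which code parameters CBase-EC uses, and the numbers here are not meant to reproduce the reported 18.4% figure.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    return Fraction(replicas)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data under a (k, m) erasure code,
    i.e. k data blocks plus m parity blocks (hypothetical parameters)."""
    return Fraction(k + m, k)


def blended_overhead(hot_fraction: Fraction, replicas: int, k: int, m: int) -> Fraction:
    """Overhead when hot data stays replicated and cold data is erasure coded.

    hot_fraction is the assumed share of data classified as hot (0..1).
    """
    cold_fraction = 1 - hot_fraction
    return (hot_fraction * replication_overhead(replicas)
            + cold_fraction * erasure_overhead(k, m))


if __name__ == "__main__":
    # Assumed example: three replicas versus a (4, 2) code, half the data hot.
    print("three replicas:", replication_overhead(3))
    print("(4, 2) code:   ", erasure_overhead(4, 2))
    print("50% hot blend: ", blended_overhead(Fraction(1, 2), 3, 4, 2))
```

Under these assumed parameters, keeping frequently accessed tablets replicated preserves read throughput, while converting cold tablets to erasure-coded form lowers the total storage footprint; the balance between the two depends on how much data is classified as hot.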

Background
CBase Database System
Hot and Cold Tablets Recognition
Conversion Strategy of Hot and Cold Tablets
Update Optimization
Storage Efficiency
Encoding Performance Optimization
Update Performance
Parallel Recovery Performance
Findings
Conclusions and Future Work