Research and implementation of HTAP for distributed database

Wenjie Liu, Ouya Pei, Changhong Jing, Jintao Gao

doi:10.1051/jnwpu/20213920430

Abstract

Data processing can be roughly divided into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main workload of traditional relational databases and covers routine day-to-day transactions, such as recording bank account transactions. OLAP is the main workload of data warehouse systems; it supports more complex data analysis, focuses on decision support, and presents analysis results in an accessible, intuitive form. As the volume of data processed by enterprises keeps growing, distributed databases have gradually replaced stand-alone databases and become the mainstream choice. However, the workloads currently supported by distributed databases are mainly OLTP applications, and OLAP support is lacking. This paper proposes an HTAP (hybrid transactional/analytical processing) implementation for the distributed database CBase, which adds OLAP analysis to CBase and can readily handle analysis of large volumes of data.
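The difference between the two workload types can be made concrete with a toy example. The sketch below is illustrative only: the in-memory `accounts` table, its columns, and the functions `transfer` and `balance_by_branch` are hypothetical and do not come from the paper. The OLTP-style operation touches a couple of rows by key and either fully applies or fails; the OLAP-style operation scans every row and aggregates.

```python
from collections import defaultdict

# Hypothetical in-memory "accounts" table standing in for a relational store.
accounts = {
    1: {"branch": "north", "balance": 500},
    2: {"branch": "north", "balance": 300},
    3: {"branch": "south", "balance": 200},
}


def transfer(table, src, dst, amount):
    """OLTP-style operation: reads and updates two rows by primary key.

    The balance check runs before any mutation, so the transfer either
    applies to both rows or raises without changing anything.
    """
    if table[src]["balance"] < amount:
        raise ValueError("insufficient funds")
    table[src]["balance"] -= amount
    table[dst]["balance"] += amount


def balance_by_branch(table):
    """OLAP-style operation: scans all rows and sums balances per branch."""
    totals = defaultdict(int)
    for row in table.values():
        totals[row["branch"]] += row["balance"]
    return dict(totals)


transfer(accounts, 1, 3, 100)
print(balance_by_branch(accounts))  # {'north': 700, 'south': 300}
```

An HTAP system aims to serve both kinds of request over the same data without the analytical scans slowing down the short transactions.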

Introduction

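Running OLTP and OLAP workloads simultaneously on the same cluster degrades performance, so many HTAP databases integrate third-party platforms such as Storm, Flink, and Spark to perform OLAP analysis. For CBase, this paper instead proposes an HTAP scheme built around an adaptation layer between CBase and Spark. The adaptation layer combines CBase with the Spark analysis engine to provide both OLTP and OLAP functionality, and the data transfer through it is optimized to improve the efficiency of analytical processing (AP). Compared with embedding a third-party platform, the proposed HTAP design for a distributed database is simpler and easier to implement and avoids a series of embedding and maintenance tasks: the adaptation layer connects directly to the Spark nodes at the lower level and transfers all cached data to each Spark node, after which AP analysis can be run. The implementation fills CBase's current lack of OLAP analysis and also provides storage space for data on all Spark nodes. Moreover, the components are independent of one another: the adaptation layer is decoupled both from CBase beneath it and from the Spark analysis API above it, so a version update or replacement of any single component will not leave the others unable to recognize it or cause downtime. This strong independence makes subsequent maintenance convenient.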
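The data path described above (cached rows leaving the adaptation layer, being split across Spark nodes, and aggregated there) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation and not Spark's or CBase's API: `snapshot_rows`, `partition_rows`, `local_aggregate`, and `merge` are hypothetical names, the three-column order schema is invented, rows are assigned to workers by primary key modulo the worker count, and ordinary Python lists stand in for both CBase and the Spark nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema: (order_id, region, amount).
Row = Tuple[int, str, int]


def snapshot_rows() -> List[Row]:
    """Stand-in for reading cached rows out of the OLTP store (hypothetical)."""
    return [
        (1, "east", 120),
        (2, "west", 80),
        (3, "east", 50),
        (4, "north", 200),
        (5, "west", 20),
    ]


def partition_rows(rows: Iterable[Row], num_workers: int) -> List[List[Row]]:
    """Adapter step (assumed): assign each row to a worker by key modulo worker count."""
    parts: List[List[Row]] = [[] for _ in range(num_workers)]
    for row in rows:
        parts[row[0] % num_workers].append(row)
    return parts


def local_aggregate(part: List[Row]) -> Dict[str, int]:
    """Work done on one analysis node: partial SUM(amount) GROUP BY region."""
    acc: Dict[str, int] = defaultdict(int)
    for _, region, amount in part:
        acc[region] += amount
    return dict(acc)


def merge(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-node partial sums into the final result."""
    total: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)


parts = partition_rows(snapshot_rows(), num_workers=3)
result = merge(local_aggregate(p) for p in parts)
print(sorted(result.items()))  # [('east', 170), ('north', 200), ('west', 100)]
```

Because each node only computes partial sums over its own share of the rows, the per-node work runs independently and only the small partial results need to be combined, which is the property that lets analytical queries scale out across Spark nodes without touching the transactional store.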

Results

Conclusion