Abstract

Data processing can be roughly divided into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main application of traditional relational databases and covers basic day-to-day transaction processing, such as recording bank transactions. OLAP is the main application of data warehouse systems. It supports more complex data analysis operations, focuses on decision support, and presents analysis results in an easy-to-understand, intuitive form. As the volume of data that enterprises process keeps growing, distributed databases have gradually replaced stand-alone databases and become the mainstream choice. However, the workloads that distributed databases currently support are mainly OLTP applications, and OLAP support is lacking. This paper proposes a method of implementing HTAP (hybrid transactional/analytical processing) for the distributed database CBase, which gives CBase OLAP analysis capability and lets it handle analysis over large volumes of data.
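
To make the OLTP/OLAP distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a relational store. This is an assumption for illustration only; the source does not show CBase's API. The hypothetical `transfer` function is OLTP-style work: a short transaction that updates a couple of rows by primary key. The hypothetical `balance_by_branch` function is OLAP-style work: a read-only scan that aggregates over the whole table.

```python
import sqlite3

# In-memory SQLite database standing in for a transactional store (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, branch TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts (id, branch, balance) VALUES (?, ?, ?)",
    [(1, "north", 500), (2, "north", 300), (3, "south", 800)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style work (hypothetical helper): touch a few rows by key."""
    with conn:  # commits on success, rolls back if an exception escapes
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        # Both UPDATEs are committed together or rolled back together.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balance_by_branch(conn):
    """OLAP-style work (hypothetical helper): scan and aggregate many rows."""
    return conn.execute(
        "SELECT branch, SUM(balance), COUNT(*) FROM accounts GROUP BY branch ORDER BY branch"
    ).fetchall()


transfer(conn, 1, 3, 200)
print(balance_by_branch(conn))  # → [('north', 600, 2), ('south', 1000, 1)]
```

In an HTAP system both kinds of request hit the same data. The difference is the access pattern: OLTP touches a handful of rows per request at high frequency, while OLAP reads large portions of the data per query.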

Introduction

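To this end, this paper proposes an HTAP scheme based on the distributed database CBase. An adaptation layer is built between CBase and Spark that combines CBase with the Spark analytics engine to provide both OLTP and OLAP functionality, and data transfer through the adaptation layer is optimized to improve AP (analytical processing) efficiency. Because running OLTP and OLAP workloads simultaneously in the same cluster degrades performance, many HTAP databases integrate third-party platforms such as Storm, Flink, and Spark to perform OLAP analysis.

The HTAP design for distributed databases presented in this paper is simpler and easier to implement, and it eliminates a series of embedding and maintenance tasks. The Spark nodes are connected directly at the lower level through the adaptation layer, which transfers all cached data to each Spark node, after which AP analysis can be performed. The implementation of this design fills CBase's current gap in OLAP analysis capability and also provides storage space for the data of all Spark nodes.

In addition, the components are independent of one another. The adaptation layer is decoupled both from CBase at the bottom layer and from the Spark analysis API at the top layer, so a version update or replacement of any one component will not leave the others unable to recognize it or cause them to crash. This strong independence makes subsequent maintenance convenient.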
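
The source does not describe the adaptation layer's interface or exactly how its data transfer is optimized. The sketch below is a plain-Python illustration under stated assumptions: CBase's cached data is modeled as an in-memory list of `(key, value)` rows, each Spark node is replaced by a hypothetical `SparkNodeStub`, and a hypothetical `AdaptationLayer` ships rows to the nodes in fixed-size batches, partitioned by key. Batching is only one plausible reading of "optimizing data transfer", not the paper's confirmed method. The final step mirrors the general shape of a distributed aggregation: each node computes a partial result over its own partition, and the driver merges those partial results.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (key, value): a hypothetical row shape for the cached data


@dataclass
class SparkNodeStub:
    """Hypothetical stand-in for one Spark node's local storage."""
    rows: List[Row] = field(default_factory=list)
    batches_received: int = 0

    def receive(self, batch: List[Row]) -> None:
        self.rows.extend(batch)
        self.batches_received += 1

    def partial_sum(self) -> Dict[str, int]:
        # Local aggregation over this node's partition only.
        acc: Dict[str, int] = defaultdict(int)
        for key, value in self.rows:
            acc[key] += value
        return dict(acc)


class AdaptationLayer:
    """Hypothetical adapter between the TP store and the analysis nodes."""

    def __init__(self, nodes: List[SparkNodeStub], batch_size: int = 2) -> None:
        self.nodes = nodes
        self.batch_size = batch_size

    def _node_for(self, key: str) -> int:
        # Deterministic key partitioning: every row with the same key lands on
        # the same node. (Built-in hash() of str is salted per process, so it
        # is avoided here to keep placement reproducible.)
        return sum(key.encode()) % len(self.nodes)

    def transfer(self, cached_rows: Iterable[Row]) -> None:
        buffers: List[List[Row]] = [[] for _ in self.nodes]
        for row in cached_rows:
            idx = self._node_for(row[0])
            buffers[idx].append(row)
            # Sending full batches amortizes per-message overhead.
            if len(buffers[idx]) == self.batch_size:
                self.nodes[idx].receive(buffers[idx])
                buffers[idx] = []
        for idx, buf in enumerate(buffers):  # flush partially filled batches
            if buf:
                self.nodes[idx].receive(buf)


def global_sum(nodes: List[SparkNodeStub]) -> Dict[str, int]:
    # Merge the per-node partial results into the final answer.
    total: Dict[str, int] = defaultdict(int)
    for node in nodes:
        for key, value in node.partial_sum().items():
            total[key] += value
    return dict(total)


# Usage: move a small cached snapshot to three stub nodes, then aggregate.
snapshot = [("north", 300), ("south", 1000), ("north", 300), ("east", 50), ("south", 5)]
nodes = [SparkNodeStub() for _ in range(3)]
AdaptationLayer(nodes, batch_size=2).transfer(snapshot)
print(global_sum(nodes))  # → {'north': 600, 'east': 50, 'south': 1005}
```

Because the adapter only depends on rows going in and batches coming out, either side could be swapped (a different storage engine underneath, a different analysis API on top) without changing the other, which is the kind of decoupling the design aims for.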
