Holistic evaluation in multi-model databases benchmarking

Chao Zhang, Jiaheng Lu

doi:10.1007/s10619-019-07279-6

Open Access

https://doi.org/10.1007/s10619-019-07279-6

Abstract

A multi-model database (MMDB) is designed to support multiple data models against a single, integrated back-end. Examples of data models include document, graph, relational, and key-value. As more and more platforms are developed to deal with multi-model data, it has become crucial to establish a benchmark for evaluating the performance and usability of MMDBs. In this paper, we propose UniBench, a generic multi-model benchmark for a holistic evaluation of state-of-the-art MMDBs. UniBench consists of a set of mixed data models that mimics a social commerce application and covers JSON, XML, key-value, tabular, and graph data. We propose a three-phase framework to simulate real-life distributions and develop a multi-model data generator to produce the benchmarking data. Furthermore, to generate a comprehensive and unbiased query set, we develop an efficient algorithm for a new problem called multi-model parameter curation, which judiciously controls the query selectivity on diverse models. Finally, extensive experiments based on the proposed benchmark were performed on four representative MMDBs: ArangoDB, OrientDB, AgensGraph, and Spark SQL. We provide a comprehensive analysis with respect to internal data representations, multi-model query and transaction processing, and performance results for distributed execution.

Highlights

Multi-model database (MMDB) is an emerging trend in database management systems [27,28], in which a single platform manages data stored in different models, such as document, graph, relational, and key-value.
The results show that AgensGraph and ArangoDB are better at the write-heavy transaction (New Payment), whereas OrientDB is more efficient at the read-heavy transaction (New Order).
Benchmarking multi-model databases is a challenging task because current public data and workloads do not match the variety of application cases well.

Summary

Introduction

Multi-model database (MMDB) is an emerging trend in database management systems [27,28], in which a single platform manages data stored in different models, such as document, graph, relational, and key-value. Consider a recommendation query for online users: given a customer and a product category, find this customer's friends within 3 hops of friendship who have bought products in the given category, and return their feedback with 5-rating reviews. This query involves three data models: the customer with 3-hop friends (graph), orders with an embedded item list (JSON), and the customer's feedback (key-value). Many native multi-model databases have emerged, for example ArangoDB [11], AgensGraph [10], and OrientDB [33]. These native systems use a single store to manage multi-model data along with a unified query language.
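
To make the recommendation query concrete, the following is a minimal in-memory sketch in Python of how its three parts combine. It is not the query formulation used in UniBench or in any of the systems named above. The data, the names FRIENDS, ORDERS, FEEDBACK, friends_within, and recommend, and the choice to key feedback by (customer, product) are all assumptions made for illustration.

```python
from collections import deque

# Graph model: hypothetical friendship adjacency list (customer -> friends).
FRIENDS = {
    "c1": ["c2", "c3"],
    "c2": ["c1", "c4"],
    "c3": ["c1"],
    "c4": ["c2", "c5"],
    "c5": ["c4", "c6"],
    "c6": ["c5"],
}

# Document model: hypothetical JSON-like orders, each embedding an item list.
ORDERS = [
    {"customer": "c2", "items": [{"product": "p1", "category": "shoes"}]},
    {"customer": "c4", "items": [{"product": "p2", "category": "shoes"},
                                 {"product": "p3", "category": "books"}]},
    {"customer": "c5", "items": [{"product": "p1", "category": "shoes"}]},
    {"customer": "c6", "items": [{"product": "p2", "category": "shoes"}]},
    {"customer": "c3", "items": [{"product": "p3", "category": "books"}]},
]

# Key-value model: feedback keyed by (customer, product); the key layout
# is an assumption, not taken from the paper.
FEEDBACK = {
    ("c2", "p1"): {"rating": 5, "text": "Great fit"},
    ("c4", "p2"): {"rating": 3, "text": "Average"},
    ("c5", "p1"): {"rating": 5, "text": "Love them"},
    ("c6", "p2"): {"rating": 5, "text": "Perfect"},
}


def friends_within(start, max_hops):
    """Breadth-first search: customers reachable in at most max_hops hops."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in FRIENDS.get(node, []):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                frontier.append((friend, depth + 1))
    return found


def recommend(customer, category, max_hops=3):
    """Return (friend, product, feedback text) for 5-rated purchases."""
    friends = friends_within(customer, max_hops)          # graph step
    results = []
    for order in ORDERS:                                  # document step
        if order["customer"] not in friends:
            continue
        for item in order["items"]:
            if item["category"] != category:
                continue
            key = (order["customer"], item["product"])
            feedback = FEEDBACK.get(key)                  # key-value step
            if feedback is not None and feedback["rating"] == 5:
                results.append((key[0], key[1], feedback["text"]))
    return sorted(results)


if __name__ == "__main__":
    for row in recommend("c1", "shoes"):
        print(row)
```

In this toy data, customer c6 is four hops from c1 and is therefore excluded even though c6 left a 5-rating review. A native multi-model system would express the same three steps in a single unified query rather than in application code.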

Objectives

Findings

Discussion

Conclusion