FacetsBase: A Key-Value Store Optimized for Querying on Scholarly Data

Jie Song, Yuanguo Bi, Tiantian Li, Guangjie Han

doi:10.1109/tetc.2018.2844313

Abstract

As an emerging topic, scholarly big data is the vast quantity of research output that requires sophisticated platforms and tools for creating applications that can benefit the research community. This paper addresses applied research on storing, indexing, and querying scholarly big data. Relational databases, which employ a pre-defined and well-partitioned data model, are not flexible, while NoSQL databases lack sophisticated index and partition mechanisms. The proposed FacetsBase, a Hadoop-based key-value data store, combines the performance advantages of a relational database, the flexibility of a NoSQL database, and the parallelism of a distributed file system. It partitions and indexes publication information using the concept of facets, stores facets in a multi-dimensional logical data model and a lower-cost file format, and provides both attribute-specified and attribute-unspecific queries. In experiments, FacetsBase was compared with Hive, HBase, MongoDB, and Cassandra in terms of query performance. The results indicate that, on average, FacetsBase is 1.4x, 3.8x, 1.4x, and 2.9x faster than Hive, HBase, MongoDB, and Cassandra, respectively.

Full Text