Abstract

Scholarly big data, an emerging topic, is the vast quantity of research output; building applications on it that benefit the research community requires sophisticated platforms and tools. This paper addresses applied research on storing, indexing, and querying scholarly big data. Relational databases, which employ a pre-defined and well-partitioned data model, are not flexible, while NoSQL databases lack sophisticated index and partition mechanisms. The proposed FacetsBase, a Hadoop-based key-value data store, combines the performance advantages of a relational database, the flexibility of a NoSQL database, and the parallelism of a distributed file system. It partitions and indexes publication information using the concept of facets, stores the facets in a multi-dimensional logical data model and a lower-cost file format, and supports both attribute-specified and attribute-unspecific queries. In experiments, FacetsBase was compared with Hive, HBase, MongoDB, and Cassandra in terms of query performance. The results indicate that FacetsBase is on average 1.4x, 3.8x, 1.4x, and 2.9x faster than these systems, respectively.
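
To make the two query types concrete, the following is a minimal in-memory sketch of facet-based partitioning and indexing. It is not the FacetsBase implementation: the class `FacetStore`, its method names, and the sample records are hypothetical, and a plain Python dictionary stands in for the Hadoop-based storage layer. It assumes that each attribute of a publication (title, author, year) is treated as one facet with its own index partition.

```python
from collections import defaultdict


class FacetStore:
    """Toy stand-in for a facet-partitioned key-value store (hypothetical)."""

    def __init__(self):
        # facet name -> normalized value -> set of record ids
        # (one index partition per facet)
        self._index = defaultdict(lambda: defaultdict(set))
        self._records = {}

    def put(self, record_id, record):
        """Store a record and index every attribute under its own facet."""
        self._records[record_id] = record
        for facet, value in record.items():
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                self._index[facet][str(v).lower()].add(record_id)

    def query_specified(self, facet, value):
        """Attribute-specified query: consult a single facet partition."""
        partition = self._index.get(facet, {})
        return sorted(partition.get(str(value).lower(), set()))

    def query_unspecified(self, value):
        """Attribute-unspecific query: consult every facet partition."""
        key = str(value).lower()
        hits = set()
        for partition in self._index.values():
            hits |= partition.get(key, set())
        return sorted(hits)


# Made-up sample records, for illustration only.
store = FacetStore()
store.put("p1", {"title": "Graph Indexing", "author": ["Lee", "Kim"], "year": 2016})
store.put("p2", {"title": "Lee", "author": ["Park"], "year": 2016})

print(store.query_specified("author", "Lee"))  # searches the author facet only
print(store.query_unspecified("Lee"))          # searches all facets
```

In this sketch the attribute-specified query touches one partition, while the attribute-unspecific query scans all of them, which is the trade-off the two query types imply.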
