Abstract

Scholarly big data, an emerging topic, is the vast quantity of research output, and it requires sophisticated platforms and tools for building applications that benefit the research community. This paper addresses applied research in storing, indexing, and querying scholarly big data. Relational databases, which employ a pre-defined and well-partitioned data model, are not flexible, while NoSQL databases lack sophisticated index and partition mechanisms. The proposed FacetsBase, a Hadoop-based key-value data store, combines the performance advantages of a relational database, the flexibility of a NoSQL database, and the parallelism of a distributed file system. It partitions and indexes publication information using the concept of facets, stores facets in a multi-dimensional logical data model and a lower-cost file format, and provides both attribute-specified and attribute-unspecific queries. In experiments, FacetsBase was compared with Hive, HBase, MongoDB, and Cassandra in terms of query performance. The results indicate that FacetsBase is on average 1.4x, 3.8x, 1.4x, and 2.9x faster than these systems, respectively.
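
To make the two query types concrete, the following is a minimal in-memory sketch, not FacetsBase's actual implementation or API. It assumes that a facet is an attribute name (such as author or venue), that each facet gets its own partition, and that each partition maps a value to the set of matching publication ids. The class `FacetStore`, its methods, and the sample records are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


class FacetStore:
    """Toy facet-partitioned index (hypothetical; not the FacetsBase API)."""

    def __init__(self):
        # facet name -> facet value -> set of publication ids
        self.partitions = defaultdict(lambda: defaultdict(set))

    def put(self, pub_id, facets):
        # Index one publication under every (facet, value) pair it carries.
        for name, value in facets.items():
            self.partitions[name][value].add(pub_id)

    def query_specified(self, name, value):
        # Attribute-specified query: consult only the named facet's partition.
        return set(self.partitions.get(name, {}).get(value, set()))

    def query_unspecific(self, value):
        # Attribute-unspecific query: look the value up in every partition.
        hits = set()
        for index in self.partitions.values():
            hits |= index.get(value, set())
        return hits


# Sample records are invented for illustration only.
store = FacetStore()
store.put("p1", {"author": "Lee", "venue": "VLDB"})
store.put("p2", {"author": "Kim", "editor": "Lee"})

by_author = store.query_specified("author", "Lee")
anywhere = store.query_unspecific("Lee")
```

In this sketch the attribute-specified query touches a single partition, while the attribute-unspecific query must visit all of them, which is one plausible reason a system would index by facet in the first place.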
