Load-Balanced Cluster for Scale-Out Storage of Knowledge

Zheng Xiong, Guocheng Zhu, Wei Yu, Zhihong Chong, Sen Wang

doi:10.1109/cbd.2018.00010

Abstract

To handle massive data, a Knowledge Graph (KG) needs a scale-out storage schema and a distributed parallel query engine to guarantee its storage and query performance. In this paper, we propose a Knowledge Graph Storage Access System (KGSAS) based on HBase to address these needs. Our approach presents a scalable storage schema that uses a random row-key prefix together with pre-partitioning to ensure load-balanced entity storage. To improve query efficiency, we also propose two distributed parallel query engines: HBase with Spark and HBase with Coprocessor. The HBase with Spark engine accelerates queries by running them in parallel with Spark's in-memory computation. The HBase with Coprocessor engine uses an inverted index and Coprocessor technology to speed up queries by scanning the cluster in parallel. The evaluation results show that the HBase with Coprocessor engine performs better for querying the KG.
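
The prefix-plus-pre-partition idea can be illustrated with a minimal sketch. It is not the KGSAS implementation: the abstract does not say how the prefix is generated, so this example swaps in a hash-derived prefix (so that a reader can recompute it for point lookups), and the region count, key layout, and the function names `salted_row_key`, `split_points`, and `region_of` are all hypothetical. It models only the key arithmetic in plain Python and makes no HBase calls.

```python
import bisect
import hashlib
from typing import List

NUM_REGIONS = 8  # assumption: the table is pre-split into 8 regions


def salted_row_key(entity_id: str, num_regions: int = NUM_REGIONS) -> str:
    """Prepend a two-digit bucket prefix derived from a hash of the entity id.

    Assumption: a hash-derived prefix stands in for the paper's random prefix.
    """
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    bucket = digest[0] % num_regions
    return f"{bucket:02d}|{entity_id}"


def split_points(num_regions: int = NUM_REGIONS) -> List[str]:
    """Split keys for pre-partitioning: one boundary between adjacent buckets."""
    return [f"{b:02d}|" for b in range(1, num_regions)]


def region_of(row_key: str, splits: List[str]) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(splits, row_key)


if __name__ == "__main__":
    splits = split_points()
    for entity in ["Alan_Turing", "HBase", "Nanjing"]:
        key = salted_row_key(entity)
        print(key, "-> region", region_of(key, splits))
```

Because each prefix maps to exactly one pre-split region, entities whose ids would otherwise sort next to each other are spread across regions instead of piling onto one server. The trade-off is that a range scan over entity ids has to be issued once per prefix.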

Full Text