Abstract
With the rapid growth of vector data and the urgent demand for low-latency queries, achieving scalable storage and efficient access for vector big data has become an important and timely challenge. However, systematic methods for storing and querying vector polygon data that take spatial locality into account across the storage schema, index construction and query optimization are rarely seen. In this paper, we focus on the storage and topological query of vector polygon geometry data in HBase. The rowkey of the HBase table is the concatenation of the Hilbert value of the grid cell containing the center of the object's minimum bounding rectangle (MBR), the layer identifier and the order code. Then, a new multi-level grid index structure, termed Q-HBML, which incorporates the grid-object spatial relationship and a new Hilbert hierarchical code into the multi-level grid, is proposed to improve spatial query efficiency. Finally, based on the Q-HBML index, two query optimization strategies and an optimized topological query algorithm, ML-OTQ, are presented to streamline the topological query process and improve topological query efficiency. Four groups of comparative experiments show that our approach achieves better performance.
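The abstract specifies the rowkey only by its three components. The following is a minimal Python sketch of one way to build such a rowkey, not the paper's implementation. It assumes a fixed data extent, a single square grid of 2^order × 2^order cells, the classic Hilbert xy-to-index conversion, and hypothetical fixed field widths (8-byte Hilbert value, 2-byte layer identifier, 4-byte order code). All helper names (`hilbert_index`, `cell_of`, `make_rowkey`, `parent_code`) are illustrative. `parent_code` shows the nesting property a hierarchical Hilbert code can rely on: the index of a cell's ancestor at a coarser level is obtained by dropping two bits per level. The actual Q-HBML hierarchical code is not detailed in the abstract and may differ.

```python
import struct


def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2**order x 2**order grid to its Hilbert index."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Quadrant number (0..3 in curve order) weighted by the quadrant size.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, finer level is read in curve order.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cell_of(px, py, extent, order):
    """Grid cell (col, row) containing point (px, py); extent = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = extent
    n = 1 << order
    col = int((px - minx) / (maxx - minx) * n)
    row = int((py - miny) / (maxy - miny) * n)
    # Clamp so points on the upper/right border fall into the last cell.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


def make_rowkey(mbr, layer_id, order_code, extent, order):
    """Hypothetical rowkey: Hilbert(cell of MBR centre) | layer id | order code."""
    minx, miny, maxx, maxy = mbr
    cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
    col, row = cell_of(cx, cy, extent, order)
    h = hilbert_index(order, col, row)
    # Fixed-width big-endian fields make byte-wise order match numeric order.
    return struct.pack(">QHI", h, layer_id, order_code)


def parent_code(code, order, level):
    """Hilbert index of the ancestor cell at a coarser level (level <= order)."""
    return code >> (2 * (order - level))
```

For example, `make_rowkey((1.0, 1.0, 3.0, 3.0), 1, 42, (0.0, 0.0, 8.0, 8.0), 3)` places the MBR centre (2, 2) in cell (2, 2) of an 8 × 8 grid and packs its Hilbert index, layer 1 and order code 42 into a 14-byte key. Because HBase keeps rows sorted by rowkey bytes, putting the Hilbert value first means objects whose MBR centres fall in Hilbert-adjacent cells tend to be stored in adjacent rows, which is the spatial-locality argument for this kind of key design in general.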
Highlights
The advancement of spatial information acquisition technologies and the proliferation of geographic information applications have led to explosive growth of spatial vector data. Traditional centralized data management technologies face the problems of highly concurrent reads and writes and of limited scalability when dealing with massive and complex spatial vector data.
Not only are we concerned with the design of the rowkey, but we also propose a new index structure and query optimization strategies to improve query efficiency.
We mainly focus on the storage and topological query efficiency of large-scale polygon data.
Summary
Traditional centralized data management technologies face the problems of highly concurrent reads and writes and of limited scalability when dealing with massive and complex spatial vector data. How to achieve scalable storage and fast queries at low cost is an urgent and challenging issue. Cloud computing is a new distributed computing and storage architecture that supports massive horizontal scaling on low-cost computers, providing virtually unlimited, scalable storage and computing power. Applying cloud computing to geographic information systems (GIS) is an effective way to handle massive spatial vector data [1,2,3,4]. HBase [5] is a highly scalable, highly concurrent, highly reliable and fault-tolerant distributed column-oriented NoSQL database built on top of the Hadoop Distributed File System (HDFS) [6]; it provides large storage capacity and low-latency query service.