Abstract
The daily volume of trajectory data produced by vehicle monitoring networks in smart cities is growing rapidly and has reached 1 billion records per day. Accessing hyper massive spatiotemporal trajectory data (HMSTD) in transport, the Internet of Things, and other fields is difficult and limited with current spatiotemporal data indexing techniques. We therefore propose path-divided Hadoop Distributed File System (HDFS) data blocking (PDDB) based on Apache Impala (PDDB-Impala), a method that makes access to HMSTD more efficient and thereby improves the efficiency of sharing hyper massive data. We also propose PDDB Parquet data partitioning rules. In the experiments, 35,809 buses equipped with BD positioning sensors create 1.03 billion data records each day. The bus distribution in Shenzhen is collected from 7:00 a.m. to 9:00 a.m. and from 11:00 a.m. to 1:00 p.m. PDDB-Impala achieves about 8, 9, 29, and 110 times higher performance than MongoDB or HBase at data scales of 1 billion, 10 billion, 50 billion, and 100 billion records, respectively; these results outperform those of the equipartition in the Impala, MongoDB, and HBase methods.
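To make the idea of path-divided blocking more concrete, the following is a minimal sketch of how a trajectory point could be mapped to a partition directory on HDFS. It is not the paper's PDDB partitioning rule, which is not given in this summary. The root directory, the day/hour/cell partition keys, the 0.05-degree cell size, and the function name `partition_path` are all assumptions made for illustration. The only grounded convention is the `key=value` directory layout that Impala uses for partitioned tables.

```python
import math
from datetime import datetime

# Illustrative assumptions only; these are NOT the paper's PDDB rules.
CELL_DEG = 0.05                  # assumed spatial cell size in degrees
ROOT = "/warehouse/trajectory"   # hypothetical HDFS root directory


def partition_path(ts: datetime, lon: float, lat: float) -> str:
    """Map one trajectory point to a key=value partition directory."""
    # Temporal partition keys: calendar day and hour of the record.
    day = ts.strftime("%Y-%m-%d")
    hour = ts.hour
    # Spatial partition key: row/column index of the grid cell
    # that contains the point (origin at lon=-180, lat=-90).
    col = math.floor((lon + 180.0) / CELL_DEG)
    row = math.floor((lat + 90.0) / CELL_DEG)
    return f"{ROOT}/day={day}/hour={hour:02d}/cell={row}_{col}"


if __name__ == "__main__":
    # A hypothetical morning-peak point in the Shenzhen area.
    print(partition_path(datetime(2024, 1, 15, 7, 30), 114.06, 22.54))
```

With a layout of this kind, a query restricted to one time window and one area only needs to read the Parquet files under the matching directories, which is the general benefit that partition pruning provides.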
Highlights
Harrison et al. [1] of International Business Machines Corporation state that a “smart city” signifies an “instrumented, interconnected and intelligent city.” “Instrumented” means the capability of capturing and integrating real-life data through sensors, personal devices, appliances, and other similar perception devices. “Interconnected” refers to the integration of these perceptual data into a network computing platform that facilitates the exchange of heterogeneous data among heterogeneous web services. “Intelligent” means the combination of complex analytics, spatiotemporal data modelling, mining, association, and visualization to make better intelligent decisions.
The dataset created each day contains approximately 8.74 billion records in the Tokyo taxi autopilot network, 1.93 billion in the Beijing taxi network, 1.03 billion in the Shenzhen bus network, and 984 million in the New York taxi network.
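As a rough sense of scale, the daily figures above can be converted into average ingest rates. The sketch below treats each daily figure as a record count and assumes records arrive evenly over 24 hours, which real traffic does not do, so the numbers are lower bounds on peak load rather than measured rates. The function name `mean_rate` is introduced here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily record counts as quoted in the text above.
daily_records = {
    "Tokyo taxi autopilot network": 8_740_000_000,
    "Beijing taxi network": 1_930_000_000,
    "Shenzhen bus network": 1_030_000_000,
    "New York taxi network": 984_000_000,
}


def mean_rate(records_per_day: int) -> float:
    """Average records per second, assuming uniform arrival over a day."""
    return records_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, count in daily_records.items():
        print(f"{name}: about {mean_rate(count):,.0f} records/s on average")
```

Under these assumptions the Shenzhen bus network averages roughly 11,900 records per second, and the Tokyo network roughly 101,000.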
Confronted with ever-growing data sets that reach scales as large as 100 billion records, traditional relational database or NoSQL methods can encounter dramatic performance degradation and may suffer limited scalability when handling hyper massive spatiotemporal data.