Abstract
To meet the storage and recovery requirements for large-scale seismic waveform data at the National Earthquake Data Backup Center (NEDBC), a distributed cluster processing model based on Kafka message queues is designed to optimize the efficiency of writing seismic waveform data into HBase at NEDBC. First, the characteristics of big data storage architectures are compared with those of traditional disk array storage architectures. Second, seismic waveform data are parsed and periodically truncated, then written to HBase as NoSQL records through a Spark Streaming cluster. Finally, the read/write performance of the proposed big data platform is benchmarked against that of the traditional storage architecture. Results show that the Kafka-based seismic waveform data processing architecture designed and implemented in this paper achieves higher read/write speeds than the traditional architecture while preserving the redundancy required for NEDBC data backup, which verifies the validity and practicability of the proposed approach.
Highlights
Seismic observation waveform data are acquired by seismic sensors at the stations of seismic networks, forwarded, and gradually aggregated into real-time stream data by the Jopens software [1], [2]
We implement Key-Value parsing for miniSEED [16] seismic waveform data: (1) periodic truncation and msgKey-msg [15] parsing of seismic data are realized in C; (2) the msgKey-msg format is parsed into the Key-Value format and written into HBase (a sketch of this mapping follows below); (3) to further improve the efficiency of writing seismic waveform data into HBase, compared with the current storage solutions in the seismic industry [10], we propose a data inbound process based on the Kafka production-consumption model [13] and the Spark Streaming real-time computing framework, so that the cluster processes seismic waveform data concurrently and the speed of writing seismic data into HBase is effectively increased
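The following is a minimal sketch of step (2), mapping one parsed msgKey-msg pair to an HBase Put. The paper does not publish its exact schema, so the table name (`seismic_waveform`), column family (`wf`), qualifier (`mseed`), and the row-key layout (channel identifier plus record start time) are assumptions for illustration only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Illustrative sketch: map one truncated miniSEED segment (msgKey-msg pair)
 * to an HBase Put. Table/column names and the row-key layout are assumed,
 * not taken from the paper.
 */
public class WaveformWriter {

    /** Hypothetical msgKey layout: "NET.STA.LOC.CHA|startTimeMillis". */
    public static Put toPut(String msgKey, byte[] msg) {
        // Using channel id + start time as the row key keeps a channel's
        // records contiguous in HBase, so time-range scans stay sequential.
        Put put = new Put(Bytes.toBytes(msgKey));
        put.addColumn(Bytes.toBytes("wf"),     // column family (assumed)
                      Bytes.toBytes("mseed"),  // qualifier for raw miniSEED bytes
                      msg);
        return put;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("seismic_waveform"))) {
            byte[] segment = new byte[512];  // placeholder: one 512-byte miniSEED record
            table.put(toPut("BJ.BJT.00.BHZ|1577836800000", segment));
        }
    }
}
```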
KAFKA-BASED SEISMIC WAVEFORM DATA PROCESSING MODEL AND HBASE READ/WRITE IMPLEMENTATION
To write seismic data in the Key-Value format into the HBase database at high speed, we introduce the Kafka message queue model under the big data processing architecture and design a seismic data inbound scheme based on the Kafka production-consumption model (see the producer sketch below)
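A minimal sketch of the production side of this model: each truncated waveform segment is published to Kafka as a (msgKey, msg) record. The topic name (`waveform`), broker address, and keying scheme are assumptions, not the paper's configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

/**
 * Illustrative producer side of the Kafka inbound model: one truncated
 * miniSEED segment is published per record. Topic, broker, and key layout
 * are assumed for this sketch.
 */
public class WaveformProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        props.put("acks", "all");  // favor durability, fitting a backup-center workload

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String msgKey = "BJ.BJT.00.BHZ|1577836800000";  // channel id + start time (assumed)
            byte[] msg = new byte[512];                     // placeholder miniSEED record
            // Keying by channel id routes a channel's records to one partition,
            // preserving per-channel ordering for downstream consumers.
            producer.send(new ProducerRecord<>("waveform", msgKey, msg));
        }
    }
}
```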
Summary
Seismic observation waveform data are acquired by seismic sensors at the stations of seismic networks, forwarded, and gradually aggregated into real-time stream data by the Jopens software [1], [2]. The proposed data processing architecture parses seismic waveform data, whether from traditional disk arrays or from real-time streams, into the Key-Value format and writes it to the HBase distributed database, removing the speed bottleneck in writing seismic waveform data to a distributed database and improving the analysis, storage, and query efficiency of such data. To further improve the efficiency of writing seismic waveform data into HBase, compared with the current storage solutions in the seismic industry [10], we propose a data inbound process based on the Kafka production-consumption model [13] and the Spark Streaming real-time computing framework, so that the cluster processes seismic waveform data concurrently and the speed of writing seismic data into HBase is effectively increased (a consumer-side sketch follows below).
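To close the loop, here is a sketch of the consumption side under the same assumptions as the producer sketch above: a Spark Streaming job (using the spark-streaming-kafka-0-10 integration) pulls (msgKey, msg) records from Kafka and writes each partition's records to HBase in batch. Topic, group id, batch interval, and HBase column names are illustrative, not the paper's settings.

```java
import java.util.*;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

/**
 * Illustrative consumer side: a Spark Streaming job reads (msgKey, msg)
 * records from Kafka; each executor builds HBase Puts for its partition.
 * Configuration values are assumptions for the sketch.
 */
public class WaveformConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("WaveformInbound");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        kafkaParams.put("group.id", "waveform-inbound");

        JavaInputDStream<ConsumerRecord<String, byte[]>> stream =
                KafkaUtils.createDirectStream(
                        ssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, byte[]>Subscribe(
                                Collections.singletonList("waveform"), kafkaParams));

        // Batching Puts per partition amortizes HBase round trips; this is
        // where the concurrent cluster write described above takes place.
        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            List<Put> puts = new ArrayList<>();
            while (records.hasNext()) {
                ConsumerRecord<String, byte[]> r = records.next();
                puts.add(new Put(Bytes.toBytes(r.key()))
                        .addColumn(Bytes.toBytes("wf"), Bytes.toBytes("mseed"), r.value()));
            }
            // open an HBase connection here and flush with table.put(puts)
        }));

        ssc.start();
        ssc.awaitTermination();
    }
}
```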