Abstract
To meet the storage and recovery requirements for large-scale seismic waveform data at the National Earthquake Data Backup Center (NEDBC), a distributed cluster processing model based on Kafka message queues is designed to optimize the efficiency of writing seismic waveform data into HBase at NEDBC. First, the characteristics of big data storage architectures are compared with those of traditional disk array storage architectures. Second, seismic waveform data are parsed and periodically truncated, then written to HBase as NoSQL records through a Spark Streaming cluster. Finally, the read/write performance of the proposed big data platform is benchmarked against that of the traditional storage architecture. Results show that the Kafka-based seismic waveform data processing architecture designed and implemented in this paper achieves higher read/write speeds than the traditional architecture while preserving the redundancy required for NEDBC data backup, which verifies the validity and practicability of the proposed approach.
Highlights
Seismic observation waveform data are acquired by seismic sensors at the stations of seismic networks, forwarded, and gradually aggregated into real-time stream data by the Jopens software [1], [2]
We implement Key-Value parsing for miniSEED [16] seismic waveform data: (1) periodic truncation and msgKey-msg [15] parsing of seismic data are realized in C; (2) the msgKey-msg format is parsed into the Key-Value format and written into HBase (a sketch of this mapping follows below); (3) to further improve the efficiency of writing seismic waveform data into HBase, compared with the current storage solutions in the seismic industry [10], we propose a data inbound process based on the Kafka production-consumption model [13] and the Spark Streaming real-time computing framework, so that the cluster processes seismic waveform data concurrently and the speed of writing seismic data into HBase is effectively increased
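The following is a minimal sketch of step (2), mapping one parsed msgKey-msg pair to an HBase Put. The paper does not publish its exact schema, so the table name (`seismic_waveform`), column family (`wf`), qualifier (`mseed`), and the row-key layout (channel identifier plus record start time) are assumptions for illustration only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Illustrative sketch: map one truncated miniSEED segment (msgKey-msg pair)
 * to an HBase Put. Table/column names and the row-key layout are assumed,
 * not taken from the paper.
 */
public class WaveformWriter {

    /** Hypothetical msgKey layout: "NET.STA.LOC.CHA|startTimeMillis". */
    public static Put toPut(String msgKey, byte[] msg) {
        // Using channel id + start time as the row key keeps a channel's
        // records contiguous in HBase, so time-range scans stay sequential.
        Put put = new Put(Bytes.toBytes(msgKey));
        put.addColumn(Bytes.toBytes("wf"),     // column family (assumed)
                      Bytes.toBytes("mseed"),  // qualifier for raw miniSEED bytes
                      msg);
        return put;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("seismic_waveform"))) {
            byte[] segment = new byte[512];  // placeholder: one 512-byte miniSEED record
            table.put(toPut("BJ.BJT.00.BHZ|1577836800000", segment));
        }
    }
}
```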
KAFKA-BASED SEISMIC WAVEFORM DATA PROCESSING MODEL AND HBASE READ/WRITE IMPLEMENTATION
To write seismic data in the Key-Value format into the HBase database at high speed, we introduce the Kafka message queue model under the big data processing architecture and design a seismic data inbound scheme based on the Kafka production-consumption model (see the producer sketch below)
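A minimal sketch of the production side of this model: each truncated waveform segment is published to Kafka as a (msgKey, msg) record. The topic name (`waveform`), broker address, and keying scheme are assumptions, not the paper's configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

/**
 * Illustrative producer side of the Kafka inbound model: one truncated
 * miniSEED segment is published per record. Topic, broker, and key layout
 * are assumed for this sketch.
 */
public class WaveformProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        props.put("acks", "all");  // favor durability, fitting a backup-center workload

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String msgKey = "BJ.BJT.00.BHZ|1577836800000";  // channel id + start time (assumed)
            byte[] msg = new byte[512];                     // placeholder miniSEED record
            // Keying by channel id routes a channel's records to one partition,
            // preserving per-channel ordering for downstream consumers.
            producer.send(new ProducerRecord<>("waveform", msgKey, msg));
        }
    }
}
```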
Summary
Seismic observation waveform data are acquired by seismic sensors at the stations of seismic networks, forwarded, and gradually aggregated into real-time stream data by the Jopens software [1], [2]. The proposed data processing architecture parses seismic waveform data, whether from traditional disk arrays or from real-time streams, into the Key-Value format and writes it to the HBase distributed database, removing the speed bottleneck in writing seismic waveform data to a distributed database and improving the analysis, storage, and query efficiency of such data. To further improve the efficiency of writing seismic waveform data into HBase, compared with the current storage solutions in the seismic industry [10], we propose a data inbound process based on the Kafka production-consumption model [13] and the Spark Streaming real-time computing framework, so that the cluster processes seismic waveform data concurrently and the speed of writing seismic data into HBase is effectively increased (a consumer-side sketch follows below).
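To close the loop, here is a sketch of the consumption side under the same assumptions as the producer sketch above: a Spark Streaming job (using the spark-streaming-kafka-0-10 integration) pulls (msgKey, msg) records from Kafka and writes each partition's records to HBase in batch. Topic, group id, batch interval, and HBase column names are illustrative, not the paper's settings.

```java
import java.util.*;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

/**
 * Illustrative consumer side: a Spark Streaming job reads (msgKey, msg)
 * records from Kafka; each executor builds HBase Puts for its partition.
 * Configuration values are assumptions for the sketch.
 */
public class WaveformConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("WaveformInbound");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        kafkaParams.put("group.id", "waveform-inbound");

        JavaInputDStream<ConsumerRecord<String, byte[]>> stream =
                KafkaUtils.createDirectStream(
                        ssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, byte[]>Subscribe(
                                Collections.singletonList("waveform"), kafkaParams));

        // Batching Puts per partition amortizes HBase round trips; this is
        // where the concurrent cluster write described above takes place.
        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            List<Put> puts = new ArrayList<>();
            while (records.hasNext()) {
                ConsumerRecord<String, byte[]> r = records.next();
                puts.add(new Put(Bytes.toBytes(r.key()))
                        .addColumn(Bytes.toBytes("wf"), Bytes.toBytes("mseed"), r.value()));
            }
            // open an HBase connection here and flush with table.put(puts)
        }));

        ssc.start();
        ssc.awaitTermination();
    }
}
```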