Abstract

Seismology is a data-rich and data-driven science. The application of machine learning to gain new insights from seismic data is a rapidly evolving subfield of seismology. The availability of large amounts of seismic data and computational resources, together with the development of advanced techniques, can foster more robust models and algorithms to process and analyze seismic signals. Known examples, or labeled data sets, are the essential requisite for building supervised models. Seismology has labeled data, but the reliability of those labels is highly variable, and the lack of high-quality labeled data sets to serve as ground truth, as well as the lack of standard benchmarks, are obstacles to more rapid progress. In this paper we present a high-quality, large-scale, global data set of local earthquake and non-earthquake signals recorded by seismic instruments. The data set in its current state contains two categories: (1) local earthquake waveforms (recorded at “local” distances, within 350 km of the earthquakes) and (2) seismic noise waveforms that are free of earthquake signals. Together these data comprise ~1.2 million time series, or more than 19,000 hours of seismic signal recordings. Constructing such a large-scale database with reliable labels is a challenging task. Here, we present the properties of the data set, describe the data collection, quality control procedures, and processing steps we undertook to ensure accurate labeling, and discuss potential applications. We hope that the scale and accuracy of STEAD present new and unparalleled opportunities to researchers in the seismological community and beyond.

Highlights

  • Earthquakes are sudden movements across faults that release elastic energy stored in rocks and radiate seismic waves that travel throughout Earth

  • We introduce STEAD, the first high-quality, large-scale, global data set of earthquake and non-earthquake signals recorded by seismic instruments

  • The signal-to-noise ratio (SNR) can be used to distinguish data with one or two faulty channels or to select high-quality waveforms for tasks that are sensitive to waveform quality

Summary

INTRODUCTION

PROPERTIES OF THE DATA SET

STEAD includes two main classes of signals recorded by seismic instruments: earthquake and non-earthquake. At this stage, the earthquake class contains only one category, local earthquakes, with about 1,050,000 three-component seismograms (each 1 minute long) associated with ~450,000 earthquakes (Fig. 3) that occurred between January 1984 and August 2018. To ensure that each waveform includes only one earthquake signal (with known parameters) and to prevent the inclusion of unknown (non-cataloged) earthquake signals, we used a short, fixed window (1 minute) around the phase arrival times at different stations to request data. Each window contains both P and S waves; it begins 5 to 10 seconds before the P arrival and ends at least 5 seconds after the S arrival. The signal-to-noise ratio (SNR) can be used to distinguish data with one or two faulty channels (where some of the components are mainly noise but the earthquake signal can still be observed on a remaining component) or to select high-quality waveforms for tasks that are sensitive to waveform quality.
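
The windowing rule and the SNR-based selection described above can be illustrated with a short Python sketch. It is not the code used to build STEAD. The function names (`window_bounds`, `snr_db`, `usable_channels`), the 10 dB threshold, the noise and signal window lengths, and the SNR definition itself (ratio of mean power after the P arrival to mean power before it, in decibels) are all assumptions made for illustration.

```python
import numpy as np


def window_bounds(p_arrival_s, s_arrival_s, pre_p_s=5.0,
                  length_s=60.0, min_post_s_s=5.0):
    """Place a fixed-length window around the P and S arrivals (in seconds).

    Returns (start, end), or None if the window would end less than
    `min_post_s_s` seconds after the S arrival.
    """
    start = p_arrival_s - pre_p_s
    end = start + length_s
    if end < s_arrival_s + min_post_s_s:
        return None
    return (start, end)


def snr_db(trace, p_index, n_noise, n_signal):
    """Hypothetical SNR: mean power after P over mean power before P, in dB.

    Assumes p_index >= n_noise so that the noise window is fully inside
    the trace. Returns NaN for a dead channel (zero power in either window).
    """
    noise = trace[p_index - n_noise:p_index]
    signal = trace[p_index:p_index + n_signal]
    noise_power = np.mean(noise ** 2)
    signal_power = np.mean(signal ** 2)
    if noise_power == 0 or signal_power == 0:
        return float("nan")
    return float(10.0 * np.log10(signal_power / noise_power))


def usable_channels(waveform, p_index, n_noise, n_signal, threshold_db=10.0):
    """Flag each channel of an (n_samples, n_channels) array.

    A channel is flagged True when its SNR reaches the (assumed) threshold;
    NaN values compare as False, so dead channels are flagged as unusable.
    """
    snrs = [snr_db(waveform[:, c], p_index, n_noise, n_signal)
            for c in range(waveform.shape[1])]
    return [s >= threshold_db for s in snrs]
```

With flags of this kind, a waveform with one or two `False` entries would correspond to the faulty-channel case mentioned above, while a task that is sensitive to waveform quality could keep only the waveforms whose three channels are all flagged `True`.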

ERRORS
STEAD APPLICATIONS
Findings
CONCLUSION