QuakeLabeler: A Fast Seismic Data Set Creation and Annotation Toolbox for AI Applications

Hao Mai, Pascal Audet

doi:10.1785/0220210290

Abstract

The production and preparation of data sets are essential steps in machine learning (ML) applications. With the growing number and scale of ML techniques applied in seismology, annotating seismograms or seismic features has become time-consuming and tedious for many researchers. Furthermore, most methods are trained and validated on their own unique data subsets, which hampers independent performance evaluation and comparison. To address this problem, we have developed QuakeLabeler, an open-source Python package to customize, build, and manage earthquake training data sets, including processing and visualization. QuakeLabeler provides a tightly integrated pipeline that covers retrieving seismograms from multiple online data centers, querying online human-reviewed catalogs, signal processing, annotating (labeling), and analyzing data distribution. In addition, relevant statistical graphics and human-readable output files can be generated. Various file export formats are supported, such as Seismic Analysis Code (*.sac), mini Standard for Exchange of Earthquake Data (*.mseed), NumPy (*.npz), MATLAB (*.mat), and the Hierarchical Data Format version 5 (*.hdf5). The toolbox is packaged with an interactive command-line interface. Three alternative running modes (beginner, advanced, and benchmark) are implemented to offer data set solutions for different types of applications: quick-start recipes for simple ML solutions, advanced design for customized project training, and benchmark bulletins for model comparison.
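To make the annotation step concrete, the following is a minimal sketch of what cutting and labeling one training sample might look like. It does not use QuakeLabeler's API: the functions `make_synthetic_trace` and `cut_labeled_window`, the field names, and all parameter values are hypothetical and chosen only for illustration, and a synthetic trace stands in for a seismogram retrieved from a data center.

```python
import json
import math
import random


def make_synthetic_trace(n_samples, onset, seed=0):
    """Return noise followed by a decaying sinusoid starting at `onset`.

    Hypothetical stand-in for a real seismogram; not QuakeLabeler code.
    """
    rng = random.Random(seed)
    trace = []
    for i in range(n_samples):
        value = rng.gauss(0.0, 0.05)  # background noise
        if i >= onset:
            t = i - onset
            value += math.exp(-t / 200.0) * math.sin(2 * math.pi * t / 25.0)
        trace.append(value)
    return trace


def cut_labeled_window(trace, arrival_index, window_length, pre_samples):
    """Cut a fixed-length window around an arrival and attach its label.

    The arrival index is re-expressed relative to the window start, so the
    label stays valid after cutting.
    """
    start = arrival_index - pre_samples
    end = start + window_length
    if start < 0 or end > len(trace):
        raise ValueError("window extends beyond the trace")
    return {
        "waveform": trace[start:end],
        "p_arrival_sample": pre_samples,  # arrival position inside the window
        "label": "earthquake",
    }


# Assumed values: 3000-sample trace with an arrival at sample 1200.
trace = make_synthetic_trace(n_samples=3000, onset=1200)
sample = cut_labeled_window(
    trace, arrival_index=1200, window_length=1000, pre_samples=300
)

# Human-readable summary of the annotated sample (waveform omitted).
record = json.dumps(
    {
        "label": sample["label"],
        "p_arrival_sample": sample["p_arrival_sample"],
        "n_samples": len(sample["waveform"]),
    }
)
print(record)
```

In practice the arrival index would come from a human-reviewed catalog rather than being known in advance, and the window would be written to one of the export formats listed above instead of a JSON summary.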

Full Text