Abstract

In this article, we consider the problem of M≥3 earthquake (EQ) forecasting (hindcasting) with a machine learning (ML) approach, using experimental (training) time series from monitoring of water-level variations in deep wells, together with geomagnetic and tidal time series, in Georgia (Caucasus). For such magnitudes, the ratio of “seismic” to “aseismic” days in Georgia is approximately 1:5, so the dataset is close to balanced. However, forecasting is practically important for stronger events, say events of M≥3.5, and for these the learning dataset for Georgia becomes more imbalanced: the ratio of seismic to aseismic days reaches values of the order of 1:20 or more. In this case, some accepted ML classification measures, such as accuracy, can be misleading because of the large number of true negative cases. As a result, the minority class, here the seismically active periods, is ignored altogether. We applied specific measures to counter the imbalance effect and to rule out overfitting. After regularization (balancing) of the training data, we built the confusion matrix and performed a receiver operating characteristic (ROC) analysis in order to forecast the next-day probability of an M≥3.5 earthquake. We found that the Matthews correlation coefficient (MCC) is a measure that gives good results even if the negative and positive classes are of very different sizes. Applying MCC to the observed geophysical data gives a good forecast of next-day M≥3.5 seismic events, with a value of the order of 0.8. After randomization of the EQ dates in the training dataset, the MCC decreases to 0.17.
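
The contrast between accuracy and MCC on an imbalanced set can be illustrated with a minimal sketch. The day counts below (10 seismic and 200 aseismic days, a 1:20 ratio) and the function names are hypothetical and chosen only for illustration; they are not the Georgian data or the code used in the study.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = seismic day."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical labels: 10 seismic days, 200 aseismic days (ratio 1:20).
y_true = [1] * 10 + [0] * 200

# A trivial forecaster that always predicts "aseismic".
always_quiet = [0] * len(y_true)
acc_quiet = accuracy(*confusion_counts(y_true, always_quiet))
mcc_quiet = mcc(*confusion_counts(y_true, always_quiet))

# A perfect forecaster, for comparison.
mcc_perfect = mcc(*confusion_counts(y_true, y_true))

print(f"always-aseismic: accuracy={acc_quiet:.3f}, MCC={mcc_quiet:.3f}")
print(f"perfect:         MCC={mcc_perfect:.3f}")
```

In this toy case the trivial forecaster scores a high accuracy purely from true negatives while never identifying a seismic day, whereas its MCC is zero.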

Highlights

  • In this article (Chelidze et al., 2020), we considered the problem of earthquake (EQ) forecasting using a machine learning (ML) approach, namely, the package ADAM (Kingma and Ba, 2014), based on experimental data from monitoring of water-level variations in deep wells as well as geomagnetic and tidal time series in Georgia (Caucasus)

  • The results of testing, which we present as the confusion matrix and the receiver operating characteristic (ROC) graph (Figure 6), show that the applied methodology leads to a quite satisfactory result: according to the confusion matrix, ML successes amount to 12 of the total of 14 events, with two false forecasts

  • The ratio of seismic to aseismic days in Georgia reaches values of the order of 1:20 or more, which means that the dataset is significantly imbalanced

Summary

Introduction

In this article (Chelidze et al., 2020), we considered the problem of earthquake (EQ) forecasting using a machine learning (ML) approach, namely, the package ADAM (Kingma and Ba, 2014), based on experimental (training) data from monitoring of water-level variations in deep wells as well as geomagnetic and tidal time series in Georgia (Caucasus). In the 2020 article, we used the low EQ threshold value of magnitude M ≥ 3 as the forecast object. In this case, the number of days with EQs of magnitude larger than 3 is smaller than, but of the same order as, the number of “aseismic” days, and the forecast of EQs of M ≥ 3 can be considered a problem of so-called slightly imbalanced sets; namely, the ratio of “seismic” to “aseismic” days in Georgia is approximately 1:5 for this magnitude range. We apply the ML methodology taking into account the larger magnitude threshold and the stronger imbalance in the data. When the forecast problem is posed in this way, the learning datasets become more imbalanced: for example, the ratio of seismic to aseismic days in Georgia reaches values between 1:18 and 1:26, which means that the effect of imbalance should be taken into account.
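
One common way to take such imbalance into account is to balance the training set before learning. The sketch below uses random undersampling of the majority ("aseismic") class. This is an assumption made for illustration only: the text above does not state which balancing scheme was applied, and the labels, the function name `undersample_indices` and the seed are hypothetical.

```python
import random

def undersample_indices(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Keeps every minority (label 1) sample and an equally sized random
    draw from the majority (label 0) samples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    return sorted(pos + keep_neg)

# Hypothetical daily labels with a 1:20 seismic-to-aseismic ratio.
labels = [1] * 10 + [0] * 200

idx = undersample_indices(labels)
balanced = [labels[i] for i in idx]

n_seismic = sum(balanced)
n_aseismic = len(balanced) - n_seismic
print(f"balanced subset: {n_seismic} seismic, {n_aseismic} aseismic days")
```

Undersampling discards most aseismic days and ignores the temporal ordering of the series, so for time-series inputs it would normally be applied to already extracted feature windows rather than to raw consecutive days.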

