Comparative Analysis of Random Forest and XGBoost in Classifying Ionospheric Signal Disturbances During Solar Flares

Filip Arnaut, Aleksandra Kolarski, Vladimir Srećković

doi:10.5194/egusphere-egu24-2046

Abstract

In our previous publication (Arnaut et al. 2023), we demonstrated the application of the Random Forest (RF) algorithm to anomaly detection in very low frequency (VLF) amplitude data, i.e., to classifying disturbances associated with solar flares (SF), erroneous signals, and measurement errors. The RF algorithm is widely regarded as a preferred option for research in novel domains, where its simplicity and relative robustness to overfitting are particularly valuable. Nevertheless, alternative algorithms and methods must be tested and evaluated thoroughly to ascertain their potential advantages and to improve the overall efficiency of the method. This brief communication applies the XGBoost (XGB) method to the same dataset previously used for the RF algorithm and presents a comparative analysis of the two algorithms. Because the problem is framed as a machine learning (ML) problem with a focus on the minority class, the comparative analysis is conducted exclusively on the minority (anomalous) data class. The data pre-processing methodology can be found in Arnaut et al. (2023). The XGB hyperparameters were tuned with a grid search: the number of estimators (trees) was varied from 25 to 500 in increments of 25, and the learning rate was varied from 0.02 to 0.4 in increments of 0.02. The F1-Score for the anomalous data class is similar for both models, with a value of 0.508 for the RF model and 0.51 for the XGB model. These scores were calculated using the entire test dataset, which consists of 19 transmitter-receiver pairs. On closer examination, the RF model exhibits higher precision (0.488) than the XGB model (0.37), while the XGB model exhibits higher recall (0.84) than the RF model (0.53).
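As an illustration of the tuning grid and the reported scores, the following minimal Python sketch enumerates the hyperparameter combinations described above and recomputes the F1-Score from the quoted precision and recall values. The names `n_estimators_grid`, `learning_rate_grid`, and `f1_score_from` are hypothetical and chosen for this sketch only; the abstract does not state which software was used for the grid search, and because the quoted precision and recall are rounded, the recomputed F1 values are approximate.

```python
from itertools import product

# Grid described in the abstract: trees from 25 to 500 in steps of 25,
# learning rate from 0.02 to 0.4 in steps of 0.02.
n_estimators_grid = list(range(25, 501, 25))
learning_rate_grid = [round(0.02 * k, 2) for k in range(1, 21)]

# Every (n_estimators, learning_rate) pair a full grid search would evaluate.
param_grid = list(product(n_estimators_grid, learning_rate_grid))


def f1_score_from(precision: float, recall: float) -> float:
    """Return the F1-Score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the anomalous class as quoted in the abstract.
f1_rf = f1_score_from(0.488, 0.53)
f1_xgb = f1_score_from(0.37, 0.84)

print(len(param_grid), round(f1_rf, 3), round(f1_xgb, 3))
```

Under these assumptions the grid contains 400 combinations, and the recomputed F1-Scores (about 0.508 for RF and 0.514 for XGB) agree with the reported values to two decimal places.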
When each transmitter-receiver pair was examined individually, XGB outperformed RF in terms of F1-Score in 10 out of 19 cases. The largest disparities for the anomalous data class are one case in which the XGB model's F1-Score was higher by 0.15 and another in which it was lower by approximately 0.16. Averaged over all 19 transmitter-receiver pairs, the XGB model outperformed the RF model by approximately 6.72% in terms of the F1-Score for the anomalous data class. Conversely, under a point-based evaluation metric that assigns rewards or penalties for each entry in the confusion matrix, the RF model scores approximately 5% higher overall than the XGB model. Overall, the comparison between the RF and XGB models is inconclusive: each model is superior to the other in some instances. Further research is necessary to fully optimize the method, which is useful for automatically classifying anomalous VLF amplitude signals caused by SF effects, erroneous measurements, and other factors.
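A point-based evaluation of the kind mentioned above can be sketched as a weighted sum over the four confusion-matrix entries. The abstract does not give the actual rewards and penalties, so the weights in `HYPOTHETICAL_WEIGHTS`, the function name `point_score`, and the example counts below are illustrative assumptions only, not the authors' scheme or data.

```python
# Hypothetical rewards and penalties per confusion-matrix entry.
# These values are NOT from the abstract; they only illustrate the idea.
HYPOTHETICAL_WEIGHTS = {"tp": 2.0, "tn": 1.0, "fp": -1.0, "fn": -2.0}


def point_score(tp: int, fp: int, fn: int, tn: int, weights=None) -> float:
    """Return the weighted sum of confusion-matrix counts."""
    w = HYPOTHETICAL_WEIGHTS if weights is None else weights
    return w["tp"] * tp + w["tn"] * tn + w["fp"] * fp + w["fn"] * fn


def relative_difference_percent(score_a: float, score_b: float) -> float:
    """Return how much score_a exceeds score_b, in percent of score_b."""
    return 100.0 * (score_a - score_b) / score_b


# Made-up counts for two models evaluated on the same 100 samples.
score_model_a = point_score(tp=10, fp=5, fn=3, tn=82)
score_model_b = point_score(tp=12, fp=15, fn=1, tn=72)

print(score_model_a, score_model_b)
print(relative_difference_percent(score_model_a, score_model_b))
```

With weights of this kind, a model that gains recall at the cost of many false positives can score lower than a more precise model even when their F1-Scores are nearly equal, which is one way the two evaluations can point in different directions.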
