Comparison of features elimination methods for geomagnetic data classification

A A Gainetdinova, A V Vorobev

doi:10.15593/2499-9873/2023.4.02

Abstract

This paper considers the main stages of data processing and the feature selection methods used to prepare inputs for machine learning models that predict auroras. The aim of the work is to compare feature selection methods when constructing a model for diagnosing the presence of auroras based on intelligent analysis of geomagnetic data. The source data are nine years (2012–2020) of records from the Lovozero Observatory (LOZ). A distinctive feature of the data is their heterogeneity: the set contains both categorical (binary and non-binary) and quantitative variables. The feature selection methods considered are principal component analysis, support vector machines, recursive feature elimination, and the Extra-Trees algorithm. The results show that selecting features based on analysis in the projection onto the principal components makes it possible to overcome the curse of dimensionality, eliminate noise, and reduce model overfitting.
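As an illustration of how the four methods named above can each produce a feature ranking, the following is a minimal sketch using scikit-learn. It is not the authors' pipeline: the data are synthetic (generated with `make_classification`, not the LOZ records), the number of retained features `k` is a hypothetical choice, and the scoring rules for PCA (absolute loadings weighted by explained variance) and for the SVM (absolute weights of a linear model) are common conventions assumed here, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for a geomagnetic feature table (assumption: NOT the LOZ data).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=4, n_redundant=2, random_state=0
)
X = StandardScaler().fit_transform(X)  # PCA and SVM are scale-sensitive
k = 4  # hypothetical number of features to keep

# 1) PCA: score each feature by its absolute loadings on the first k
#    components, weighted by each component's explained variance ratio.
pca = PCA(n_components=k, random_state=0).fit(X)
pca_scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_
pca_top = np.argsort(pca_scores)[::-1][:k]

# 2) Linear SVM: score each feature by the magnitude of its weight.
svm = LinearSVC(dual=False, max_iter=5000, random_state=0).fit(X, y)
svm_top = np.argsort(np.abs(svm.coef_[0]))[::-1][:k]

# 3) Recursive feature elimination: repeatedly drop the weakest feature
#    of a linear SVM until k features remain.
rfe = RFE(
    LinearSVC(dual=False, max_iter=5000, random_state=0),
    n_features_to_select=k,
).fit(X, y)
rfe_top = np.flatnonzero(rfe.support_)

# 4) Extra-Trees: score each feature by impurity-based importance.
et = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
et_top = np.argsort(et.feature_importances_)[::-1][:k]

selected = {
    "pca": pca_top,
    "svm": svm_top,
    "rfe": rfe_top,
    "extra_trees": et_top,
}
for name, idx in selected.items():
    print(name, sorted(idx.tolist()))
```

Comparing the four index sets shows where the methods agree; in a real study each subset would then be evaluated by training the classifier on it and comparing held-out performance.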
