Self-paced ensemble for constructing an efficient robust high-performance classification model for detecting mineralization anomalies from geochemical exploration data

Yongliang Chen, Xudong Du, Min Guo

doi:10.1016/j.oregeorev.2023.105418

Open Access

https://doi.org/10.1016/j.oregeorev.2023.105418

Journal: Ore Geology Reviews
Publication Date: Mar 30, 2023
Citations: 4
License type: cc-by-nc-nd

Affiliation: Jilin University

Abstract

Given a base classifier such as the support vector classifier, the self-training algorithm can be used to build a high-performance classification model for detecting mineralization anomalies from geochemical exploration data. However, the resulting model has poor robustness. To solve this problem, the self-paced ensemble algorithm was adopted to establish an efficient, robust, high-performance model for detecting mineralization geochemical anomalies. The self-paced ensemble algorithm can efficiently build a robust classification model from a base classifier such as the support vector classifier, decision tree classifier, k-nearest neighbor classifier, gradient boosting classifier, or multilayer perceptron. To illustrate the advantages of the self-paced ensemble algorithm, a case study of molybdenum mineralization anomaly detection was carried out in the Molidawa area, Inner Mongolia, China. The self-paced ensemble algorithm and the self-training algorithm were each used to build classification models, based on a decision tree classifier, to detect molybdenum mineralization anomalies from stream sediment survey data. Each algorithm was run five times with the same set of initialization parameters, yielding five classification models per algorithm. The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) were used to evaluate the performance of the classification models in geochemical exploration. The results show that, compared with the PRCs of the five models established by the self-training algorithm, the PRCs of the five models established by the self-paced ensemble algorithm coincide with each other and lie closer to the upper right corner of the precision-recall space. The AUPRCs of the five models established by the self-paced ensemble algorithm are all 0.3538, whereas those of the models established by the self-training algorithm range from 0.006624 to 0.02941, far below 0.3538. In addition, geochemical data modeling takes 44.84 to 47.05 s with the self-paced ensemble algorithm and 51.97 to 59.46 s with the self-training algorithm. Therefore, in detecting molybdenum mineralization anomalies, the model established by the self-paced ensemble algorithm is better than the model established by the self-training algorithm in data-modeling efficiency, robustness, and classification performance. It can be concluded that the self-paced ensemble algorithm is an effective tool for establishing an efficient, robust, high-performance classification model for mineralization anomaly detection.
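
The evaluation protocol described in the abstract (repeat the model building several times, compute the AUPRC of each run, and compare the values across runs) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' code or data: it uses scikit-learn, a synthetic imbalanced dataset in place of the stream sediment survey data, a plain decision tree rather than the self-paced ensemble or self-training algorithm, trapezoidal integration for the AUPRC, and the hypothetical helper names `auprc`, `repeated_auprc`, and `spread`.

```python
"""Sketch: AUPRC over repeated runs as a robustness check (synthetic data)."""
from sklearn.datasets import make_classification
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def auprc(y_true, scores):
    """Area under the precision-recall curve, by trapezoidal integration."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return auc(recall, precision)


def repeated_auprc(run_seeds, data_seed=0):
    """Train one decision tree per seed and return the AUPRC of each run."""
    # Hypothetical stand-in for geochemical samples: roughly 3% positives.
    X, y = make_classification(
        n_samples=2000, n_features=10, weights=[0.97, 0.03],
        random_state=data_seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=data_seed
    )
    values = []
    for seed in run_seeds:
        # Same hyperparameters in every run; only the seed may differ.
        clf = DecisionTreeClassifier(max_depth=4, random_state=seed)
        clf.fit(X_tr, y_tr)
        # Score each test sample by its predicted probability of class 1.
        values.append(auprc(y_te, clf.predict_proba(X_te)[:, 1]))
    return values


def spread(values):
    """Range of AUPRC values across runs; 0 means the runs coincide."""
    return max(values) - min(values)


if __name__ == "__main__":
    runs = repeated_auprc([0, 1, 2, 3, 4])
    print("AUPRC per run:", runs)
    print("spread:", spread(runs))
```

A spread of zero corresponds to the situation reported for the self-paced ensemble models, whose five AUPRCs are identical, while a wide spread corresponds to the range reported for the self-training models. The numbers this sketch prints come from synthetic data and are not comparable to the values in the abstract.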

Full Text