Effectively detecting geochemical anomalies associated with mineralization is challenging because geochemical exploration data are extremely class-imbalanced. Various machine learning techniques have been used to address this challenge. However, almost all of them have limitations when modeling class-imbalanced geochemical exploration data. To build efficient, robust, high-performance models for detecting geochemical anomalies associated with mineralization, a case study was carried out in the Helong area, Jilin Province, China. Decision tree (C4.5) classifiers were used as the base (weak) classifiers. Four imbalanced learning ensemble models were established and compared for detecting polymetallic mineralization anomalies from 1:50,000 stream sediment survey data: a self-paced ensemble model, an under-bagging ensemble model, a synthetic minority oversampling technique (SMOTE)-boost ensemble model, and a random undersampling (RUS)-boost ensemble model. The output of each base classifier is a vector of two probabilities: that the sample belongs to a mineralization anomaly and that it belongs to the background. The performance of each ensemble model was evaluated using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). The ROC curves of all four models lie close to the upper left corner of ROC space, and all four AUC values are at least 0.9589. The polymetallic mineralization anomalies detected by the four models occupy less than 7% of the study area but contain at least 93% of the known polymetallic deposits in the study area. In addition, these anomalies spatially coincide with the regional controlling factors of polymetallic mineralization in the study area.
In summary, all four models perform very well in detecting polymetallic mineralization anomalies, and the detection results are consistent with the regional geological and metallogenic characteristics of the study area. Imbalanced learning ensemble techniques are therefore powerful tools for building high-performance ensemble classification models that detect mineralization anomalies from geochemical exploration data.
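To make the under-bagging idea and the ROC/AUC evaluation more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions. A depth-1 information-gain stump (the hypothetical `Stump` class) stands in for a full C4.5 tree. The data are synthetic and illustrative only. The names `under_bagging`, `anomaly_score` and `roc_auc` are also hypothetical. Each ensemble member is trained on all anomaly samples plus an equal-size random draw of background samples. The members' anomaly probabilities are averaged, and AUC is computed through its rank (Mann-Whitney) interpretation.

```python
import math
import random

def entropy(labels):
    """Binary Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

class Stump:
    """Hypothetical depth-1 tree split by information gain (simplified stand-in for C4.5)."""

    def fit(self, X, y):
        base, best_gain = entropy(y), -1.0
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [yi for row, yi in zip(X, y) if row[f] <= t]
                right = [yi for row, yi in zip(X, y) if row[f] > t]
                gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
                if gain > best_gain:
                    best_gain, self.feature, self.threshold = gain, f, t
        left = [yi for row, yi in zip(X, y) if row[self.feature] <= self.threshold]
        right = [yi for row, yi in zip(X, y) if row[self.feature] > self.threshold]
        # Laplace-smoothed anomaly frequency in each leaf.
        self.p_left = (sum(left) + 1) / (len(left) + 2)
        self.p_right = (sum(right) + 1) / (len(right) + 2)
        return self

    def predict_proba(self, x):
        """Return (P(anomaly), P(background)) for one sample."""
        p = self.p_left if x[self.feature] <= self.threshold else self.p_right
        return p, 1.0 - p

def under_bagging(X, y, n_estimators=10, seed=0):
    """Train each member on all anomalies plus an equal-size random draw of background samples."""
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    models = []
    for _ in range(n_estimators):
        idx = pos + rng.sample(neg, len(pos))  # balanced training subset
        models.append(Stump().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def anomaly_score(models, x):
    """Average the members' P(anomaly) outputs."""
    return sum(m.predict_proba(x)[0] for m in models) / len(models)

def roc_auc(scores, labels):
    """AUC = probability that a random anomaly outscores a random background sample (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical synthetic data: two transformed element concentrations,
# 300 background samples and 15 anomalous samples (about 5% minority class).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(15)]
y = [0] * 300 + [1] * 15

models = under_bagging(X, y, n_estimators=10)
auc = roc_auc([anomaly_score(models, x) for x in X], y)  # in-sample AUC, for illustration only
print(f"training AUC: {auc:.3f}")
```

The sketch covers only the under-bagging variant. The other three ensembles in the study differ in how each member's training set is rebalanced and how the members are combined. For example, RUS-boost and SMOTE-boost use boosting rather than simple averaging. A real evaluation would also compute AUC on held-out samples rather than on the training data.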