Abstract

Machine learning (ML) is becoming an appealing tool in various fields of Earth Sciences, especially in mineral prospectivity mapping (MPM) to support mineral exploration. Most ML algorithms assume a relatively balanced amount of training data when estimating the decision boundaries between the classes of interest (in MPM, mineralized and non-mineralized locations). In MPM, however, the two classes are naturally imbalanced: the number of known mineral deposit occurrences (a proxy for the mineralized, or positive, class) is much smaller than the number of non-mineralized locations (the negative class). Imbalanced data make it difficult to train ML models for MPM, because learning is biased towards the features of the predominant (i.e., negative) class. In the present study, using a support vector machine for Cu-Au prospectivity modeling in the Carajás mineral province (Brazil), we evaluated the effect on MPM performance of the Synthetic Minority Over-sampling Technique (SMOTE), which addresses the issue of imbalanced training data. The original training data were modified by over-sampling the mineralized locations (the positive, minority class) using SMOTE and by randomly under-sampling the non-mineralized locations at different proportions, producing 400 training datasets with proportions of mineralized-to-non-mineralized samples ranging from 600:30 to 30:600. Each training dataset was used to evaluate the performance of MPM under a different proportion of mineralized-to-non-mineralized samples. The performance of each prospectivity model was objectively evaluated using the F1 score and the success-rate curve. The results show that SMOTE can significantly increase the performance and the spatial efficiency of MPM.
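The resampling scheme described above can be sketched in a few lines. This is a minimal illustration only, not the study's implementation: it uses a simplified SMOTE written with the Python standard library, hypothetical two-feature toy data (30 positive and 600 negative samples), and an assumed grid of class sizes from 30 to 600 in steps of 30. That grid happens to give 20 × 20 = 400 combinations, but the actual sampling grid is not stated in the abstract. The names `smote`, `undersample`, `pos_sets`, `neg_sets`, and `datasets` are all illustrative.

```python
import math
import random


def smote(minority, n_target, k=5, seed=0):
    """Over-sample `minority` (a list of feature tuples) up to n_target rows.

    Each synthetic row lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    samples = [tuple(row) for row in minority]
    synthetic = []
    while len(samples) + len(synthetic) < n_target:
        base = rng.choice(samples)
        # k nearest neighbours of `base` among the original minority rows
        neighbours = sorted(
            (row for row in samples if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return samples + synthetic


def undersample(majority, n_target, seed=0):
    """Randomly keep n_target majority rows, without replacement."""
    rng = random.Random(seed)
    return rng.sample([tuple(row) for row in majority], n_target)


# Hypothetical toy data standing in for evidential features at each location.
rng = random.Random(42)
positives = [(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(30)]
negatives = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(600)]

# Assumed grid of class sizes: 30, 60, ..., 600 (20 levels per class).
sizes = range(30, 601, 30)
pos_sets = {n: smote(positives, n) for n in sizes}
neg_sets = {n: undersample(negatives, n) for n in sizes}

# One (features, labels) training set per pair of class sizes.
datasets = {
    (n_pos, n_neg): (pos_sets[n_pos] + neg_sets[n_neg], [1] * n_pos + [0] * n_neg)
    for n_pos in sizes
    for n_neg in sizes
}
print(len(datasets))
```

Because every synthetic sample is an interpolation between two existing minority samples, the over-sampled class never leaves the region already spanned by the known mineralized locations. Each entry of `datasets` could then be passed to a classifier such as a support vector machine.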
The differences in performance among the derived prospectivity models illustrate the sensitivity of MPM to the number of samples and the distribution of classes in the training data. According to the results, better performance is achieved using SMOTE when the prospectivity models are trained with an equal number of mineralized and non-mineralized samples. The best prospectivity model, trained with a modified dataset with a 600:600 proportion of mineralized to non-mineralized samples, correctly classified 100% of the training mineralized locations and almost 80% of the testing mineralized locations, while outlining only 7% of the study area as prospective.
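The two evaluation measures named in the abstract can be illustrated as follows. This is a hedged sketch using the common definitions: the F1 score as the harmonic mean of precision and recall, and the success-rate curve as the fraction of known deposits captured against the fraction of the study area flagged as prospective, with cells ranked by descending prospectivity score. The function names and the toy labels and scores are hypothetical and do not come from the study.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def success_rate_curve(scores, is_deposit):
    """Return (area_fraction, deposit_fraction) points.

    Cells are ranked from highest to lowest prospectivity score; each point
    gives the share of the area examined so far and the share of known
    deposits that fall inside it.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(is_deposit)
    found = 0
    curve = []
    for rank, i in enumerate(order, start=1):
        found += is_deposit[i]
        curve.append((rank / len(scores), found / total))
    return curve


# Hypothetical toy example: 8 map cells, 3 of which host known deposits.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.4, 0.05]

f1 = f1_score(y_true, y_pred)
curve = success_rate_curve(scores, y_true)
print(round(f1, 3))
print(curve[1])
```

A curve that rises steeply at small area fractions corresponds to a spatially efficient model: most deposits are captured within a small prospective area.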
