Abstract

The goal of this work is to emulate a situation in which an analyst with little or no previous geological knowledge of the samples must rely on an unsupervised approach to gain insights into drill core samples, and to compare the results of two main unsupervised algorithms with and without filtering. We used in situ portable X-ray fluorescence data acquired on sawn drill core samples of rocks from the Sabiá prospect, in the Rio Salitre greenstone belt, São Francisco Craton, Brazil, to automatically generate pseudo-logs by running unsupervised learning models that group distinct lithotypes. We tested the K-means and Model-Based Clustering (MBC) algorithms and compared their performance on raw and filtered data against a manual macroscopic log description. Of the 47 elements initially available, 20 variables were selected for modeling because they had at least 95% uncensored values. A Shapiro-Wilk test then showed that the data are not normally distributed (p-values below the 5% significance level). We therefore used the non-parametric Kruskal–Wallis test, at the same 5% significance level, to check whether the distribution of the dataset differs significantly from that of the duplicate measurements; no significant difference would confirm the representativeness of the measurements. After this step, the pseudo-log models were built on data of reduced dimension, compressed by a centered Principal Component Analysis of data rescaled by their range. To reduce high-frequency noise in the selected features, we applied an exponentially weighted moving average filter with a window of five samples. Based on the average silhouette width in sample space, the optimum number of clusters for K-means was set to two, and the first models were then generated for raw and filtered data. From the MBC perspective, the sample space is interpreted as a finite mixture of groups, each with a distinct Gaussian probability distribution.
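The preprocessing and K-means steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: storing censored (below-detection) readings as NaN, reading "a window of five samples" as an EWMA span of 5 (alpha = 2/(5+1)), interpreting "rescaled by its range" as dividing each mean-centered variable by (max - min), the function names, and the synthetic two-unit data are all assumptions. The normality and representativeness checks are omitted here; in Python they are available as `scipy.stats.shapiro` and `scipy.stats.kruskal`.

```python
import numpy as np

def select_uncensored(X, min_fraction=0.95):
    """Indices of columns with at least `min_fraction` uncensored values.
    Assumption: censored (below-detection-limit) readings are stored as NaN."""
    present = np.mean(~np.isnan(X), axis=0)
    return np.flatnonzero(present >= min_fraction)

def ewma(X, span=5):
    """Exponentially weighted moving average along rows (down-hole order).
    Assumption: the five-sample window is read as span=5, alpha = 2/(span+1)."""
    X = np.asarray(X, dtype=float)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(X)
    out[0] = X[0]
    for t in range(1, len(X)):
        out[t] = alpha * X[t] + (1.0 - alpha) * out[t - 1]
    return out

def range_scaled_pca(X, n_components=2):
    """Center each variable on its mean, divide by its range, then PCA via SVD.
    Assumes no column is constant (a zero range would divide by zero)."""
    Z = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ Vt[:n_components].T, explained[:n_components]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def mean_silhouette(X, labels):
    """Average silhouette width; samples in singleton clusters contribute 0."""
    if len(np.unique(labels)) < 2:
        return 0.0
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

# Synthetic stand-in for pXRF data: 100 samples down-hole, two hypothetical units.
rng = np.random.default_rng(42)
unit = np.r_[np.zeros(60), np.ones(40)]
X = rng.normal(0.0, 1.0, (100, 6)) + 5.0 * unit[:, None] * np.array([1, -1, 1, 0, 0, 0])
X[::10, 5] = np.nan  # column 5 is only 90% uncensored, so it is dropped

keep = select_uncensored(X)
scores, explained = range_scaled_pca(ewma(X[:, keep]), n_components=2)
widths = {k: mean_silhouette(scores, kmeans(scores, k)) for k in range(2, 6)}
best_k = max(widths, key=widths.get)  # k with the largest average silhouette width
print(keep, best_k, round(widths[best_k], 3))
```

Running the same `kmeans` call on `range_scaled_pca(X[:, keep])` without `ewma` gives the raw-data counterpart, which is the kind of raw versus filtered comparison the abstract describes.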
The number of clusters is defined by analyzing the Bayesian Information Criterion (BIC): several models are tested, and the one at the first local maximum defines both the number of groups and the type of probabilistic model. For the data used in this work, the optimum number of groups for MBC is four, and the probabilistic model type selected by the BIC is elliptical with equal volume, shape, and orientation. Model-Based Clustering thus detected four distinct clusters with nearly equal representation in the samples from the two drill cores. All K-means and MBC models detected changes in lithotype that were not described in the manual log. On the other hand, one lithotype described by the experts was not detected by this methodology in any attempt; a detailed investigation with thin-section descriptions was needed to determine the cause of this response. Finally, compared with the manual log description, the models built on filtered data performed better than those generated on raw data, and the filtered MBC model performed best of all. Hence, this multivariate approach, combined with moving-average filtering of the data, can be of great help at several stages of mineral exploration, either to create pseudo-log models before the drill core samples are described or during data validation, when several descriptions made by different professionals need to be standardized.
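The MBC step can be illustrated with a minimal NumPy sketch, assuming that "elliptical with equal volume, shape, and orientation" means all mixture components share a single full covariance matrix (the model called EEE in mclust terminology) and using the BIC sign convention in which larger is better (2 log L minus the number of parameters times log n). The seeding (farthest-point seeds followed by a hard nearest-seed assignment), the function names, and the synthetic three-group data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def farthest_point_init(X, k):
    """Deterministic seeds: start at row 0, then add the row farthest from those chosen."""
    idx = [0]
    for _ in range(1, k):
        d = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    return X[idx].astype(float)

def fit_eee(X, k, n_iter=500, tol=1e-8, ridge=1e-6):
    """EM for a Gaussian mixture whose components share one full covariance
    (equal volume, shape and orientation). Returns (log-likelihood, hard labels)."""
    n, d = X.shape
    seeds = farthest_point_init(X, k)
    nearest = ((X[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    resp = np.eye(k)[nearest]  # start from a hard nearest-seed assignment
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: mixing weights, means, and a single pooled covariance matrix.
        nk = np.maximum(resp.sum(axis=0), 1e-10)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        cov = sum((resp[:, j, None] * (X - means[j])).T @ (X - means[j]) for j in range(k)) / n
        cov = cov + ridge * np.eye(d)
        # E-step: responsibilities via log-sum-exp for numerical stability.
        logp = np.column_stack([np.log(weights[j]) + log_gauss(X, means[j], cov)
                                for j in range(k)])
        top = logp.max(axis=1, keepdims=True)
        log_norm = top[:, 0] + np.log(np.exp(logp - top).sum(axis=1))
        loglik = float(log_norm.sum())
        resp = np.exp(logp - log_norm[:, None])
        if loglik - prev < tol:
            break
        prev = loglik
    return loglik, resp.argmax(axis=1)

def bic_eee(loglik, n, d, k):
    """BIC (larger is better): mixing weights + means + one shared covariance."""
    p = (k - 1) + k * d + d * (d + 1) // 2
    return 2.0 * loglik - p * np.log(n)

def first_local_max(values):
    """Index of the first entry whose successor is strictly smaller."""
    for i in range(len(values) - 1):
        if values[i + 1] < values[i]:
            return i
    return len(values) - 1

# Hypothetical 2-D principal-component scores drawn from three groups.
rng = np.random.default_rng(7)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([c + rng.normal(0.0, 1.0, (50, 2)) for c in centres])
ks = list(range(1, 6))
bics = [bic_eee(fit_eee(X, k)[0], len(X), X.shape[1], k) for k in ks]
best_k = ks[first_local_max(bics)]
labels = fit_eee(X, best_k)[1]
print(best_k, np.bincount(labels))
```

This sketch scans only the shared-covariance model over the number of groups; a full MBC search, as described above, would also compare other covariance structures (different volumes, shapes, or orientations) on the same BIC scale.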

Highlights

  • The "data-rich paradigm" is already a reality in mineral exploration

  • For the data used in this work, the optimum group number for Model-Based Clustering (MBC) is four, and the probabilistic model type determined by the Bayesian Information Criterion (BIC) is elliptical with equal volume, shape, and orientation

  • Compared with the manual log description, the models built on filtered data performed better than those generated on raw data, and the filtered MBC model performed best of all


Summary

Introduction

The "data-rich paradigm" is already a reality in mineral exploration. This scenario can be found in several segments of the mineral industry, such as airborne geophysics, exploratory geochemical surveys, mineral resource and reserve evaluation, studies of rock physical properties, mechanical testing for mine engineering, ore grade control, and environmental monitoring, among many other fields associated with the various stages of mineral exploration. Some government entities (agencies and geological surveys) provide valuable data to the mineral industry, further increasing the volume of available information. In this data-dominated scenario, fast, consistent, and reliable analysis is vital for decision making and resource investment. We present an approach to assist drill core management with a fast, consistent, and highly reproducible methodology based on open-source code and on data that have already been acquired, whose use sometimes represents a challenge due to survey issues and/or the high number of geochemical features. For this purpose, we applied unsupervised machine learning methods to portable X-ray fluorescence data from drill core rock samples to generate pseudo-logs.

Geological setting
Rock samples
X-Ray Fluorescence
Data management
Exponentially Weighted Moving Average Filtering
Correlation
Dimensionality Reduction
K-means
Model-Based Clustering
Results
Conclusions
Verification of Data Distribution
Representativity of measurements
Methods