Unsupervised drill core pseudo-log generation in raw and filtered data, a case study in the Rio Salitre greenstone belt, São Francisco Craton, Brazil

Guilherme Ferreira Da Silva, João Henrique Larizzatti, Anderson Dourado Rodrigues Da Silva, Carina Graciniana Lopes, Evandro Luiz Klein, Kotaro Uchigasaki

doi:10.1016/j.gexplo.2021.106885

Open Access

https://doi.org/10.1016/j.gexplo.2021.106885

Abstract

The goal of this work is to emulate a situation in which an analyst with little or no previous geological knowledge of the samples must rely on an unsupervised approach to gain insight into drill core samples, and to compare the results of two main unsupervised algorithms with and without filtering. We used in situ portable X-ray fluorescence data acquired on sawn drill core samples of rocks from the Sabiá prospect, in the Rio Salitre greenstone belt, São Francisco Craton, Brazil, to generate pseudo-logs automatically by running unsupervised learning models that group distinct lithotypes. We tested the K-means and Model-Based Cluster (MBC) algorithms and compared their performance on raw and filtered data against a manual macroscopic log description. From the initial 47 available elements, 20 variables were selected for modeling on the criterion of having at least 95% uncensored values. Additionally, we performed a Shapiro-Wilk test, which indicated that the data are not normally distributed (p-values below the 5% significance level). We also used a Kruskal–Wallis test, at the same 5% significance level, to check whether the distribution of the dataset was statistically equivalent to that of the duplicates, which would confirm the representativity of the measurements. After this step, the pseudo-log models were built on reduced-dimension data, compressed by a centered Principal Component Analysis with the data rescaled by its range. To reduce high-frequency noise in the selected features, we applied an exponentially weighted moving average filter with a window of five samples. Based on the Average Silhouette Width in sample space, the optimum number of clusters for K-means was fixed at two, and the first models were then generated for raw and filtered data. In MBC, the sample space is interpreted as a finite mixture of groups, each with a distinct Gaussian probability distribution. The number of clusters is defined by analysis of the Bayesian Information Criterion (BIC): several models are tested, and the one at the first local maximum defines the number of groups and the type of probabilistic model. For the data used in this work, the optimum number of groups for MBC is four, and the probabilistic model type selected by the BIC is elliptical with equal volume, shape, and orientation. Thus, MBC detected four clusters with almost the same representativity in the samples of the two drill cores. All K-means and MBC models detected changes in lithotype that were not described in the manual log. On the other hand, one lithotype described by the experts was not detected by this methodology in any attempt; a detailed investigation with thin-section descriptions was needed to determine the cause of this response. Finally, compared with the manual log description, the models built on filtered data performed better than those built on raw data, and the filtered MBC model performed best. Hence, this multivariate approach, combined with moving-average filtering of the data, can be of great help at several stages of mineral exploration, either in creating pseudo-log models prior to the description of the drill core samples or in the data validation stage, when descriptions made by different professionals need to be standardized.
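
The variable-selection rule in the abstract (keep an element only if at least 95% of its readings are uncensored) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_features`, the dictionary layout, and the use of `None` to mark a censored reading are all assumptions made for the example.

```python
def select_features(table, min_uncensored=0.95):
    """Return the column names whose share of uncensored readings is high enough.

    table: dict mapping an element name to a list of readings, where None
    marks a censored reading (hypothetical encoding, chosen for this sketch).
    """
    selected = []
    for name, readings in table.items():
        if not readings:
            continue  # an empty column carries no information
        uncensored = sum(value is not None for value in readings)
        if uncensored / len(readings) >= min_uncensored:
            selected.append(name)
    return selected
```

Applied to a table of 47 elements, a rule of this kind would return the subset that passes the threshold; the element names and values in any such table are the analyst's own.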

Highlights

The "data-rich paradigm" is already a reality in mineral exploration
For the data used in this work, the optimum number of groups for Model-Based Cluster (MBC) is four, and the probabilistic model type selected by the Bayesian Information Criterion (BIC) is elliptical with equal volume, shape, and orientation
Compared with the manual log description, it is notable that the models built on filtered data have better performance than those generated on raw data, and the MBC filtered model had better performance than the others

Summary

Introduction

The "data-rich paradigm" is already a reality in mineral exploration. It can be found in several segments of the mineral industry, such as airborne geophysics, exploratory geochemical surveys, mineral resource and reserve evaluation, studies of the physical properties of rocks, mechanical assays in mine engineering, ore grade control, and environmental monitoring, among many other branches associated with the various stages of mineral research. Some government entities (agencies and geological surveys) provide valuable data for the mineral industry, increasing the available volume of information. In this data-dominated scenario, fast, consistent, and reliable analysis is vital for decision making and resource investment. We present an approach to assist drill core management with a fast, consistent, and highly reproducible methodology based on open-source code and on data already acquired, which sometimes represent a challenge due to survey issues and/or the high number of geochemical features. For this purpose, we applied unsupervised Machine Learning methods to portable X-ray fluorescence data from drill core rock samples to generate pseudo-logs.

Geological setting

Rock samples

X-Ray Fluorescence

Data management

Exponentially Weighted Moving Average Filtering
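
The abstract describes an exponentially weighted moving average filter with a window of five samples, applied to reduce high-frequency noise. A minimal sketch of such a filter is given here. The mapping from window to smoothing factor, `alpha = 2 / (span + 1)`, is a common convention and an assumption of this example; the paper's exact parametrization is not stated in the text above.

```python
def ewma(values, span=5):
    """Exponentially weighted moving average of readings in down-hole order.

    Assumes alpha = 2 / (span + 1); the first output equals the first input.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    alpha = 2.0 / (span + 1.0)
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))  # seed the filter with the first reading
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

The filter would be applied to each selected element separately, along the depth axis of each drill core, so that an isolated spike is damped while a sustained change in composition still shows through.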

Correlation

Dimensionality Reduction
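
According to the abstract, the data are rescaled by their range and compressed by a centered Principal Component Analysis. The sketch below illustrates that sequence with the standard library only, extracting components by power iteration with deflation. The function names and the choice of power iteration are assumptions made for a self-contained example; a linear-algebra library would normally be used instead.

```python
import math

def range_scale_and_center(rows):
    """Divide each column by its range (max - min), then subtract its mean."""
    prepared = []
    for col in zip(*rows):
        col = [float(v) for v in col]
        span = max(col) - min(col)
        scaled = [v / span for v in col] if span > 0 else [0.0] * len(col)
        mean = sum(scaled) / len(scaled)
        prepared.append([v - mean for v in scaled])
    return [list(row) for row in zip(*prepared)]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(p)]
            for a in range(p)]

def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(matrix)
    vec = [1.0 / (j + 1) for j in range(p)]  # asymmetric, deterministic start
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-15:
            break  # nothing left to extract
        vec = [v / norm for v in nxt]
    value = sum(vec[a] * matrix[a][b] * vec[b]
                for a in range(p) for b in range(p))
    return value, vec

def pca(rows, n_components=2):
    """Return (scores, components, variances) for range-scaled, centered data."""
    data = range_scale_and_center(rows)
    cov = covariance(data)
    p = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        variances.append(value)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(p)]
               for a in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components]
              for r in data]
    return scores, components, variances
```

Range scaling puts elements measured in very different concentration ranges on a comparable footing before the covariance is computed, so that no single abundant element dominates the leading components.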

K-means
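
The abstract states that the number of K-means clusters was chosen from the Average Silhouette Width. The following is a minimal sketch of that selection, assuming Euclidean distance and a deterministic farthest-point initialization; both are choices made for this example, and the function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def average_silhouette_width(points, labels):
    """Mean silhouette over all points; singleton clusters contribute zero."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue
        # dist(p, p) is zero, so summing over the whole cluster is safe.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in groups.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values=(2, 3, 4)):
    """Return the k with the largest average silhouette width, plus all widths."""
    widths = {k: average_silhouette_width(points, kmeans(points, k)[0])
              for k in k_values}
    return max(widths, key=widths.get), widths
```

In the workflow described above, `points` would be the principal-component scores of the samples, and the same selection would be run once on raw and once on filtered data.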

Model-Based Clustering
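
The abstract describes MBC as a finite Gaussian mixture whose number of groups is taken at the first local maximum of the BIC. The sketch below is a deliberately reduced illustration: it fits a one-dimensional mixture with a single shared variance by expectation-maximization, whereas the paper works on multivariate data and compares several covariance structures. The BIC is written in the "larger is better" form, 2 log L minus the number of parameters times log n, which is an assumption about the convention; all function names are hypothetical.

```python
import math

def _log_terms(x, weights, means, variance):
    """Log of weight times Gaussian density, for each component."""
    const = -0.5 * math.log(2.0 * math.pi * variance)
    return [math.log(w) + const - (x - m) ** 2 / (2.0 * variance)
            for w, m in zip(weights, means)]

def _log_sum_exp(terms):
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def fit_mixture(values, k, iterations=200):
    """EM for a 1-D Gaussian mixture with k means and one shared variance."""
    n = len(values)
    ordered = sorted(values)
    # Deterministic start: means at evenly spaced quantiles.
    means = [ordered[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    weights = [1.0 / k] * k
    grand_mean = sum(values) / n
    variance = max(sum((x - grand_mean) ** 2 for x in values) / n, 1e-12)
    for _ in range(iterations):
        # E-step: responsibilities of each component for each value.
        resp = []
        for x in values:
            terms = _log_terms(x, weights, means, variance)
            total = _log_sum_exp(terms)
            resp.append([math.exp(t - total) for t in terms])
        # M-step: update weights, means, and the shared variance.
        sizes = [sum(r[j] for r in resp) for j in range(k)]
        weights = [max(s / n, 1e-12) for s in sizes]
        means = [sum(r[j] * x for r, x in zip(resp, values)) / s if s > 0
                 else means[j] for j, s in enumerate(sizes)]
        variance = max(sum(r[j] * (x - means[j]) ** 2
                           for r, x in zip(resp, values)
                           for j in range(k)) / n, 1e-12)
    log_likelihood = sum(_log_sum_exp(_log_terms(x, weights, means, variance))
                         for x in values)
    return log_likelihood, means, variance

def bic(values, k):
    """2 log L - (number of free parameters) log n; larger is better here."""
    log_likelihood, _, _ = fit_mixture(values, k)
    n_params = (k - 1) + k + 1  # mixing weights, means, shared variance
    return 2.0 * log_likelihood - n_params * math.log(len(values))

def first_local_maximum(scores):
    """Index of the first score that exceeds its left neighbour and is not below its right one."""
    for i, s in enumerate(scores):
        left_ok = i == 0 or s > scores[i - 1]
        right_ok = i == len(scores) - 1 or s >= scores[i + 1]
        if left_ok and right_ok:
            return i
    return len(scores) - 1

def choose_components(values, k_max=4):
    """Number of components at the first local maximum of the BIC curve."""
    scores = [bic(values, k) for k in range(1, k_max + 1)]
    return first_local_maximum(scores) + 1, scores
```

Taking the first local maximum rather than the global one favours the simplest model at which the BIC stops improving, which matches the selection rule quoted in the abstract.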

Results

Conclusions

Verification of Data Distribution

Representativity of measurements
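
The abstract mentions a Kruskal–Wallis test at the 5% significance level to compare the distribution of the dataset with that of the duplicates. A minimal sketch of the test for the two-group case is given here, with average ranks for ties and the usual tie correction. With two groups the statistic is referred to a chi-square distribution with one degree of freedom, whose upper tail can be written with the complementary error function. The function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0  # average of positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = rank
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two samples; p uses chi-square with 1 degree of freedom."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    groups = [ranks[:len(a)], ranks[len(a):]]
    h = (12.0 / (n * (n + 1)) * sum(sum(g) ** 2 / len(g) for g in groups)
         - 3.0 * (n + 1))
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    p = math.erfc(math.sqrt(max(h, 0.0) / 2.0))
    return h, p
```

A p-value above 0.05 for an element would mean that the test finds no significant difference between the original readings and their duplicates at that level.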

Methods

Journal: Journal of Geochemical Exploration
Publication Date: Sep 3, 2021
Citations: 1
License type: cc-by
