Abstract

Exploration datasets are often unequally sampled, with missing values for some variables of interest at some locations. Many state-of-the-art joint multivariate modeling workflows cannot handle missing data. One solution is to exclude incomplete samples from multivariate geostatistical modeling; however, this discards information, increases uncertainty, and may introduce bias into subsequent spatial numerical modeling workflows. Alternatives are (1) to impute the missing values to generate a complete dataset, termed single imputation, or (2) to generate multiple realizations of the data that account for uncertainty in the missing values, termed multiple imputation (MI). MI is preferred because it quantifies uncertainty in the missing values and transfers that uncertainty through spatial numerical modeling workflows. A new algorithm is proposed for the imputation of unequally sampled continuous, compositional, and categorical variables. A modified version of multiple-point direct sampling imputes missing values using multivariate multiple-point patterns from nearby completely sampled observations. Drillhole data are used as the ‘training data’ for direct sampling, with preference given to training data whose co-located values are similar to those of the sample being imputed, to account for the non-stationarities common in mineral deposits. Advantages of the algorithm include: (1) it aligns with the current best practice of MI, so data uncertainty is incorporated through multiple realizations of the missing data and can be carried through further geomodeling workflows; (2) it honors spatial and multivariate relationships in the data by using multivariate multiple-point patterns; (3) it can jointly impute categorical and continuous variables; (4) it better reproduces input proportions and compositional data; and (5) it can explicitly incorporate non-stationarities.
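The direct-sampling imputation idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single drillhole with samples ordered down-hole at regular spacing, variables pre-scaled to [0, 1], and a three-sample pattern (the sample above, the co-located sample, and the sample below). The parameters `threshold`, `max_fraction`, and `colocated_weight` are hypothetical, and so are the helpers `pattern_distance`, `impute_realization`, and `multiple_imputation`. Candidates are completely sampled records, and co-located entries receive extra weight so that training data with similar co-located values are preferred. Each seed gives one realization of the missing values, which is what makes the procedure multiple rather than single imputation.

```python
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # one sample's variables; None marks a missing value


def pattern_distance(out: Sequence[Row], i: int, j: int, target_var: int,
                     offsets: Sequence[int] = (-1, 0, 1),
                     colocated_weight: float = 2.0) -> Optional[float]:
    """Weighted mean absolute difference between the data events around samples i and j.

    Only entries known at both locations are compared. The target variable at
    offset 0 is skipped (it is the value being imputed). Co-located entries
    (offset 0) get extra weight so candidates with similar co-located values win.
    """
    n = len(out)
    num = den = 0.0
    for off in offsets:
        a, b = i + off, j + off
        if not (0 <= a < n and 0 <= b < n):
            continue  # this pattern node falls outside the drillhole
        w = colocated_weight if off == 0 else 1.0
        for v, (x, y) in enumerate(zip(out[a], out[b])):
            if off == 0 and v == target_var:
                continue
            if x is not None and y is not None:
                num += w * abs(x - y)
                den += w
    return num / den if den > 0 else None


def impute_realization(data: Sequence[Row], threshold: float = 0.05,
                       max_fraction: float = 0.5, seed: int = 0) -> List[Row]:
    """One realization: visit missing entries on a random path, scan candidates in
    random order, and copy the value from the first candidate whose pattern distance
    is <= threshold (otherwise from the best candidate scanned)."""
    rng = random.Random(seed)
    out = [list(row) for row in data]  # every realization starts from the original data
    path = [(i, v) for i, row in enumerate(data) for v, x in enumerate(row) if x is None]
    rng.shuffle(path)
    # Training candidates: completely sampled observations in the original data.
    complete = [j for j, row in enumerate(data) if all(x is not None for x in row)]
    for i, v in path:
        candidates = complete[:]
        rng.shuffle(candidates)
        n_scan = max(1, int(max_fraction * len(candidates)))
        best_j, best_d = None, float("inf")
        for j in candidates[:n_scan]:
            d = pattern_distance(out, i, j, v)
            if d is None:
                continue
            if d < best_d:
                best_j, best_d = j, d
            if d <= threshold:
                break  # accept the first sufficiently similar pattern
        if best_j is not None:
            out[i][v] = data[best_j][v]
    return out


def multiple_imputation(data: Sequence[Row], n_real: int = 10, **kwargs) -> List[List[Row]]:
    """Multiple imputation: one independent realization per random seed."""
    return [impute_realization(data, seed=s, **kwargs) for s in range(n_real)]
```

Categorical variables could be handled in the same scheme by replacing the absolute difference with a 0/1 mismatch indicator. Passing every realization through the downstream model carries the imputation uncertainty forward.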
The proposed methodology is compared to multiple imputation by chained equations (MICE) and Bayesian updating (BU) using two Iranian case studies; data are removed from these complete datasets under missing-at-random (MAR) and missing-not-at-random (MNAR) mechanisms. The third case study is a South American iron deposit with compositional data that were originally incompletely sampled; the mechanism of missingness is unknown. Comparisons of the imputation methodologies across the three case studies show that the proposed algorithm reduces prediction error, generates accurate and unbiased imputed values that reproduce multivariate relationships, reproduces multiple-point statistics, and is robust in non-stationary datasets.
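The validation design, which removes data from a complete dataset under a known mechanism and scores the imputed values against the truth, can be sketched as follows. This is a hypothetical illustration, not the case-study code. It uses a deterministic threshold version of each mechanism. Under missing at random, values are deleted where a different, still-observed driver variable is largest. Under missing not at random, values are deleted where the removed value itself is largest. The names `mask_values`, `mean_impute`, and `rmse` are invented for this sketch, and column-mean imputation stands in as a simple baseline. It is not MICE or BU.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks an artificially removed value


def mask_values(data: Sequence[Sequence[float]], target_var: int, frac: float,
                mechanism: str, driver_var: Optional[int] = None
                ) -> Tuple[List[Row], List[Tuple[int, int]]]:
    """Delete round(frac * n) values of target_var from a complete dataset.

    'MAR'  : delete where the observed driver variable is largest, so missingness
             depends only on data that remain observed.
    'MNAR' : delete where the target value itself is largest, so missingness
             depends on the value being removed.
    """
    n = len(data)
    k = int(round(frac * n))
    if mechanism == "MAR":
        if driver_var is None or driver_var == target_var:
            raise ValueError("MAR needs a different, observed driver variable")
        key = lambda i: data[i][driver_var]
    elif mechanism == "MNAR":
        key = lambda i: data[i][target_var]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    removed = sorted(range(n), key=key, reverse=True)[:k]
    masked: List[Row] = [list(row) for row in data]
    for i in removed:
        masked[i][target_var] = None
    return masked, [(i, target_var) for i in sorted(removed)]


def mean_impute(masked: List[Row]) -> List[Row]:
    """Baseline single imputation: replace each gap with its column's observed mean."""
    means = []
    for v in range(len(masked[0])):
        obs = [row[v] for row in masked if row[v] is not None]
        means.append(sum(obs) / len(obs))
    return [[means[v] if x is None else x for v, x in enumerate(row)] for row in masked]


def rmse(truth, imputed, positions) -> float:
    """Root-mean-square error over the artificially removed positions only."""
    sq = [(truth[i][v] - imputed[i][v]) ** 2 for i, v in positions]
    return math.sqrt(sum(sq) / len(sq))


# Usage: variable 1 falls as variable 0 rises, and MAR removes variable 1 where variable 0 is high.
data = [[float(i), float(10 - i)] for i in range(10)]
masked, positions = mask_values(data, target_var=1, frac=0.3, mechanism="MAR", driver_var=0)
print(positions, rmse(data, mean_impute(masked), positions))
```

In this toy example, the removed values (3, 2, 1) are all below the observed mean of 7.0, so mean imputation is biased upward. Values deleted under MAR or MNAR are not a random subset of the data, and imputation methods are compared on exactly this kind of non-random gap.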
