Abstract

The purpose of this project was to develop and test a methodology for determining the likelihood that mineral resource location records from two nationwide mineral resource information databases represent the same site. The long-term goal is to create a comprehensive database by merging the Mineral Resource Data System (MRDS) of the U.S. Geological Survey and the Mineral Availability System/Mineral Industry Location System (MAS/MILS) of the U.S. Bureau of Mines (now part of the Geological Survey). Part of that process involves linking records for the same site from each database. Match probabilities were estimated using a logistic regression on mineral resource location attributes, fitted to known matched (cross-referenced) and known unmatched mineral site pairs randomly sampled from within the conterminous United States (n=10,000). Model accuracy was assessed using a randomly sampled test dataset that was not used in logistic model development (n=4,000). Probability distributions were similar between the development and test datasets. The overall agreement beyond chance, measured with the kappa statistic, was good for the test dataset (\(\hat{\kappa} = 0.736\)). Classification accuracy was 89.6% for known matched site pairs and 84.0% for known unmatched site pairs, based on a probability threshold of 0.50 for a match. Distributions of attributes were also similar between the development and test datasets. This classification method is a viable approach for estimating match probabilities between database records.
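The evaluation described in the abstract can be illustrated with a minimal sketch: a logistic function that turns site-pair attributes into a match probability, a 0.50 threshold, and Cohen's kappa computed from a confusion matrix. Several things here are assumptions and not taken from the abstract: the function names, the generic coefficient and feature arguments, the choice to treat a probability of exactly 0.50 as a match, and above all the even split of the 4,000 test pairs into 2,000 matched and 2,000 unmatched. Under that assumed split, the reported accuracies of 89.6% and 84.0% correspond to the counts used below.

```python
import math


def match_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))).

    The features and coefficients are hypothetical placeholders; the
    abstract does not list the attributes or the fitted values.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def is_match(probability, threshold=0.50):
    """Classify a site pair as a match at or above the threshold (assumed >=)."""
    return probability >= threshold


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix: (p_o - p_e) / (1 - p_e)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the actual and predicted marginal totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Assumed split: 2,000 known matched and 2,000 known unmatched test pairs.
# 89.6% of 2,000 matched pairs   -> 1,792 correct, 208 missed.
# 84.0% of 2,000 unmatched pairs -> 1,680 correct, 320 false matches.
kappa = cohen_kappa(tp=1792, fn=208, fp=320, tn=1680)
print(round(kappa, 3))  # prints 0.736
```

With these assumed counts the observed agreement is 0.868 and the chance agreement is 0.5, which gives a kappa of 0.736, consistent with the value reported for the test dataset.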
