A Cluster-Based Machine Learning Ensemble Approach for Geospatial Data: Estimation of Health Insurance Status in Missouri

Erik Mueller, Michael Elliott, Srikanth Mudigonda, J S Onésimo Sandoval

doi:10.3390/ijgi8010013


Open Access

https://doi.org/10.3390/ijgi8010013


Abstract

Mainstream machine learning approaches to predictive analytics consistently perform well across a variety of datasets, yet identifying the best-performing approach for any given dataset is far from intuitive. Methods such as ensemble and transformation modeling have been developed to improve on individual base learners, particularly for datasets with high variance. Despite their greater generalizability and flexibility, ensemble approaches often sacrifice inference for predictive ability. This paper introduces an alternative approach to ensemble modeling that combines the predictive ability of an ensemble framework with localized model construction by incorporating cluster analysis as a pre-processing step. The workflow not only outperforms independent base learners and the ensemble methods it was compared against, but also preserves local inferential capability: cluster parameters can be adjusted, and interpretable relative importance values and untransformed coefficients are retained for assessing overall variable importance. The technique is demonstrated on a dataset used to estimate health insurance coverage rates across Missouri, where the cluster pre-processing helps explain both local and global variable importance and interactions when predicting areas with high concentrations of low health insurance coverage from demographic, socioeconomic, and geospatial variables.
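The abstract describes the workflow only at a high level. The following is a minimal sketch of one plausible reading, written in plain Python and not taken from the paper: observations are clustered on hypothetical 2-D coordinates with Lloyd's k-means, each cluster is split into alternating train and validation halves, two stand-in base learners (an intercept-only model and a one-covariate least-squares line) are fit per cluster, and the learner with the lowest validation MSE is kept. New observations are routed to the nearest centroid's model. The function names, candidate learners, split rule, and fixed initial centroids are all assumptions for illustration; the paper's own base learners (for example support vector regression, PCR, and PLS) and its selection procedure may differ.

```python
import math
from statistics import fmean

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, centroids, iters=20):
    """Lloyd's k-means starting from the given initial centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to its group mean; leave it in place if the group is empty.
        centroids = [tuple(fmean(c) for c in zip(*g)) if g else old
                     for g, old in zip(groups, centroids)]
    return centroids

def fit_mean(xs, ys):
    """Hypothetical base learner: predict the training mean."""
    m = fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Hypothetical base learner: one-covariate least-squares line."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + b * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of `model` on (xs, ys)."""
    return fmean((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical candidate set; the paper's actual base learners differ.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear}

def fit_cluster_ensemble(coords, xs, ys, init_centroids):
    """Cluster on coordinates, then keep the best candidate learner per cluster.

    Assumes every cluster receives at least two observations.
    """
    centroids = kmeans(coords, init_centroids)
    models = {}
    for k in range(len(centroids)):
        idx = [i for i, c in enumerate(coords) if nearest(c, centroids) == k]
        train, valid = idx[::2], idx[1::2]  # alternating split within the cluster
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in valid], [ys[i] for i in valid]
        fitted = {name: fit(tx, ty) for name, fit in CANDIDATES.items()}
        best = min(fitted, key=lambda name: mse(fitted[name], vx, vy))
        models[k] = (best, fitted[best])
    return centroids, models

def predict(centroids, models, coord, x):
    """Route an observation to its nearest cluster's selected learner."""
    return models[nearest(coord, centroids)][1](x)

# Toy data: a linear trend in one region, no trend in the other.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.2, 10.8)]
xs = [0, 1, 2, 3, 4, 5] * 2
ys = [1, 3, 5, 7, 9, 11] + [5, 5, 7, 5, 3, 5]
centroids, models = fit_cluster_ensemble(coords, xs, ys, [(0.0, 0.0), (10.0, 10.0)])
print({k: name for k, (name, _) in models.items()})  # {0: 'linear', 1: 'mean'}
print(predict(centroids, models, (0.3, 0.4), 10))     # 21.0
print(predict(centroids, models, (10.3, 10.4), 10))   # 5.0
```

Keeping a separate, simple model per cluster is what preserves local interpretability in this sketch: each cluster's selected learner, and any coefficient it carries, can be inspected on its own rather than being folded into one opaque global ensemble.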

Highlights

Simultaneously modeling and analyzing both local and global relationships in geospatially referenced statistical models is a challenge commonly faced by social science researchers, geographers, and spatial statisticians
We found that models with fewer than four clusters showed a progressive increase in mean squared error (MSE), likely because the larger clusters were more heterogeneous
After support vector regression, the best-performing category was dimension reduction, with principal components regression (PCR) and partial least squares (PLS) producing the two smallest MSE values (0.487 and 0.488, respectively)
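PCR regresses the outcome on the leading principal components of the standardized predictors instead of on the predictors themselves, which sidesteps multicollinearity; PLS, also named in the highlights, differs by choosing components that account for covariance with the outcome as well. The sketch below illustrates PCR only, in plain Python: it keeps just the first component, found by power iteration, and uses hypothetical, perfectly collinear toy predictors to show that the regression stays well defined where ordinary least squares on both raw columns would be singular. It is a sketch of the general technique under these assumptions, not the number of components, scaling, or software used in the paper.

```python
import math
from statistics import fmean, pstdev

def standardize(columns):
    """Center each predictor column and scale it to unit (population) SD."""
    out = []
    for col in columns:
        mu, sd = fmean(col), pstdev(col)  # assumes no constant columns
        out.append([(v - mu) / sd for v in col])
    return out

def leading_component(z, iters=200):
    """First principal direction via power iteration on Z^T Z / n."""
    p, n = len(z), len(z[0])
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / n for b in range(p)]
            for a in range(p)]
    v = [1.0] * p  # assumes the start vector is not orthogonal to the answer
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pcr_one_component(columns, y):
    """Principal components regression retaining only the first component."""
    z = standardize(columns)
    v = leading_component(z)
    scores = [sum(v[a] * z[a][i] for a in range(len(z))) for i in range(len(y))]
    y_bar = fmean(y)
    # The scores are centered, so the slope on them is a simple ratio.
    gamma = (sum(s * (t - y_bar) for s, t in zip(scores, y))
             / sum(s * s for s in scores))
    fitted = [y_bar + gamma * s for s in scores]
    return v, gamma, fitted

# Hypothetical, perfectly collinear predictors: OLS on both would be singular.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]
y = [3, 6, 9, 12, 15]
v, gamma, fitted = pcr_one_component([x1, x2], y)
print([round(c, 4) for c in v], round(gamma, 4))  # [0.7071, 0.7071] 3.0
```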

Summary

Introduction

Simultaneously modeling and analyzing both local and global relationships in geospatially referenced statistical models is a challenge commonly faced by social science researchers, geographers, and spatial statisticians. Methods common in the social sciences, such as linear and logistic regression, are limited in that they model only global relationships (across the entire dataset) and produce results that can be attributed only to the dataset as a whole. Other methods, such as clustering techniques, are often used to identify concentrations or “hotspots” within or between variables in a dataset, but they provide no information about intervariable relationships or dependencies, which regression analysis can capture [1]. Building on the work of Trivedi et al. [2,3,4], this study contributes to the literature by applying clustering as a pre-processing technique to geospatial data and using regression techniques to examine global dataset trends while inferring local intervariable relationships.
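To make the global-versus-local distinction concrete, the toy sketch below (hypothetical data, plain Python, not the authors' code) fits one pooled least-squares line and one line per region. When a predictor pushes the outcome in opposite directions in two regions, the pooled slope averages the effects away, while the per-region fits recover them.

```python
from statistics import fmean

def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data: the same predictor has opposite effects in two regions.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
region_a = {"x": x, "y": [1.0 + 2.0 * v for v in x]}  # slope +2
region_b = {"x": x, "y": [9.0 - 2.0 * v for v in x]}  # slope -2

# Global model: pool both regions and fit a single line.
global_fit = ols_fit(region_a["x"] + region_b["x"], region_a["y"] + region_b["y"])

# Local models: one fit per region (cluster).
local_fits = {name: ols_fit(r["x"], r["y"])
              for name, r in {"A": region_a, "B": region_b}.items()}

print("global:", global_fit)  # global: (5.0, 0.0)
print("local:", local_fits)   # local: {'A': (1.0, 2.0), 'B': (9.0, -2.0)}
```

The pooled model reports no relationship at all, which is exactly the kind of local structure that a clustering pre-processing step is meant to expose before regression.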


Journal: ISPRS International Journal of Geo-Information
Publication Date: Dec 28, 2018
Citations: 12
License type: CC BY 4.0


Similar Papers

Are lymphoma survivors really at higher risk for unemployment/underemployment?
Patrick Tang ... Fausto R Loberiza
Leukemia and Lymphoma | Vol. 53 | 21 May 2012

Race, ethnicity, and the dynamics of health insurance coverage
Robert W Fairlie ... Rebecca A London
01 Jan 2009

Race, Ethnicity and the Dynamics of Health Insurance Coverage
Robert W Fairlie ... Rebecca A London
SSRN | 01 Jan 2008

The Enigma of Higher Income Immigrants With Lower Rates of Health Insurance Coverage in the United States
Elizabeth Bass
Journal of Immigrant and Minority Health | Vol. 8 | 01 Jan 2006
