Abstract
Predictive models are central to both archaeological research and cultural resource management. Yet archaeological applications of predictive models often fall short because of small training data sets, inadequate statistical techniques, and a lack of theoretical insight to explain how past land use responded to predictor variables. Here we address these critiques and evaluate the predictive power of four statistical approaches widely used in ecological modeling (generalized linear models, generalized additive models, maximum entropy, and random forests) to predict the locations of Formative Period (2100–650 BP) archaeological sites in the Grand Staircase-Escalante National Monument. We assess each modeling approach using a threshold-independent measure, the area under the curve (AUC), and threshold-dependent measures, such as the true skill statistic. We find that the majority of the modeling approaches struggle with archaeological datasets because true-absence locations are frequently lacking, which violates the assumptions of generalized linear models, generalized additive models, and random forests, as well as those of measures of their predictive power (AUC). Maximum entropy is the only method tested here that can use pseudo-absence points (inferred absence data based on known presence data) and control for a non-representative sampling of the landscape, making it the best modeling approach for common archaeological data when the goal is prediction. Regression-based approaches may be more applicable when prediction is not the goal, given their grounding in well-established statistical theory. Random forests, while the most powerful approach, are not applicable to archaeological data except in the rare case where true-absence data exist. Our results have significant implications for the application of predictive models by archaeologists for research and conservation purposes and highlight the importance of understanding model assumptions.
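The two kinds of evaluation measure named above can be illustrated with a minimal sketch. It assumes the usual definitions: AUC as the probability that a randomly chosen presence location scores higher than a randomly chosen absence location, and the true skill statistic as sensitivity plus specificity minus one at a chosen threshold. The function names, the scores, and the 0.5 threshold are hypothetical and are not taken from the study.

```python
def auc(presence_scores, absence_scores):
    """Threshold-independent: share of presence/absence pairs in which the
    presence location gets the higher score (ties count as half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def true_skill_statistic(presence_scores, absence_scores, threshold):
    """Threshold-dependent: sensitivity + specificity - 1 at one cutoff."""
    # Sensitivity: fraction of presence locations predicted as presence.
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    # Specificity: fraction of absence locations predicted as absence.
    specificity = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sensitivity + specificity - 1.0


# Hypothetical model scores, for illustration only.
presence = [0.9, 0.8, 0.7, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]

print(auc(presence, absence))                        # 0.9375
print(true_skill_statistic(presence, absence, 0.5))  # 0.5
```

Both functions treat the second list as genuine absences. If that list instead holds pseudo-absence points, the same arithmetic only describes how well the model separates known sites from those inferred points, which is one way to read the concern about AUC raised above.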
Highlights
Predicting how and explaining why past people used their landscape is critical for informing contemporary land management decisions [1] and answering key anthropological research questions [2,3,4,5]
We find that the majority of the modeling approaches struggle with archaeological datasets due to the frequent lack of true-absence locations, which violates model assumptions of generalized linear models, generalized additive models, and random forests, as well as measures of their predictive power (AUC)
We provide a set of methods for better understanding potential issues researchers and resource managers will encounter when constructing a predictive model, such as identifying collinearity among predictor variables, determining how representative inventory data are of the landscape, and gauging a model’s predictive power
Summary
Predicting how and explaining why past people used their landscape is critical for informing contemporary land management decisions [1] and answering key anthropological research questions [2,3,4,5]. To this end, archaeological researchers often create predictive models [6,7]. Archaeological predictive modeling is “the practice of building models that in some way, indicate the likelihood of archaeological sites, cultural resources, or past landscape use across a region” [8]. While this effort dates back at least to the 1950s [8,9,10], the advent of personal computers, geographic information systems (GIS), high-resolution environmental data, and robust statistical techniques has dramatically increased the creation and implementation of predictive models [8].