Abstract

This paper reflects on a number of trends towards a more open and reproducible approach to geographic and spatial data science over recent years. In particular, it considers trends towards Big Data and the impacts these are having on spatial data analysis and modelling. It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering ‘black boxes’ in which the internal workings of the analysis are not revealed. It argues that such closed software is problematic, and considers a number of ways in which issues identified in spatial data analysis (such as the modifiable areal unit problem, MAUP) could be overlooked when working with closed tools, leading to problems of interpretation and possibly to inappropriate actions and policies based on them. In addition, the paper considers the role that reproducible and open spatial science may play in such an approach, taking into account the issues raised. It highlights the dangers of failing to account for the geographical properties of data, now that all data are spatial (they are collected somewhere), and the problems of a desire for n = all observations in data science, and it identifies the need for a critical approach. This is one in which openness, transparency, sharing and reproducibility provide a mantra for defensible and robust spatial data science.

Highlights

  • Notions of scientific openness, collective working and reproducibility have been identified as important considerations for critical data science and for critical spatial data science within the GIScience domain (Singleton et al 2016; Shannon and Walker 2018; Nüst et al 2018; Singleton and Arribas-Bel 2019).

  • These have emerged for a number of reasons: software costs; distrust of the ‘black box’, where data are processed without disclosure of the processing method; recognition of the scientific advantages of working in an open source environment, where new methods are typically available several years before they appear in commercial software; the wider practice of user community-generated software extensions and improvements (social computation); and reproducibility.

  • The need for such transparency derives from the dangers of uncritical acceptance of black box spatial analyses, and the potential for erroneous results of such approaches precisely because ‘there is less of a requirement to think about the underlying processes that are being implemented’ (Singleton et al 2016, p. 1512). In essence, a reproducible research philosophy is one which allows all aspects of the answer generated by any given analysis to be tested.

Summary

Introduction

Notions of scientific openness (open data, open code and open disclosure of methodology), collective working (sharing, collaboration, peer review) and reproducibility (methodological and inferential transparency) have been identified as important considerations for critical data science and for critical spatial data science within the GIScience domain (Singleton et al 2016; Shannon and Walker 2018; Nüst et al 2018; Singleton and Arribas-Bel 2019). The distrust of the black box reflects a widely held view that data analysis and research should, wherever reasonable, be capable of reproduction or replication by a third party. Here reproducibility is defined as the exact duplication of the results using the same materials, and replicability means confirming the original conclusions (Nüst et al 2018), for example with new data (Kedron et al 2019). This view is held both as a scientific credo and to avoid further ‘climategates’ (Campbell 2010), in which a key issue was that the researchers involved would not release their data (or their code). The need for such transparency derives from the dangers of uncritical acceptance of black box spatial analyses, and the potential for erroneous results of such approaches precisely because ‘there is less of a requirement to think about the underlying processes that are being implemented’ (Singleton et al 2016, p. 1512). Intensive development and implementation of tools followed by initial dissemination efforts.
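The distinction between reproduction and replication drawn above can be illustrated with a minimal sketch. It is not taken from the paper: the simulated data, the function names (`simulate`, `analyse`, `fingerprint`) and the choice of a regression slope as the "result" are all assumptions made for illustration. Reproduction is treated as re-running the same code on the same materials and obtaining an identical result, while replication is treated as reaching the same conclusion (here, a positive slope) from new data.

```python
import hashlib
import json
import random
import statistics


def simulate(seed, n=200):
    """Hypothetical data set: y rises with x, plus random noise."""
    rng = random.Random(seed)  # fixed seed makes the 'materials' shareable
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def analyse(xs, ys):
    """Hypothetical analysis: least squares slope of y on x, rounded for reporting."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return {"slope": round(sxy / sxx, 6)}


def fingerprint(result):
    """Hash of the reported result so a third party can check for an exact match."""
    payload = json.dumps(result, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


original = analyse(*simulate(seed=1))      # the published analysis
reproduction = analyse(*simulate(seed=1))  # same code, same materials
replication = analyse(*simulate(seed=2))   # same code, new data

# Reproduction: the result is duplicated exactly.
reproduced = fingerprint(original) == fingerprint(reproduction)
# Replication: the conclusion (a positive relationship) is confirmed.
replicated = (original["slope"] > 0) == (replication["slope"] > 0)

print("reproduced:", reproduced)
print("replicated:", replicated)
```

Neither check is possible with a black box: without the data, the seed and the code, a third party has nothing to re-run and nothing to compare against.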

Spatial analysis
R developments in support of open spatial analysis
Big spatial data considerations
From data to spatial data
The MAUP
Critical spatial data science
Findings
Concluding remarks