A moan, a discursion into the visualisation of very large spatial data and some rubrics for identifying big questions

Alexis Comber,Chris Brunsdon,Rich Harris,Martin Charlton

doi:10.21433/b3115sc537n4

Abstract

GIScience 2016 Short Paper Proceedings

A. Comber 1, C. F. Brunsdon 2, M. Charlton, R. Harris 3

School of Geography, University of Leeds, LS2 9JT, UK
Email: a.comber@leeds.ac.uk

NUI Maynooth, Maynooth, Co Kildare, Ireland
Email: {christopher.brunsdon; martin.charlton}@nuim.ie

School of Geographical Sciences, University of Bristol, BS8 1SS
Email: rich.harris@bris.ac.uk

Abstract

This short paper links two areas of big data science in the context of GIScience: inferential analysis and visualisation. It discusses ideas around the integration and analysis of large, spatially referenced datasets and considers how the results can best be visualised. It advocates a critical approach to big data visualisation and warns of the inherent dangers of simply identifying patterns, whether through data mining, modelling or visualisation. It adds to ongoing debates by suggesting techniques and rubrics, possibly even hinting at a manifesto.

1. Introduction

Scientists have access to an increasing amount of data of all kinds, which provides opportunities to gain novel insights into a wide range of phenomena. The availability of these data is being driven by two factors: 1) the large amount of open data and the wider recognition of the value that can be added to those data (Molloy, 2011) by linking them to other data and by developing novel data analyses; 2) the many new forms of data generated every day by citizens, either passively or actively (See et al., 2016), on GPS- and web-enabled tablets and other devices, which have resulted in an explosion of citizen-contributed, crowdsourced or volunteered data.

Much has been written about the characteristics of these very large datasets, from the 3 (or 5, or is it 7?) Vs to the 3Ds (Dynamic, Diverse and Dense, to which Dirty should be added). Perhaps more interestingly, their existence has stimulated a number of theoretical and practical considerations.
These range from the need to revisit classic measures of, and tools for, statistical inference (Brunsdon, in press) to the need to redesign some of the more commonly used software tools to handle the data volumes. The role of GIScience relates to location, which may be precise, in the form of latitude and longitude, or approximate, for example using a small census area reference or a postcode. However, despite this being the age of so-called ‘Big Data’, the real challenge is to identify and answer ‘Big Questions’, which so far the research community, including the GIScience community, has failed to do.

2. Large quantities of spatially referenced data

There are large quantities of spatially referenced data of many different types, describing many different phenomena. These provide opportunities for new forms of knowledge. A typical data-mining / computer science approach is encapsulated by the following quote: ‘Scouring databases and other data stores for insight is often compared to the proverbial search for a needle in a haystack, but … big data turns that idea on its head’ and, quoting Viktor Mayer-Schonberger, ‘With big data, we don’t know what the needle is. We can let the data speak and use it to generate really intriguing questions’ 1. GIScience offers suites of

1 http://data-informed.com/big-datas-value-much-larger-than-specific-business-questions/

Full Text