Abstract

Volunteered geographical information (VGI), either in the context of citizen science or the mining of social media, has proven to be useful in various domains including natural hazards, health status, disease epidemics, and biological monitoring. Nonetheless, the variable or unknown data quality due to crowdsourcing settings is still an obstacle to fully integrating these data sources in environmental studies and potentially in policy making. The data curation process, in which a quality assurance (QA) is needed, is often driven by the direct usability of the data collected within a data conflation or data fusion (DCDF) process, which combines the crowdsourced data into one view, potentially using other data sources as well. Looking at current practices in VGI data quality and using two examples, namely land cover validation and inundation extent estimation, this paper discusses the close links between QA and DCDF. It aims to help decide whether a disentanglement is possible, and whether it would be beneficial, for understanding the data curation process and its methodology with a view to future use of crowdsourced data. Analysing situations throughout the data curation process where and when entanglement between QA and DCDF occurs, the paper explores the various facets of VGI data capture, as well as data quality assessment and purposes. Far from rejecting the ISO usability quality criterion, the paper advocates for a decoupling of the QA process and the DCDF step as much as possible while still integrating them within an approach analogous to a Bayesian paradigm.

Highlights

  • Under the generic term of crowdsourcing, data collected from the public as volunteered geographical information (VGI) is becoming an increasingly important topic in many scientific disciplines

  • The European FP7 COBWEB project proposed a survey design tool including an authoring tool to combine different quality controls (QC) within a workflow that will serve as a quality assurance (QA) for each particular case study [9,10,11,12]; the data collected and qualified through the QA workflow is made available for a data conflation or data fusion (DCDF) within a completed data curation process [12,13,14,15]

  • The QA and DCDF entanglement stems from the stakeholder's conceptual approach to the study to be put in place, which is influenced by several semantic overlaps concerning quality, validity, goals of the study, etc. that we explore in the paper


Summary

Introduction

Under the generic term of crowdsourcing, data collected from the public as volunteered geographical information (VGI) is becoming an increasingly important topic in many scientific disciplines. What all the different approaches agree on is the multidimensional aspect of quality, essential in crowdsourcing and citizen science. This causes a tendency for the QA and DCDF processes to be entangled, as the ISO 19157 usability criterion drives the data curation process (DCP). Even though generic QCs provide the logic and reasoning of attaching a quality that clarifies the uncertainty on data captured, the workflow composition of the QA is mainly driven by the future use of the data. It is possible for the quality elements to be assessed within the DCDF algorithm itself, either included in the QA (the conflated data is a by-product) or completely separated (the data quality is a by-product). The interest is not in the results of the examples, nor in identifying whether one method is better, but rather in the designs and approaches used and how these translate into the potential entanglement.
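As a rough illustration of what a decoupled design could look like, the sketch below keeps a QA workflow (a chain of QCs that only attaches a quality score) separate from a DCDF step (which only consumes that score). It is not the method of the paper or of the COBWEB tooling: the `Observation` fields, the two QCs, their thresholds, and the quality-weighted mean used as the fusion are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Observation:
    value: float             # hypothetical measurement, e.g. water depth in metres
    position_error_m: float  # hypothetical positional uncertainty of the capture
    quality: float = 1.0     # attached by the QA step, in [0, 1]


# A quality control (QC) maps one observation to a score in [0, 1].
QC = Callable[[Observation], float]


def qc_position(obs: Observation) -> float:
    """Hypothetical QC: down-weight observations with poor positioning."""
    return 1.0 if obs.position_error_m <= 10.0 else 0.5


def qc_plausible_range(obs: Observation) -> float:
    """Hypothetical QC: reject values outside an assumed plausible range."""
    return 1.0 if 0.0 <= obs.value <= 5.0 else 0.0


def run_qa(observations: Iterable[Observation], workflow: List[QC]) -> List[Observation]:
    """QA step: chain the QCs and attach a quality score. No fusion happens here."""
    qualified = []
    for obs in observations:
        score = 1.0
        for qc in workflow:
            score *= qc(obs)
        qualified.append(replace(obs, quality=score))
    return qualified


def fuse(observations: Iterable[Observation]) -> float:
    """DCDF step: quality-weighted mean. It reads the quality but does not assess it."""
    obs_list = list(observations)
    total_weight = sum(o.quality for o in obs_list)
    if total_weight == 0.0:
        raise ValueError("no observation with non-zero quality to fuse")
    return sum(o.quality * o.value for o in obs_list) / total_weight


raw = [
    Observation(value=1.0, position_error_m=5.0),
    Observation(value=2.0, position_error_m=20.0),
    Observation(value=9.0, position_error_m=5.0),
]
qualified = run_qa(raw, [qc_position, qc_plausible_range])
fused_value = fuse(qualified)
```

In this arrangement the qualified data can be stored and reused by a different fusion later, which is the practical appeal of decoupling; an entangled design would instead compute the scores inside `fuse` itself.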

Land Cover Validation Example
Flood Inundation Extent Estimation Example
Semantic Discourse
Data Quality of the End-Result
What Is Good and Bad Quality in Crowdsourcing for Environmental Spatial Data?
Trustworthiness and Data Quality
Semantic Harmonisation
Data Curation Process
Design of Experiment
Findings
Final Comments and Conclusions
