Abstract

The ability to evaluate the validity of data is essential to any investigation, and manual “eyes on” assessments of data quality have dominated in the past. Yet, as the size of collected data continues to increase, so does the effort required to assess their quality. This challenge is of particular concern for networks that automate their data collection, and has resulted in the automation of many quality assurance and quality control analyses. Unfortunately, the interpretation of the resulting data quality flags can become quite challenging with large data sets. We have developed a framework to summarize data quality information and facilitate interpretation by the user. Our framework consists of first compiling data quality information and then presenting it through two separate mechanisms: a quality report and a quality summary. The quality report presents the results of specific quality analyses as they relate to individual observations, while the quality summary takes a spatial or temporal aggregate of each quality analysis and provides a summary of the results. Included in the quality summary is a final quality flag, which further condenses data quality information to assess whether a data product is valid or not. This framework has the added flexibility to allow “eyes on” information on data quality to be incorporated for many data types. Furthermore, this framework can aid problem tracking and resolution, should sensor or system malfunctions arise.
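To make the two mechanisms concrete, the sketch below builds a per-observation quality report and then aggregates it into a quality summary with a final quality flag. All function names are hypothetical, and the 5% failure-rate threshold is an illustrative assumption, not a value taken from the paper.

```python
# Quality report: one row per observation, one entry per quality test.
# True means the observation FAILED that test (i.e., was flagged).
def quality_report(observations, tests):
    return [{name: test(obs) for name, test in tests.items()}
            for obs in observations]

# Quality summary: aggregate each test's failure rate over the set of
# observations, then condense to a final pass/fail quality flag.
# The 5% threshold is illustrative only.
def quality_summary(report, threshold=0.05):
    n = len(report)
    summary = {name: sum(row[name] for row in report) / n
               for name in report[0]}
    summary["final_quality_flag"] = (
        "fail" if any(rate > threshold for rate in summary.values())
        else "pass")
    return summary

# Example: a simple range (plausibility) test on air temperature in deg C.
tests = {"range": lambda x: not (-40.0 <= x <= 60.0)}
obs = [12.1, 12.3, 99.9, 12.0]  # one implausible value
summary = quality_summary(quality_report(obs, tests))
print(summary["range"], summary["final_quality_flag"])  # 0.25 fail
```

The key design point mirrored here is that the detailed report (per observation) and the condensed summary (per aggregate) are computed from the same underlying flags, so a user can drill down from the final flag to the individual failing observations.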

Highlights

  • Advancements in sensor measurement techniques and data collection continue to increase both the accuracy and quantity of measurements we are able to capture

  • One organization that is confronted by this challenge is the National Ecological Observatory Network (NEON)

  • We sought to formulate an automated framework that allows the results from sensor tests and quality assurance and quality control (QA/QC) analyses to be summarized in a way that is transparent and interpretable


Summary

Introduction

Advancements in sensor measurement techniques and data collection continue to increase both the accuracy and quantity of measurements we are able to capture. We sought to formulate an automated framework that allows the results from sensor tests and QA/QC analyses to be summarized in a way that is transparent and interpretable. This enables users to determine whether corrective actions are necessary when a data product has not met the requirements of a specific use case. We advance existing rank-based approaches to create a data quality assessment scheme that is modular in form, so it can be transferred among a variety of sensor measurements and physical samples. We also show how means for gradation can be re-inserted at a later stage. We conclude by discussing the applicability and expandability of the framework to various data types and use cases.
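The modularity described above can be sketched as follows: each quality test is a small, reusable callable, and each measurement stream simply declares which tests apply to it, so the same framework transfers across sensor types by swapping configurations. The test names, streams, and thresholds here are illustrative assumptions, not specifics from the paper.

```python
# Hypothetical modular QA/QC tests; each returns a flagging callable.

def range_test(lo, hi):
    """Flag values outside a physically plausible range."""
    return lambda x: not (lo <= x <= hi)

def step_test(max_step):
    """Flag a value that jumps too far from the previous value.

    Note: the returned callable keeps state (the previous value), so a
    fresh instance should be created per data stream.
    """
    prev = [None]
    def test(x):
        flagged = prev[0] is not None and abs(x - prev[0]) > max_step
        prev[0] = x
        return flagged
    return test

# Transferability: each stream declares its own test configuration.
streams = {
    "air_temperature": [range_test(-40.0, 60.0), step_test(5.0)],
    "relative_humidity": [range_test(0.0, 100.0)],
}

def flag_stream(name, values):
    """Return one combined pass/fail flag per observation in the stream."""
    tests = streams[name]
    return [any(t(v) for t in tests) for v in values]

print(flag_stream("air_temperature", [10.0, 10.5, 30.0]))  # [False, False, True]
```

Because the per-test results are binary flags, a graded quality scale can later be re-inserted on top of them, for example by mapping aggregate failure rates into quality ranks rather than a single pass/fail outcome.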

Materials and Methods
A Framework for Tracking Quality Information in Large Datasets
Findings
Discussion and Conclusions
