Abstract
The increasing availability of digitized biodiversity data worldwide, provided by a growing number of institutions and researchers, and the expanding use of those data for a variety of purposes have raised concerns about the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assessing and managing data quality is therefore critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed unless the quality needs have been clearly established from a data user’s standpoint. This paper defines a formal conceptual framework that supports the biodiversity informatics community by allowing the meaning of "fitness for use" to be described from a data user’s perspective in a common and standardized manner. The proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions and DQ Report. The framework is intended to formalize human thinking into well-defined components so that concepts of DQ needs, solutions and reports can be shared and reused in a common way among user communities. With this framework, we establish a common ground for the collaborative development of solutions for DQ assessment and management based on fitness-for-use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community in formalizing and sharing DQ profiles related to DQ needs across the community.
Highlights
Data Quality (DQ) is a subject that permeates most research
We show how to use the conceptual framework for DQ profiling (Section 3.1) and for DQ status reporting based on a DQ profile (Section 3.2)
Based on well-established principles and concepts from DQ literature [1], the framework allows the biodiversity informatics community to organize the way it addresses DQ so that data users can judge whether data are fit for use for a particular purpose and data owners can improve the quality of unsuitable data
Summary
Data Quality (DQ) is a subject that permeates most research. As a result, research on DQ, information quality, or data fitness for use has been conducted and applied in a number of domains, covering multiple aspects and approaches [1,2,3,4,5,6]. To enable the DQ assessment and management of biodiversity data, it is necessary to define the components that allow biodiversity data users, curators, holders and owners to determine and improve the fitness of data for use. This paper introduces a conceptual framework that defines and organizes the concepts needed to assess and manage data fitness for use in the domain of biodiversity informatics. In the context of the conceptual framework, the first step of Total Data Quality Management (TDQM) corresponds to DQ profiling (see Sections 2.1 and 3.1), the second step to DQ status reporting (see Sections 2.2, 2.3 and 3.2), the third step to DQ assessment (the action of judging the fitness for use), and the fourth step to DQ management (the action of improving DQ, making data fitter for use).
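The four steps named above (DQ profiling, DQ status reporting, DQ assessment and DQ management) can be illustrated with a minimal Python sketch. It is not the framework's formal model or the authors' implementation: the `Criterion` class, the two checks, the amendment rule and the example record are all hypothetical, and the field names are borrowed from Darwin Core purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Criterion:
    """Hypothetical DQ need: a named check a record must pass to be fit for use."""
    name: str
    check: Callable[[Record], bool]


# Step 1, DQ profiling: state what "fit for use" means for one purpose.
profile: List[Criterion] = [
    Criterion("scientific_name_present",
              lambda r: bool(r.get("scientificName", "").strip())),
    Criterion("coordinates_present",
              lambda r: bool(r.get("decimalLatitude")) and bool(r.get("decimalLongitude"))),
]


def report(record: Record, criteria: List[Criterion]) -> Dict[str, bool]:
    """Step 2, DQ status reporting: evaluate every criterion in the profile."""
    return {c.name: c.check(record) for c in criteria}


def is_fit_for_use(status: Dict[str, bool]) -> bool:
    """Step 3, DQ assessment: judge fitness for use from the status report."""
    return all(status.values())


def improve(record: Record) -> Record:
    """Step 4, DQ management: an assumed amendment that fills missing
    decimal coordinates from verbatim ones when they parse as numbers."""
    amended = dict(record)
    for target, source in (("decimalLatitude", "verbatimLatitude"),
                           ("decimalLongitude", "verbatimLongitude")):
        if not amended.get(target) and amended.get(source):
            try:
                amended[target] = str(float(amended[source]))
            except ValueError:
                pass  # leave unparseable values for a curator to resolve
    return amended


# Made-up example record, not taken from any real collection.
record = {"scientificName": "Puma concolor",
          "verbatimLatitude": "42.38", "verbatimLongitude": "-71.12"}

status_before = report(record, profile)
improved = improve(record)
status_after = report(improved, profile)

print(status_before, is_fit_for_use(status_before))
print(status_after, is_fit_for_use(status_after))
```

Keeping the profile separate from the report and the assessment mirrors the idea that the same data can be fit for one purpose and unfit for another: a different user community would supply a different list of criteria and reuse the remaining steps unchanged.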