Abstract

This article investigates how data quality issues have evolved from traditional structured data managed in relational databases to Big Data. In particular, the paper examines the relationship between Data Quality and several research coordinates that are relevant in Big Data, such as the variety of data types, data sources and application domains, focusing on maps, semi-structured texts, linked open data, sensors and sensor networks, and official statistics. Consequently, a set of structural characteristics is identified, and the a posteriori correlation between these characteristics and quality dimensions is systematized. Finally, Big Data quality issues are considered in a conceptual framework suited to mapping the evolution of the quality paradigm along three core coordinates that are significant in the context of the Big Data phenomenon: the data type considered, the source of data, and the application domain. The framework thus makes it possible to ascertain, through an integrative and theoretical literature review, the relevant changes in data quality that emerge with the Big Data phenomenon.

Highlights

  • The area of Big Data (BD) is currently the subject of intense investigation in the academic literature, driven by the growth of data made available on the Web and collected by fixed and mobile sensors

  • The paper investigates the relationship between Data Quality and several research coordinates that are relevant in Big Data, such as the variety of data types, data sources and application domains, focusing on maps, semi-structured texts, linked open data, sensors and sensor networks, and official statistics

  • We believe that the selected coordinates provide insights into Big Data quality issues in areas such as business intelligence


Summary

Introduction

The area of Big Data (BD) is currently the subject of intense investigation in the academic literature, driven by the growth of data made available on the Web and collected by fixed and mobile sensors. “To gain value from this data, you must choose an alternative way to process it.” Another issue that has attracted the attention of scholars and practitioners in recent years is Data Quality (DQ), a multifaceted concept whose definition draws on several different dimensions. We present a conceptual framework for analyzing the evolution of DQ issues from relational databases to the diverse data types, application domains and sources considered in the paper. The three BD coordinates, namely data types, sources and application domains, are analyzed in terms of their structural characteristics. Every path considers the evolution of a cluster of dimensions from the relational domain to the issues raised by the BD coordinates introduced above (i.e., data types, sources and application domains), and further shows how the evolution of a given dimension can be interpreted a posteriori according to the structural characteristics considered. A final general discussion on DQ dimension clusters and BD coordinates concludes the paper.

