Abstract

Big Data is an essential research area for governments, institutions, and private agencies that rely on it to support analytic decisions. Big Data concerns every aspect of data: how it is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in data quality may have unpredictable consequences, eroding confidence in the data and trust in its source. In the Big Data context, characteristics such as volume, heterogeneous data sources, and fast data generation increase the risk of quality degradation and call for efficient mechanisms to check data trustworthiness. However, ensuring Big Data Quality (BDQ) is a costly and time-consuming process, since it demands considerable computing resources. Maintaining quality through the Big Data lifecycle requires quality profiling and verification before any processing decision. This paper proposes a BDQ Management Framework that enhances pre-processing activities while strengthening data control. The framework relies on a new concept called the Big Data Quality Profile, which captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the framework's profiling and sampling components, a faster and more efficient data quality estimation is performed before and after an intermediate pre-processing phase. The exploratory profiling component plays the initial role in quality profiling; it uses a set of predefined quality metrics to evaluate important data quality dimensions, and it generates quality rules by applying various pre-processing activities and their related functions. These rules feed the Data Quality Profile and yield quality scores for the selected quality attributes. The framework implementation and the dataflow management across the quality management processes are discussed, and ongoing work on framework evaluation and deployment to support quality evaluation decisions concludes the paper.
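As a rough, illustrative sketch of the sampling-based estimation described above, the Python fragment below draws a random sample, scores one attribute before and after a toy pre-processing step, and compares the two scores. All names here (estimate_quality, impute_missing, the completeness check) are hypothetical illustrations, not the framework's actual API.

    import random

    def completeness_score(records, attribute):
        # Fraction of records whose attribute is present (non-null, non-empty).
        ok = sum(1 for r in records if r.get(attribute) not in (None, ""))
        return ok / len(records) if records else 0.0

    def estimate_quality(dataset, attribute, sample_size=1000, seed=7):
        # Score a random sample instead of the full dataset; sampling is what
        # keeps the quality estimate fast and cheap.
        rng = random.Random(seed)
        sample = rng.sample(dataset, min(sample_size, len(dataset)))
        return completeness_score(sample, attribute)

    def impute_missing(dataset, attribute, default="unknown"):
        # A toy pre-processing activity: fill missing values for one attribute.
        return [dict(r, **{attribute: r.get(attribute) or default}) for r in dataset]

    data = [{"email": "a@x.org"}, {"email": None}, {"email": "b@y.org"}]
    before = estimate_quality(data, "email", sample_size=3)
    after = estimate_quality(impute_missing(data, "email"), "email", sample_size=3)
    print(f"completeness before={before:.2f}, after={after:.2f}")

On a real dataset, only the sample would be scanned before and after the pre-processing phase, which is the point of estimating quality rather than measuring it exhaustively.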

Highlights

  • Big Data is universal [1]; it consists of large volumes of data with unconventional types

  • Data Quality Profile (DQP) and repository (DQPREPO): we describe the content of the DQP and the DQP repository, along with the DQP levels captured through the lifecycle of the framework's processes

  • Quality selection: choosing an appropriate quality metric to evaluate data quality dimensions for an attribute of a Big Data sample set; the metric returns the count of correct values, i.e. those that comply with its formula (see the sketch after this list)
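To make the "count of correct values" in the last item concrete, the sketch below maps a data quality dimension (DQD) name to a metric predicate and applies it to one attribute's values from a sample. The metric names and formulas are assumptions made for illustration, not the paper's definitions.

    import re

    # Hypothetical metric formulas: each predicate returns True when a value
    # complies with the corresponding quality dimension.
    METRICS = {
        "completeness": lambda v: v not in (None, ""),
        "validity_email": lambda v: isinstance(v, str)
            and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    }

    def evaluate_dqd(sample_values, dqd):
        # Quality selection: pick the metric for the requested DQD, count the
        # compliant values, and derive a score as the compliant fraction.
        metric = METRICS[dqd]
        correct = sum(1 for v in sample_values if metric(v))
        score = correct / len(sample_values) if sample_values else 0.0
        return correct, score

    count, score = evaluate_dqd(["a@x.org", None, "not-an-email"], "validity_email")
    print(count, round(score, 2))  # 1 0.33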



Introduction

Big Data is universal [1]; it consists of large volumes of data with unconventional types. An initial quality approximation helps the user obtain an overview of some data quality dimensions (DQDs) and make a better attribute selection, supported by a ready-to-use list of rules for pre-processing. The profile contents include, for example, quality requirements, the DQES, DQD scores, data quality rules, pre-processing activities, activity functions, DQD metrics, and data profiles.
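The contents listed above map naturally onto a structured profile object. The dataclass below is a speculative sketch of what a Data Quality Profile might hold; every field name is invented for illustration, and the paper's actual DQP schema may differ.

    from dataclasses import dataclass, field

    @dataclass
    class QualityRule:
        # A pre-processing rule tied to one attribute and one DQD.
        attribute: str
        dqd: str        # e.g. "completeness", "consistency"
        activity: str   # e.g. "cleansing", "deduplication"
        function: str   # e.g. "impute_missing", "drop_row"

    @dataclass
    class DataQualityProfile:
        # Hypothetical container for the DQP contents named in the text.
        requirements: dict          # target score per DQD, e.g. {"completeness": 0.95}
        dqd_scores: dict            # measured score per DQD (the evaluation results)
        rules: list = field(default_factory=list)

        def unmet_requirements(self):
            # DQDs whose measured score falls short of the required target,
            # i.e. the dimensions that still need pre-processing.
            return {d: t for d, t in self.requirements.items()
                    if self.dqd_scores.get(d, 0.0) < t}

In this reading, the DQPREPO would be a store of such profiles keyed by dataset and lifecycle stage, though the repository described in the paper is richer than this sketch.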

