Abstract

Fueled with growth in the fields of Internet of Things (IoT) and Big Data, data has become one of the most valuable assets in today’s world. While we are leveraging this data for analyzing complex systems using machine learning and deep learning, a considerable amount of time and effort is spent on addressing data quality issues. If undetected, data quality issues can cause large deviations in the analysis, misleading data scientists. To ease the effort of identifying and addressing data quality challenges, we introduce DQA, a scalable, automated and interactive data quality advisor. In this paper, we describe the DQA framework, provide detailed description of its components and the benefits of integrating it in a data science process. We propose a programmatic approach for implementing the data quality framework which automatically generates dynamic executable graphs for performing data validations fine-tuned for a given dataset. We discuss the use of DQA to build a library of validation checks common to many applications. We provide insight into how DQA addresses many persistence and usability issues which currently make data cleaning a laborious task for data scientists. Finally, we provide a case study of how DQA is implemented in a realworld system and describe the benefits realized.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.