Abstract

Big data changed the way in which we collect and analyze data. In particular, the amount of available information is constantly growing and organizations rely more and more on data analysis in order to achieve their competitive advantage. However, such amount of data can create a real value only if combined with quality: good decisions and actions are the results of correct, reliable and complete data. In such a scenario, methods and techniques for the Data Quality assessment can support the identification of suitable data to process. If for traditional database numerous assessment methods are proposed, in the Big Data scenario new algorithms have to be designed in order to deal with novel requirements related to variety, volume and velocity issues. In particular, in this paper we highlight that dealing with heterogeneous sources requires an adaptive approach able to trigger the suitable quality assessment methods on the basis of the data type and context in which data have to be used. Furthermore, we show that in some situations it is not possible to evaluate the quality of the entire dataset due to performance and time constraints. For this reason, we suggest to focus the Data Quality assessment only on a portion of the dataset and to take into account the consequent loss of accuracy by introducing a confidence factor as a measure of the reliability of the quality assessment procedure. We propose a methodology to build a Data Quality adapter module, which selects the best configuration for the Data Quality assessment based on the user main requirements: time minimization, confidence maximization, and budget minimization. Experiments are performed by considering real data gathered from a smart city case study.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.