Abstract

While the potential benefits of Big Data adoption are significant, and some initial successes have already been realized, there remain many research and technical challenges that must be addressed to fully realize this potential. The Big Data processing, storage and analytics, of course, are major challenges that are most easily recognized. However, there are additional challenges related for instance to Big Data collection, integration, and quality enforcement. This paper proposes a hybrid approach to Big Data quality evaluation across the Big Data value chain. It consists of assessing first the quality of Big Data itself, which involve processes such as cleansing, filtering and approximation. Then, assessing the quality of process handling this Big Data, which involve for example processing and analytics process. We conduct a set of experiments to evaluate Quality of Data prior and after its pre-processing, and the Quality of the pre-processing and processing on a large dataset. Quality metrics have been measured to access three Big Data quality dimensions: accuracy, completeness, and consistency. The results proved that combination of data-driven and process-driven quality evaluation lead to improved quality enforcement across the Big Data value chain. Hence, we recorded high prediction accuracy and low processing time after we evaluate 6 well-known classification algorithms as part of processing and analytics phase of Big Data value chain.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call