Abstract
The quality of data for decision-making will always be a major factor for companies that want to remain competitors. In addition, the era of Big Data has brought new challenges for the processing, management, storage of data and in particular the challenge represented by the veracity of these data which is one of the 5Vs that characterizes Big Data. This characteristic that defines the quality or reliability of the data and its sources must be verified in the future systems of each company. In this paper, we present an approach that helps to improve the quality of Big Data by the distributed execution of algorithms for detecting and correcting data errors. The idea is to have a multi-agents model for errors detection and correction in big data flow. This model linked to a repository specific to each company. This repository contains the most frequent errors, metadata, error types, error detection algorithms and error correction algorithms. Each agent of this model represents an algorithm and will be deployed in multiple instances when needed. The use of these agents will go through two steps. In the first step, the detection agents and error correction agents manage each flow entering the system in real time. In the second step, all the processed data flows in first step will be a dataset to which the error detection and correction agents are applied in batch in order to process other types of errors. Among architectures who allow this processing type, we have chosen Lambda architecture.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have