Distance Based Measurement Approach for Truth Discovery by Resolving the Conflicts in Big Data

P Bastinthiyagaraj, A Aloysius

doi:10.1088/1742-6596/1142/1/012013

Abstract

Big data describes data that is large in volume (terabytes to exabytes), often unstructured (including text and multimedia content), and complex to process. It comes from sources such as medical records, business transactions, sensors, social media and networks, banking, marketing, and government. Traditional technologies are not sufficient to store, process, and analyze such data, so dedicated technologies are needed to manage and analyze these large volumes of raw data. Many sources produce different descriptions of the same object, which leads to data conflicts and source conflicts. The challenge is to identify which sources provide high-quality information and which value is the true one for each object. The data involved is heterogeneous, comprising both numerical (measurement) data and string (categorical) data. Data analytics plays an important role in analyzing such conflicting data. A distance-based approach is used to achieve the best possible performance by minimizing the distance between the sources' claims and the estimated true values while maximizing the reliability of the sources. The main objective of this work is to resolve conflicts in heterogeneous data and to identify the true information among the conflicting values reported by various sources. In this work, only continuous data is considered when identifying the true information.
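The distance-minimization idea in the abstract can be illustrated with a minimal sketch. It assumes a CRH-style (Conflict Resolution on Heterogeneous data) iterative scheme restricted to continuous values: squared distance normalized by each object's standard deviation, source weights computed as the negative log of each source's share of the total loss, and truths updated as the weighted mean of the claims. These choices, and the names `discover_truths`, `claims`, and `iterations`, are illustrative assumptions, not the authors' exact method.

```python
import math
from statistics import median, pstdev


def discover_truths(claims, iterations=10):
    """Hypothetical CRH-style truth discovery for continuous claims.

    claims: {source_name: {object_name: float_value}}
    Returns (truths, weights). This is an illustrative sketch only.
    """
    objects = sorted({o for obs in claims.values() for o in obs})

    # Assumption: normalise distances by each object's spread so that
    # objects measured on different scales contribute comparably.
    spread = {}
    for o in objects:
        vals = [obs[o] for obs in claims.values() if o in obs]
        s = pstdev(vals)
        spread[o] = s if s > 0 else 1.0

    # Start from the per-object median, which tolerates a few outlying sources.
    truths = {o: median([obs[o] for obs in claims.values() if o in obs])
              for o in objects}
    weights = {s: 1.0 for s in claims}
    eps = 1e-12  # avoids log(0) when a source matches the truths exactly

    for _ in range(iterations):
        # Step 1: each source's loss is its total normalised squared distance.
        loss = {s: sum(((v - truths[o]) / spread[o]) ** 2 for o, v in obs.items())
                for s, obs in claims.items()}
        total = sum(loss.values()) + eps * len(loss)

        # Step 2: smaller share of the total loss -> larger reliability weight.
        weights = {s: -math.log((l + eps) / total) for s, l in loss.items()}

        # Step 3: each truth becomes the reliability-weighted mean of its claims.
        for o in objects:
            num = sum(weights[s] * obs[o] for s, obs in claims.items() if o in obs)
            den = sum(weights[s] for s, obs in claims.items() if o in obs)
            if den > 0:  # keep the previous estimate if all weights are zero
                truths[o] = num / den

    return truths, weights


if __name__ == "__main__":
    # Hypothetical readings: s3 disagrees strongly with s1 and s2.
    claims = {
        "s1": {"temp": 20.0, "humidity": 50.0},
        "s2": {"temp": 20.5, "humidity": 51.0},
        "s3": {"temp": 30.0, "humidity": 70.0},
    }
    truths, weights = discover_truths(claims)
    print(truths)
    print(weights)
```

Alternating the two steps means a source that stays far from the weighted consensus accumulates most of the total loss and is down-weighted, so in the example `s3` receives the smallest weight and the estimated truths stay close to the values reported by `s1` and `s2`.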
