Mining and visualising contradictory data

Honour Chika Nwagwu, George Okereke, Chukwuemeka Nwobodo

doi:10.1186/s40537-017-0100-9

Open Access

https://doi.org/10.1186/s40537-017-0100-9

Journal: Journal of Big Data
Publication Date: Oct 30, 2017
Citations: 4
License type: open-access

Affiliation: University of Nigeria

Abstract

Big datasets are often stored in flat files and can contain contradictory data. Contradictory data undermines the soundness of the information drawn from a noisy dataset. Traditional tools such as pie charts and bar charts are overwhelmed when used to visually identify contradictory data among the multidimensional attribute values of a big dataset. This work explains the importance of identifying contradictions in a noisy dataset. It also examines how contradictory data in a large and noisy dataset can be mined and visually analysed. The authors developed ‘ConTra’, an open source application that applies a mutual exclusion rule to identify contradictory data in comma-separated values (CSV) datasets. ConTra’s ability to identify contradictory data in datasets of different sizes is examined. The results show that ConTra can process large datasets when hosted on servers with fast processors. It is also shown that ConTra is 100% accurate in identifying contradictory data of objects whose attribute values do not conform to the mutual exclusion rule of a dataset in CSV format. Different approaches through which ConTra can mine and identify contradictory data are also presented.

Highlights

A noisy dataset can contain contradictory data
This paper presents the importance of identifying contradictions in a noisy dataset and shows how to apply a mutual exclusion rule to identify contradictory data
The contradictions identified by ConTra were the same as those identified by query. This confirms that ConTra is 100% accurate in retrieving contradictory data from objects associated with mutually exclusive attribute values in the investigated comma-separated values (CSV) dataset

Summary

Introduction

A noisy dataset can contain contradictory data. Contradictory data is synonymous with incorrect data, so it is important that such data be investigated and evaluated when analysing a noisy dataset. This work explains how to visually identify contradictory values associated with mutually exclusive attributes in a large and noisy comma-separated values (CSV) dataset. It answers the research question “How can contradictions in the mutually exclusive data of a large and noisy dataset be visually identified?”
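
The idea of a mutual exclusion check over CSV data can be illustrated with a minimal Python sketch. This is not ConTra's implementation; it assumes one reading of the rule, namely that an object may hold at most one value of a mutually exclusive attribute, so an object recorded with two or more distinct values is flagged as contradictory. The function name `find_contradictions`, the column names `patient_id` and `sex`, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def find_contradictions(rows, key, exclusive_attr):
    """Return {object_id: sorted distinct values} for every object that has
    more than one distinct value of a mutually exclusive attribute."""
    seen = defaultdict(set)
    for row in rows:
        value = row[exclusive_attr].strip()
        if value:  # skip missing values; they are noise, not contradictions
            seen[row[key]].add(value)
    return {obj: sorted(vals) for obj, vals in seen.items() if len(vals) > 1}


# Hypothetical sample data: p1 is recorded as both male and female.
SAMPLE = """patient_id,sex
p1,male
p2,female
p1,female
p3,male
p3,male
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
contradictions = find_contradictions(rows, key="patient_id", exclusive_attr="sex")
print(contradictions)  # {'p1': ['female', 'male']}
```

Repeated identical values, as for `p3`, are not contradictions under this reading; only `p1`, which carries two different values of the same exclusive attribute, is reported.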

Results

Conclusion

Similar Papers

Development of an XML‐document compaction method to improve data‐processing performance
Shigeru Yoshida ... Junichi Odagiri
Systems and Computers in Japan | VOL. 38
28 Mar 2007

A hybrid approach for training recurrent neural networks: application to multi-step-ahead prediction of noisy and large data sets
S Chtourou ... M Chtourou
Neural Computing and Applications | VOL. 17
05 May 2007

Publishing CSV Data as Linked Data on the Web
S M Hasan Mahmud ... Md Rezwan Hasan
24 Sep 2019

Are pie charts evil? An assessment of the value of pie and donut charts compared to bar charts
Andrew Hill
Information Visualization
24 Jun 2024
