Data quality measures based on granular computing for multi-label classification

Marilyn Bello, Gonzalo Nápoles, Koen Vanhoof, Rafael Bello

doi:10.1016/j.ins.2021.01.027

Abstract

Rough set theory is a granular computing formalism that allows a given dataset to be analyzed through well-defined measures. Some of these measures aim to characterize datasets used to discover knowledge, mostly in traditional classification problems. Measuring data quality is pivotal for estimating a problem’s difficulty beforehand, since a classification model’s accuracy depends heavily on the quality of the data. However, to the best of our knowledge, there are no measures devoted to analyzing the quality of multi-label datasets. In this paper, we propose six data quality measures for multi-label problems, which are based on different granular approaches. Some of these measures redefine the decision class concept, while others redefine the consistency concept. Moreover, we study the impact of the similarity threshold parameters and the distance functions on the behavior of these measures. The numerical simulations show a statistical correlation between the measures that redefine the consistency concept and the performance of the ML-kNN classifier.
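To make the idea of a similarity-based consistency measure concrete, the sketch below computes a generic rough-set style quality score for a multi-label dataset. It is not one of the six measures proposed in the paper, whose definitions are not given in the abstract. The function name `similarity_quality`, the normalised Manhattan distance, and the rule that two objects agree only when their full label sets are identical are all assumptions made for illustration.

```python
import numpy as np


def similarity_quality(X, Y, threshold=0.9):
    """Fraction of objects whose similar neighbours all share their label set.

    Illustrative assumption, not the paper's definition: two objects are
    'similar' when 1 - distance >= threshold, and 'agree' when their
    binary label vectors are identical.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)

    # Min-max normalise each feature so that distances fall in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features contribute zero distance
    Xn = (X - X.min(axis=0)) / span

    # Pairwise mean absolute difference (normalised Manhattan distance).
    dist = np.abs(Xn[:, None, :] - Xn[None, :, :]).mean(axis=2)
    similar = (1.0 - dist) >= threshold

    # Pairwise check that two objects carry exactly the same labels.
    same_labels = (Y[:, None, :] == Y[None, :, :]).all(axis=2)

    # An object is consistent if every similar object has the same labels.
    consistent = (~similar | same_labels).all(axis=1)
    return float(consistent.mean())


if __name__ == "__main__":
    X = [[0.0], [0.05], [1.0]]
    Y = [[1, 0], [0, 1], [1, 1]]
    print(similarity_quality(X, Y, threshold=0.9))
    print(similarity_quality(X, Y, threshold=0.99))
```

In this toy data the first two objects are nearly identical but carry different labels, so at a threshold of 0.9 only the third object is consistent and the score is 1/3. Raising the threshold to 0.99 separates them and the score becomes 1.0, which illustrates why the choice of similarity threshold and distance function affects the behavior of such measures.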

Journal: Information Sciences
Publication Date: Feb 1, 2021
Citations: 20

Similar Papers

A basic model for assessing primary health care electronic medical record data quality
Amanda L Terry ... Simon De Lusignan
BMC Medical Informatics and Decision Making | VOL. 19
12 Feb 2019

Study on water quality analysis and early-warning technology based on rough set and evidence theory
...
20 Nov 2012

Data Quality Assessment: A Case Study of PT JAS Using TDQM Framework
Wahyu Ari Bowo ... Achmad Nizar Hidayanto
01 Oct 2019

An Advanced Big Data Quality Framework Based on Weighted Metrics
Widad Elouataoui ... Saida El Mendili
Big Data and Cognitive Computing | VOL. 6
09 Dec 2022
