A COMPARATIVE CLUSTERING MODEL THAT CONSIDERS FALSE POSITIVES AND FALSE NEGATIVES IN SOME SOCIOECONOMIC APPLICATIONS

Román A. Mora-Gutiérrez, Eric A. Rincón-García, Pedro Lara-Velazquez, Miguel-Angel Gutiérrez-Andrade, D.E. Urueta-Hinojosa, Sergio-Gerardo De-los-Cobos-Silva

doi:10.25102/fer.2020.02.03

Abstract

Unsupervised learning enables classifier models to be built quickly and inexpensively in comparison with supervised approaches because the labeling task is eliminated. On the other hand, when the quality of a classifier is assessed, accuracy is usually the only parameter considered, which treats all incorrect predictions as if they had the same importance, when in reality the consequences of diagnosing a healthy patient as sick (Type I Error) and of diagnosing a sick patient as healthy (Type II Error) are different. That is why, depending on the application, it is preferable to avoid a specific type of error, even if the accuracy decreases. The present work shows a model based on clustering methods that takes into account Type I and Type II Errors to solve medical and business instances using three techniques: k-means, Spectral and Gauss. Based on representative and well-studied datasets for socioeconomic applications, the results show that the accuracy of a model is not a conclusive parameter, and that to make a decision it is necessary to focus on the errors in the confusion matrix, which take on a different meaning and significance in each specific instance. Our results and analysis are discussed to determine the best model for each case study. Finally, conclusions and limitations are analyzed.
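
The point that accuracy alone is not conclusive can be illustrated with a minimal sketch. It is not taken from the paper: the labels, the two models, and the helper names `confusion_counts` and `accuracy` are hypothetical, and it assumes the cluster assignments have already been mapped to the classes 1 (positive, e.g. sick) and 0 (negative, e.g. healthy).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


# Hypothetical ground truth and two hypothetical models (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # errs only with false positives
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # errs only with false negatives

for name, y_pred in (("A", model_a), ("B", model_b)):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(name, "accuracy:", accuracy(y_true, y_pred), "FP:", fp, "FN:", fn)
```

Both hypothetical models reach the same accuracy, yet model A makes only Type I errors and model B only Type II errors, so the preferable one depends on which error is more costly in the application.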
