Method for inferring the number of clusters based on a range of attribute values with subsequent automatic data labeling

Aline Montenegro Leal Silva, Francisco Jair De Oliveira Neres, Ana Paula Da Silva Mendes, Vinícius Ponte Machado, André Macedo Santana, Ricardo De Andrade Lira Rabêlo

doi:10.1016/j.procs.2023.08.194


Open Access


Abstract

Machine learning is well suited to pattern recognition, in particular to detecting correlations in data. In unsupervised learning, the groups formed from these correlations must receive a label: a description of each group in terms of its most relevant attributes and their respective value ranges, so that the groups can be understood automatically. In this work, this process is called labeling. A challenge for researchers, however, is establishing the optimal number of groups to request from the clustering step, since this choice influences the performance of labeling. This research therefore proposes an approach that infers the number of clusters from the range of attribute values and then labels the data automatically, to maximize the understanding of the groups obtained. The methodology was applied to four databases, and the results show that it contributes to the interpretation of groups, since it generates more accurate labels, with a hit rate above 93%.
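To make the notion of a label concrete, the following is a minimal sketch of range-based labeling on toy data. It is not the authors' method: the function names, the overlap-based choice of the "most relevant" attribute, and the definition of hit rate used here are all assumptions made for illustration, and the cluster assignments are taken as given rather than inferred.

```python
from collections import defaultdict


def attribute_ranges(rows, cluster_ids):
    """Return, per cluster, the (min, max) range of every attribute."""
    grouped = defaultdict(list)
    for row, cid in zip(rows, cluster_ids):
        grouped[cid].append(row)
    return {
        cid: [(min(col), max(col)) for col in zip(*members)]
        for cid, members in grouped.items()
    }


def overlap(a, b):
    """Length of the overlap between two closed intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def label_clusters(ranges):
    """Assumed relevance rule: for each cluster, pick the attribute whose
    range overlaps least with the same attribute in the other clusters."""
    labels = {}
    for cid, own in ranges.items():
        def total_overlap(j):
            return sum(
                overlap(own[j], other[j])
                for oid, other in ranges.items()
                if oid != cid
            )
        best = min(range(len(own)), key=total_overlap)
        labels[cid] = (best, own[best])  # (attribute index, value range)
    return labels


def hit_rate(rows, cluster_ids, labels):
    """Assumed metric: fraction of rows matched by the label of their own
    cluster and by no other cluster's label."""
    hits = 0
    for row, cid in zip(rows, cluster_ids):
        matching = [
            c for c, (j, (lo, hi)) in labels.items() if lo <= row[j] <= hi
        ]
        hits += matching == [cid]
    return hits / len(rows)


# Toy data (hypothetical): two attributes, two already-formed clusters.
rows = [(1.0, 5.0), (1.2, 9.0), (1.4, 7.0), (3.0, 6.0), (3.3, 8.0), (3.1, 5.5)]
cluster_ids = [0, 0, 0, 1, 1, 1]

labels = label_clusters(attribute_ranges(rows, cluster_ids))
rate = hit_rate(rows, cluster_ids, labels)
```

In this toy case the first attribute separates the two clusters cleanly while the second overlaps, so each cluster is labeled with its range on the first attribute. A label of this form reads as "attribute 0 in [1.0, 1.4]", which is the kind of description the abstract refers to.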

Full Text