Method for Inferring the Optimal Number of Clusters with Subsequent Automatic Data Labeling based on Standard Deviation

Aline Montenegro Leal Silva, Vinicius Ponte Machado, Francisco Alysson Da Silva Sousa, Andre Macedo Santana, Alysson Ramires De Freitas Santos

doi:10.14569/ijacsa.2023.01403102

Open Access

https://doi.org/10.14569/ijacsa.2023.01403102

Abstract

Machine learning is a suitable pattern recognition technique for detecting correlations in data. In unsupervised learning, the groups formed from these correlations can receive a label, which describes each group in terms of its most relevant attributes and their respective ranges of values so that the groups can be understood automatically. In this work, this process is called labeling. A challenge for researchers, however, is establishing the number of clusters that best represents the underlying structure of the data. This optimal number may vary with the data set and the clustering method used, and it influences the clustering process and, consequently, the interpretability of the resulting groups. This research therefore proposes an approach for inferring the number of clusters from the range of attribute values, followed by automatic data labeling based on the standard deviation, to maximize the understanding of the groups obtained. The methodology was applied to four databases. The results show that it aids the interpretation of the groups, since it generates more accurate labels in which the value ranges of the same attribute do not overlap across different groups.
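
The abstract does not give the labeling formula, so the following is only a minimal sketch of what a standard-deviation-based label could look like, not the authors' method. It assumes each cluster's range for an attribute is the interval [mean - std, mean + std], and that cluster assignments are already available. The function names `label_clusters` and `ranges_overlap` and the toy data are hypothetical.

```python
import numpy as np

def label_clusters(X, assignments):
    """Map each cluster id to an array of shape (n_attributes, 2) holding
    [mean - std, mean + std] per attribute (an assumed interval form)."""
    labels = {}
    for c in np.unique(assignments):
        members = X[assignments == c]
        mean = members.mean(axis=0)
        std = members.std(axis=0)  # population standard deviation (ddof=0)
        labels[int(c)] = np.stack([mean - std, mean + std], axis=1)
    return labels

def ranges_overlap(labels, attribute):
    """True if any two clusters have intersecting intervals for `attribute`."""
    # Sort intervals by lower bound; if any pair overlaps, some adjacent pair does.
    intervals = sorted(tuple(r[attribute]) for r in labels.values())
    return any(prev[1] >= nxt[0] for prev, nxt in zip(intervals, intervals[1:]))

# Hypothetical toy data: attribute 0 separates the two groups, attribute 1 does not.
X = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0],
              [10.0, 11.0], [11.0, 10.0], [12.0, 12.0]])
assignments = np.array([0, 0, 0, 1, 1, 1])

labels = label_clusters(X, assignments)
for attribute in range(X.shape[1]):
    print(attribute, ranges_overlap(labels, attribute))
```

Under these assumptions, an attribute whose intervals do not overlap between clusters (attribute 0 in the toy data) would be a candidate for inclusion in a label, while one whose intervals overlap (attribute 1) would not distinguish the groups.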

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2023
License type: CC BY

