Abstract

Clustering, a fundamental exploratory data analysis technique, is used not only to discover patterns and structures in complex datasets but also to group variables in high-dimensional data analysis. Dimension reduction through clustering helps identify important variables and reduces the dimension of the data without losing significant information. High-dimensional image datasets, such as Persian handwritten images, contain a large number of pixels, which makes statistical inference difficult. This high dimensionality poses challenges for analysis and processing and calls for specialized techniques, such as clustering, to extract information. Incorporating information from the response variable enhances the clustering analysis and turns it into a supervised method. This article evaluates a supervised clustering approach with Ridge and Lasso penalties, comparing the two on a real dataset while identifying important variables. We demonstrate that, although it selects only a small number of variables as important, the Lasso penalty performs relatively well in predicting the labels of new observations for this multi-class dataset.
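
As a minimal illustration of why a Lasso penalty tends to keep only a few variables while a Ridge penalty keeps all of them, the sketch below uses the textbook special case of an orthonormal design. There, minimizing (1/2)||y - Xb||^2 + lam * ||b||_1 soft-thresholds each unpenalized coefficient, while minimizing (1/2)||y - Xb||^2 + (lam/2) * ||b||^2 rescales each coefficient by 1 / (1 + lam). This is not the supervised clustering procedure evaluated in the article; the coefficient values, the value of `lam`, and the function names are hypothetical and chosen only for illustration.

```python
def ridge_shrink(beta, lam):
    """Ridge solution under an orthonormal design: scale every coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta]


def lasso_shrink(beta, lam):
    """Lasso solution under an orthonormal design: soft-threshold every coefficient at lam."""
    out = []
    for b in beta:
        mag = max(abs(b) - lam, 0.0)  # shrink the magnitude, clipping at zero
        out.append(0.0 if mag == 0.0 else (mag if b > 0 else -mag))
    return out


def count_nonzero(beta):
    """Number of variables that remain in the model."""
    return sum(1 for b in beta if b != 0.0)


# Hypothetical unpenalized coefficients (not taken from the article's dataset).
beta_ols = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 0.5  # hypothetical penalty strength

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)

print("ridge keeps", count_nonzero(ridge), "of", len(beta_ols), "variables")
print("lasso keeps", count_nonzero(lasso), "of", len(beta_ols), "variables")
```

In this toy case Ridge shrinks every coefficient but sets none of them to zero, whereas Lasso zeroes out the coefficients whose magnitude is at most `lam`, which is the mechanism behind selecting a small set of important variables.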
