Feature Selection Using Eigen Vector to Improve K-Means Clustering Performance in Data Grouping

Muhammad Zarlis, Nugroho Syahputra, Syahril Efendi

doi:10.47065/bits.v4i2.2022


Open Access



Abstract

A large number of attributes in a data set can increase the number of iterations that K-Means Clustering needs to group the data. In this research, Eigen Vector is used to perform feature selection on the data set, and the selected features are then clustered using K-Means Clustering. Two data sets are used: the Wine Quality Dataset from the UCI Machine Learning Repository, with 11 attributes, 4898 data records and 7 classes, and the South German Credit Dataset from kaggle.com, with 20 attributes, 1000 data records and 2 classes. Without feature selection, K-Means required 11 iterations on the Wine Quality Dataset and 10 iterations on the South German Credit Dataset. With Eigen Vector feature selection, it required 5 iterations and 4 iterations, respectively. Clustering was evaluated using the Sum of Square Error (SSE). Without feature selection, the SSE is 678.5735 on the Wine Quality Dataset and 1534.3167 on the South German Credit Dataset; with Eigen Vector feature selection, it is 383.0517 and 469.0698, respectively. These results indicate that the proposed method, K-Means Clustering with Eigen Vector feature selection, reduces both the clustering error (SSE) and the number of iterations.
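The pipeline described in the abstract (eigenvector-based feature selection, then K-Means, then iteration count and SSE) can be sketched roughly as follows. The abstract does not state the exact selection rule, so this sketch assumes one plausible reading: rank features by their absolute loading on the leading eigenvector of the covariance matrix and keep the top few. The function names (`select_features`, `kmeans`, and so on), the number of features kept, and the synthetic data are all hypothetical illustrations, not the authors' implementation or data sets.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def leading_eigenvector(matrix, steps=200):
    """Power iteration: approximates the eigenvector of the largest eigenvalue.

    Assumes the matrix is non-zero and the start vector is not orthogonal
    to the leading eigenvector (good enough for a sketch).
    """
    d = len(matrix)
    v = [1.0 / d ** 0.5] * d
    for _ in range(steps):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def select_features(rows, keep):
    """ASSUMED rule: keep the `keep` features with the largest absolute loadings."""
    v = leading_eigenvector(covariance_matrix(rows))
    ranked = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    return sorted(ranked[:keep])

def kmeans(rows, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, iterations, sse)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, centroids, iteration, sse

# Synthetic demo data (NOT the paper's data sets): two groups separated along
# features 0 and 1, plus two low-variance noise features 2 and 3.
demo_rng = random.Random(42)
data = []
for center in (0.0, 10.0):
    for _ in range(50):
        data.append([center + demo_rng.gauss(0, 1), center + demo_rng.gauss(0, 1),
                     demo_rng.gauss(0, 0.1), demo_rng.gauss(0, 0.1)])

selected = select_features(data, keep=2)
reduced = [[row[j] for j in selected] for row in data]

_, _, iters_full, sse_full = kmeans(data, k=2)
_, _, iters_sel, sse_sel = kmeans(reduced, k=2)
print("selected features:", selected)
print("full data:     iterations =", iters_full, " SSE =", round(sse_full, 4))
print("selected data: iterations =", iters_sel, " SSE =", round(sse_sel, 4))
```

One caveat when reading such a comparison: the two SSE values are computed in feature spaces of different dimension, so part of any reduction may simply reflect the dropped attributes rather than tighter clusters.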
