Abstract

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. Feature selection reduces time complexity and can sometimes increase classification accuracy. This study introduces a feature selection technique based on K-Means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which require considerable time to transform all of the input data. The proposed technique decides which features to retain based on the significance value of each feature within a cluster. The study found that K-Means clustering improves the efficiency of the KNN model on large data sets, while the KNN model without feature selection is suitable for small data sets. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique outperforms PCA, especially in terms of computation time. Hence, K-Means clustering is found to reduce data dimensionality with lower time complexity than PCA without affecting the accuracy of the KNN model on high-frequency data.
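
To illustrate the overall pipeline described above, the following is a minimal sketch of K-Means-based feature selection followed by KNN classification. The abstract does not specify how the per-cluster significance value is computed, so this sketch assumes a simple proxy: features (columns of the document-term matrix) are clustered, and the feature closest to each cluster centroid is retained. The function name `kmeans_feature_selection` and the parameter choices are illustrative, not taken from the paper.

```python
# Sketch: K-Means feature selection + KNN classification.
# Assumption (not stated in the abstract): the feature nearest each cluster
# centroid stands in for the paper's per-cluster "significance value".
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import pairwise_distances_argmin_min


def kmeans_feature_selection(X, n_clusters=50, random_state=0):
    """Cluster the columns (features) of X and keep one representative per cluster."""
    feature_profiles = X.T  # each row is one feature's profile across documents
    km = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    km.fit(feature_profiles)
    # Index of the feature closest to each centroid = retained feature.
    closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, feature_profiles)
    return np.unique(closest)


# Hypothetical usage on a document-term matrix X (n_docs x n_terms) with labels y:
# selected = kmeans_feature_selection(X, n_clusters=100)
# knn = KNeighborsClassifier(n_neighbors=5).fit(X[:, selected], y)
# predictions = knn.predict(X_test[:, selected])
```

Because this approach only clusters and indexes existing features rather than projecting the full data matrix onto new components, it avoids the transformation step that makes PCA costly on large, high-dimensional text data, which is consistent with the computation-time advantage reported in the abstract.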
