Clustering of students admission data using k-means, hierarchical, and DBSCAN algorithms

Erwin Lanceta Cahapin, Jocelyn L Reyes, Beverly Ambagan Malabag, Karl Louise Adrales, Cereneo Sailog Santiago Jr, Gemma S Legaspi

doi:10.11591/eei.v12i6.4849

Abstract

University admission involves procedures and requirements that a student must complete before being officially enrolled. Senior high school grades remain the most significant factor in college admission decisions. This paper presents the use of data mining to cluster students based on admission data. The admission dataset for 2019-2020 was obtained from the office of student affairs and services. It contains 2,114 observations with 11 attributes. Data preparation and standardization were carried out in the R programming language to ensure that the dataset was ready for processing. The optimal number of clusters (k) was identified using the silhouette method, which gave k=2; this value was then used for the actual clustering with the k-means and hierarchical clustering algorithms. Both algorithms grouped students into two clusters: cluster 1 (social sciences or board courses) and cluster 2 (management or non-board courses). The density-based spatial clustering of applications with noise (DBSCAN) algorithm was also applied to the same dataset and yielded a single cluster. This study can be replicated using at least five years of students’ admission data and other algorithms that could indicate students’ retention and turnover to board examinations.
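The workflow summarized in the abstract (standardize the attributes, pick k by the silhouette method, then run k-means) can be illustrated with a minimal sketch. This is not the authors' code: the study was carried out in R, whereas the sketch below uses plain Python with only the standard library. The data are synthetic, and the two features (a grade average and an entrance score), the group sizes, and the candidate range for k are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column (mean 0, standard deviation 1)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct observations as initial centers
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [mean(c) for c in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette width; higher means better-separated clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(math.dist(p, points[j]) for j in idx)
            for lab, idx in clusters.items()
            if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Synthetic stand-in for admission records (hypothetical features:
# senior high school grade average, entrance score). Not the study's data.
rng = random.Random(42)
group_a = [[rng.gauss(90, 2), rng.gauss(85, 2)] for _ in range(30)]
group_b = [[rng.gauss(78, 2), rng.gauss(70, 2)] for _ in range(30)]
scaled = standardize(group_a + group_b)

# Silhouette method: try several k and keep the one with the highest mean width.
widths = {k: silhouette(scaled, kmeans(scaled, k)) for k in range(2, 6)}
best_k = max(widths, key=widths.get)
labels = kmeans(scaled, best_k)

print("mean silhouette width per k:", widths)
print("selected k:", best_k)
```

The same selection loop could wrap a hierarchical clustering routine in place of `kmeans`, since the silhouette width depends only on the points and their labels.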
