Forming Dataset of The Undergraduate Thesis using Simple Clustering Methods

Chinta 'Aliyyah Candramaya, Tio Dharmawan, Vandha Pradwiyasma Widharta

doi:10.25124/ijies.v7i01.187

Open Access


Abstract

Each university collects a large amount of undergraduate thesis data but has yet to process it in a way that makes it easier for students to find the references they want. This study aims to classify the documents and to compare the grouping produced by experts with the grouping produced by simple clustering methods. Experts established the ground truth using OR Boolean Retrieval and keyword generation. The best cluster was identified through experiments with the K-Means, K-Medoids, and DBSCAN clustering methods and the Euclidean, Manhattan, City Block, and Cosine Similarity metrics. The cluster with the best Silhouette Score was then compared against the ground truth to measure the accuracy of the categorization of each document. The K-Means clustering method with the Cosine Similarity metric gave the best result, with a Silhouette Score of 0.105534. The comparison between the ground truth and the best cluster result shows an accuracy of 33.42%. The result shows that the simple clustering methods cannot handle data with Negative Skewness and Leptokurtic Kurtosis.
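The abstract ranks method and metric combinations by Silhouette Score, so a small sketch of that score may help. For a point with mean intra-cluster distance a and smallest mean distance b to any other cluster, the silhouette is (b - a) / max(a, b), and the score is the mean over all points. The code below is only an illustration under stated assumptions: the function names, the toy vectors, and the label lists are hypothetical, and it does not reproduce the authors' pipeline or data.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Manhattan (L1, also called city block) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_score(points, labels, dist):
    """Mean silhouette over all points for a given labelling and distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy "document vectors" (e.g. two-term weights), not real thesis data.
toy_docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
well_separated = [0, 0, 1, 1]  # groups similar vectors together
mixed_up = [0, 1, 0, 1]        # pairs dissimilar vectors

for name, metric in [("euclidean", euclidean),
                     ("manhattan", manhattan),
                     ("cosine", cosine_distance)]:
    good = silhouette_score(toy_docs, well_separated, metric)
    bad = silhouette_score(toy_docs, mixed_up, metric)
    print(f"{name}: good={good:.3f} bad={bad:.3f}")
```

Scores lie between -1 and 1, and values near 0 indicate overlapping clusters, which is consistent with reading the reported 0.105534 as weak separation.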

Full Text