Abstract

The large number of theses written by students at a university makes it difficult to identify the topic categories they cover. One application of the Text Mining method is grouping thesis documents into the clusters produced by a clustering algorithm. This study compares two clustering algorithms, K-Means and K-Medoids, to obtain an accurate evaluation of their performance and computation time for thesis clustering, so that related topics can be grouped with better clustering accuracy. The evaluation parameter is the Davies-Bouldin Index (DBI), one of the techniques for evaluating clustering results, and the data are split into training and testing sets using 10-fold cross-validation. With Term Occurrences as the term-weighting scheme and an N-Grams value of 2, the K-Means algorithm achieves the better DBI value of -0.426, whereas K-Medoids under the same conditions reaches a DBI value of -1.631 (the DBI values are reported as negatives, so the value closer to zero indicates more compact, better-separated clusters). However, the t-SNE visualization with the same supporting parameters indicates another usable option: 6 clusters, with a DBI value of -1.110. In terms of computation time for clustering 50 thesis documents, K-Means takes 2.5 seconds on average, while K-Medoids takes 261.5 seconds. The experiments were run on an Asus ZenBook UX425EA.312 with an 11th Gen Intel® Core™ i5-1135G7 processor @ 2.40GHz, Intel® Iris® Xe Graphics, 8 GB of RAM, and a 512 GB SSD.
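To make the compared pipeline concrete, here is a minimal pure-Python sketch, not the study's implementation. It rests on several assumptions: "Term Occurrences" with N-Grams = 2 is read as raw counts of single words and word pairs, distances are Euclidean, initial centers are chosen at random, and K-Medoids is the simple alternating (Voronoi-iteration) variant rather than whatever variant the study's tool used. All helper names (`term_occurrences`, `kmeans`, `kmedoids`, `davies_bouldin`) and the toy titles are hypothetical. The code computes the standard Davies-Bouldin Index, DB = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where s_i is the mean distance of cluster i's members to its centroid. It returns a non-negative value where lower is better, so the negative values in the abstract appear to be this quantity with its sign flipped.

```python
import math
import random
import time
from collections import Counter


def term_occurrences(docs, n=2):
    """Raw counts of word n-grams of length 1..n (assumed reading of Term Occurrences + N-Grams = 2)."""
    rows = []
    for doc in docs:
        tokens = doc.lower().split()
        grams = []
        for size in range(1, n + 1):
            grams += ["_".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]
        rows.append(Counter(grams))
    vocab = sorted(set().union(*rows))
    return [[row[t] for t in vocab] for row in rows], vocab


def dist(a, b):
    """Euclidean distance (an assumption; the study's distance measure is not stated)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return [min(range(len(centers)), key=lambda c: dist(x, centers[c])) for x in X]


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's K-Means: centers are cluster means, so they need not be real documents."""
    centers = [list(x) for x in random.Random(seed).sample(X, k)]
    for _ in range(iters):
        labels = assign(X, centers)
        # Recompute each center as its cluster mean; keep the old center if the cluster is empty.
        new = [mean([x for x, l in zip(X, labels) if l == c]) if c in labels else centers[c]
               for c in range(k)]
        if new == centers:
            break
        centers = new
    return assign(X, centers), centers


def kmedoids(X, k, iters=100, seed=0):
    """Alternating K-Medoids: each center is the member with the smallest total distance."""
    medoids = random.Random(seed).sample(range(len(X)), k)
    for _ in range(iters):
        labels = assign(X, [X[m] for m in medoids])
        new = []
        for c in range(k):
            members = [i for i, l in enumerate(labels) if l == c]
            # Every member is tried as the medoid: quadratic in the cluster size.
            new.append(min(members, key=lambda i: sum(dist(X[i], X[j]) for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    centers = [X[m] for m in medoids]
    return assign(X, centers), centers


def davies_bouldin(X, labels):
    """DB = (1/k) * sum_i max_{j != i} (s_i + s_j) / d(c_i, c_j); lower is better."""
    clusters = [[x for x, l in zip(X, labels) if l == c] for c in sorted(set(labels))]
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two non-empty clusters")
    cents = [mean(m) for m in clusters]
    scatter = [sum(dist(x, c) for x in m) / len(m) for m, c in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k


if __name__ == "__main__":
    # Hypothetical toy titles, NOT the 50 thesis documents used in the study.
    docs = [
        "web based sales information system",
        "web based library information system",
        "k means clustering of thesis documents",
        "clustering thesis documents with k medoids",
        "sentiment analysis of twitter using naive bayes",
        "naive bayes sentiment analysis of product reviews",
    ]
    X, vocab = term_occurrences(docs, n=2)
    for name, algo in (("K-Means", kmeans), ("K-Medoids", kmedoids)):
        start = time.perf_counter()
        labels, _ = algo(X, k=3)
        elapsed = time.perf_counter() - start
        print(f"{name}: labels={labels} DBI={davies_bouldin(X, labels):.3f} ({elapsed:.4f} s)")
```

Running the file prints each algorithm's cluster labels, DBI, and wall-clock time for the toy titles. Because this K-Medoids update evaluates every cluster member as a candidate medoid, its cost grows quadratically with cluster size. That is one plausible contributor to the large time gap reported above. Timings on six toy titles, however, say nothing about the study's 50 documents.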
