Fuzzy C-Means in Content-Based Document Clustering for Grouping General Websites Based on Their Main Page Contents

Sri Probo Aditiyo, Rahma Fitriani, Eni Sumarminingsih

doi:10.21512/comtech.v14i2.9732

Open Access

Abstract

The research aimed to use Fuzzy C-Means clustering in content-based document clustering to group general websites based on their main page content. The data were a ranking table of the most visited websites in Indonesia, taken from https://dataforseo.com/top-1000-websites/ on September 24, 2022. Fuzzy C-Means clustering was run in two cases that differed in the maximum-iteration parameter, set to 100 and 200. The two settings produced different numbers of websites in the first and second clusters only; in all other clusters, the counts were identical. The clustering results were validated using the silhouette coefficient, which was 0.977783879 for case no. 1 and 0.977788457 for case no. 2. These values indicate that Fuzzy C-Means clustering performs excellently in content-based document clustering when it is applied to group general websites based on their content. Hence, the approach can be considered for other content-based clustering cases in the future.
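The procedure summarized in the abstract (Fuzzy C-Means run with two maximum-iteration settings, then validated with the silhouette coefficient) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how website content was turned into features, so the example uses synthetic two-dimensional points, and the fuzzifier m = 2, the convergence tolerance, the Euclidean distance, and the conversion to hard labels by largest membership are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, U); U[i, k] is the membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Cluster centers are membership-weighted means of the points.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center, floored to avoid 0 ** negative.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break  # stop early; max_iter is only an upper bound
    return centers, U

def silhouette_coefficient(X, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return float("nan")  # undefined for a single cluster
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score left at 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Synthetic stand-in data (an assumption): three tight, well-separated groups.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(loc, 0.1, size=(20, 2)) for loc in (0.0, 5.0, 10.0)])

# Two cases that differ in the maximum number of iterations, as in the abstract.
for max_iter in (100, 200):
    centers, U = fuzzy_c_means(X, n_clusters=3, max_iter=max_iter)
    labels = U.argmax(axis=1)  # hard assignment by largest membership
    print(max_iter, silhouette_coefficient(X, labels))
```

Because the loop stops as soon as the memberships change by less than the tolerance, the two settings give different results only if the run has not converged within the smaller iteration limit. A silhouette coefficient close to 1 means points lie much nearer to their own cluster than to the nearest other cluster.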
