Abstract

This research applies Fuzzy C-Means clustering to content-based document clustering in order to group general websites by their content. The data are a ranking table of the most visited websites in Indonesia, taken from https://dataforseo.com/top-1000-websites/ on September 24th, 2022. Two cases were run with Fuzzy C-Means clustering, differing only in the maximum iteration parameter, namely 100 and 200. The two settings produce different numbers of website names in the first and second clusters only; in all other clusters, the counts are the same. The clustering results are validated using the silhouette coefficient, with values of 0.977783879 and 0.977788457 for case no. 1 and case no. 2, respectively. These values indicate that Fuzzy C-Means clustering performs excellently when applied to grouping general websites based on their content. Hence, the approach can be considered for other content-based clustering cases in the future.
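The procedure summarized in the abstract (Fuzzy C-Means run with two maximum iteration settings, then validated with the silhouette coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how website content was turned into feature vectors, so the six two-dimensional points below are made-up stand-ins for document features, and the fuzzifier `m = 2`, the tolerance, the seed, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length float tuples."""
    rng = random.Random(seed)
    # Random initial membership matrix; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    dim = len(points[0])
    centers = []
    for _ in range(max_iter):
        # Update centers as membership-weighted means (weights are u ** m).
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Update memberships from inverse relative distances to the centers.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            s = sum(inv)
            new_u.append([v / s for v in inv])
        # Stop early once no membership changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points (Euclidean); needs at least 2 clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for k, q in enumerate(points)
               if labels[k] == labels[i] and k != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for k, q in enumerate(points) if labels[k] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical feature vectors standing in for website content (assumption).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

results = {}
for max_iter in (100, 200):  # the two maximum iteration values from the abstract
    centers, u = fuzzy_c_means(points, n_clusters=2, max_iter=max_iter)
    labels = [row.index(max(row)) for row in u]  # harden fuzzy memberships
    results[max_iter] = (labels, silhouette_coefficient(points, labels))
    print(max_iter, labels, round(results[max_iter][1], 4))
```

On this small, well-separated toy set both settings should stop at the tolerance well before the iteration limit, so the two runs coincide. The small differences reported in the abstract would only appear when the algorithm has not yet converged after 100 iterations, which is more plausible on larger, higher-dimensional document data.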
