Clustering Fake News with K-Means and Agglomerative Clustering Based on Word2Vec

Izhar Muhammad Tianda, Dita Amelia, M Fariz Fadillah Mardianto, Said Agil Al Munawwarah, Nurhalisa Ishak, Mohammad Noufal Ubadah, Elly Ana

doi:10.47191/ijmcr/v12i2.01

Abstract

Fake News on digital platforms is a major problem of the digital age, and there is broad interest in methods to detect it. This research examines a way to group Fake News articles using K-Means and Agglomerative Clustering, based on the semantic representations provided by Word2Vec embeddings. The researchers use natural language translation methods and advanced machine learning to improve the accuracy and efficiency of Fake News detection. The study extracts meaningful features from textual data, converts them into vector representations using Word2Vec, and then applies clustering algorithms to group similar articles. The methodology aims to advance the state of the art in Fake News detection, helping to create more reliable and robust tools to fight misinformation. In the comparative analysis of clustering metrics, K-Means achieves a Purity Score of 88.09% and an Adjusted Rand Score of 58.03%, while Agglomerative Clustering with the Ward method yields a Purity Score of 85.13% and an Adjusted Rand Score of 49.36%. The Purity Score of 88.09% for K-Means suggests a strong ability to form clusters in which the majority of data points share the same true class. Agglomerative Clustering with Ward, though slightly lower at 85.13%, also demonstrates effective class separation within clusters. On the Adjusted Rand Score, which corrects for chance and measures the agreement between true and predicted labels, K-Means outperforms Agglomerative Clustering with Ward by a wider margin (58.03% versus 49.36%).
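The two metrics reported in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `purity_score` and `adjusted_rand_index` and the toy label lists are hypothetical, chosen only to show how each score is computed from true classes and cluster assignments. It uses the standard definitions (majority-class purity and the pair-counting form of the Adjusted Rand Index) and the Python standard library only.

```python
from collections import Counter
from math import comb


def purity_score(true_labels, cluster_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    members = {}  # cluster id -> Counter of true classes inside that cluster
    for truth, cluster in zip(true_labels, cluster_labels):
        members.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(true_labels)


def adjusted_rand_index(true_labels, cluster_labels):
    """Pair-counting agreement between two labelings, corrected for chance."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare; treated here as perfect agreement
    # Pairs of points that share both a true class and a cluster.
    same_both = sum(comb(c, 2) for c in Counter(zip(true_labels, cluster_labels)).values())
    # Pairs that share a true class, and pairs that share a cluster.
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_cluster = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    expected = same_true * same_cluster / comb(n, 2)  # agreement expected by chance
    maximum = (same_true + same_cluster) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single class and a single cluster
    return (same_both - expected) / (maximum - expected)


# Hypothetical toy data: 0 = real, 1 = fake; one real article lands in the wrong cluster.
toy_true = [0, 0, 0, 1, 1, 1]
toy_pred = [0, 0, 1, 1, 1, 1]
print(round(purity_score(toy_true, toy_pred), 3))         # 0.833
print(round(adjusted_rand_index(toy_true, toy_pred), 3))  # 0.324
```

Even on this toy example the two scores diverge: purity rewards each cluster for its majority class, while the Adjusted Rand Index penalizes every misplaced pair and subtracts the agreement expected by chance. This is consistent with the gap between the two methods being larger on the Adjusted Rand Score than on purity.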

Full Text