In 2024, Indonesia held simultaneous general elections dominated by the participation of young people, especially Generation Z and Millennials, who seek political information primarily through the Internet. This highlights the crucial role of digital media in shaping public opinion. Detik.com actively reported on the 2024 elections, as evidenced by its dedicated election sub-channel. However, the sub-channel lacks topic categorization, which makes it difficult for readers to find in-depth information, and tracking and analyzing the large volume of news articles published daily is a significant challenge. This study employs topic modelling, specifically the BERTopic method, to analyze topics related to the 2024 elections in Detik.com news articles. The dataset, sourced from the Detik.com election sub-channel, was collected via scraping from September 1, 2023, to February 14, 2024, totalling 15,019 articles. Text preprocessing involves case folding, cleaning, tokenizing, and stopword removal. Topic modelling with BERTopic comprises embeddings from the sentence-transformers model "distiluse-base-multilingual-cased-v1", dimensionality reduction with UMAP, K-Means clustering with k = 5 (selected using the Elbow method), tokenization with CountVectorizer, and c-TF-IDF term weighting. Based on the Silhouette Score of 0.566 and the Silhouette plot, K-Means with k = 5 produces good clustering with clear inter-cluster distances. Among the other evaluations, the SSE value of 70,223.257 provides an overview of the cluster distribution; the Davies-Bouldin Index of 0.758 indicates relatively good inter-cluster separation and good closeness within clusters; the Calinski-Harabasz Index of 20,083.489 indicates compact, well-separated clusters; and the low Dunn Index of 0.003 points to outliers that cause some clusters to overlap and lack clear separation.
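To make the Silhouette Score reported above more concrete, the following is a minimal pure-Python sketch of how the mean silhouette coefficient is computed. It is not the study's code: the six 2-D points and both labelings are made-up toy data, and the helper names `euclidean` and `silhouette_score` are illustrative. In practice a library routine such as scikit-learn's `silhouette_score` would normally be applied to the UMAP-reduced embeddings.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (O(n^2) sketch)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): two tight groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])   # labels match the groups
mixed = silhouette_score(points, [0, 1, 0, 1, 0, 1])  # labels cut across the groups
print(f"well-separated labeling: {good:.3f}")
print(f"mixed labeling:          {mixed:.3f}")
```

Values close to 1 mean that points sit much nearer to their own cluster than to the nearest other cluster, while values near 0 or below indicate overlap. This is why a score of 0.566 is read as reasonably clear separation.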
Overall, the evaluation results support K-Means with k = 5 as producing good clustering. The modelling results show an average topic coherence value of 0.0902 and yield five main topics in the 2024 election news on Detik.com: topic 0, presidential and vice presidential candidates (5,215 articles), represented by the words 'ganjar', 'prabowo', 'anies' and 'imin'; topic 1, general elections and related surveys (3,191 articles), represented by the words '2024', 'pemilu', 'pilpres' and 'suara'; topic 2, news about President Joko Widodo (2,604 articles); topic 3, news about the presidential and vice presidential debates (2,046 articles), represented by the words 'presiden', 'jokowi', 'demokrat' and 'politik'; and topic 4, news about Gibran Rakabuming Raka and related issues (1,963 articles), represented by the words 'raka', 'rakabuming', 'nomor' and 'urut'. Using the results of this research, readers can gain insight into the most discussed issues and the attention given to key figures in the 2024 election news on the Detik.com news portal.
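The representative words listed for each topic come from c-TF-IDF weighting. As a rough illustration, the sketch below applies the weighting as it is commonly described for BERTopic, W(t, c) = tf(t, c) × log(1 + A / f(t)), where tf(t, c) is the normalized frequency of term t in topic c, A is the average number of words per topic, and f(t) is the frequency of t across all topics. The token lists are made-up toy documents that only borrow a few of the words reported above, and the function names `c_tf_idf` and `top_terms` are hypothetical, not BERTopic's API.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Return {topic: {term: weight}} using class-based TF-IDF.

    docs_per_topic maps a topic id to a list of tokenized documents.
    """
    # Merge all documents of a topic into a single bag of words.
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_per_topic.items()
    }
    # A: average number of words per topic.
    avg_words = sum(sum(c.values()) for c in class_counts.values()) / len(class_counts)
    # f(t): frequency of each term across all topics.
    term_freq = Counter()
    for counts in class_counts.values():
        term_freq.update(counts)

    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (cnt / n_words) * math.log(1 + avg_words / term_freq[term])
            for term, cnt in counts.items()
        }
    return weights


def top_terms(weights, k=2):
    """Pick the k highest-weighted terms per topic (ties broken alphabetically)."""
    return {
        topic: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for topic, w in weights.items()
    }


# Toy tokenized documents (assumption), already preprocessed and clustered.
docs_per_topic = {
    0: [["ganjar", "prabowo", "anies", "pemilu"], ["prabowo", "anies", "imin", "pemilu"]],
    4: [["rakabuming", "raka", "nomor", "pemilu"], ["raka", "nomor", "urut", "pemilu"]],
}
weights = c_tf_idf(docs_per_topic)
top = top_terms(weights, k=2)
print(top)
```

In this toy example 'pemilu' occurs equally often in both topics, so its weight is pushed below that of terms concentrated in a single topic. The same effect keeps generic election vocabulary from dominating every topic's representation.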