Internet usage in Indonesia has increased significantly, reaching 215.63 million users in 2022-2023, or 78.19% of the population. With internet access this widespread, digital news portals such as Narasi TV have become a primary source of information for many people. However, the sheer volume of news articles makes manual categorization challenging. This study aims to classify Indonesian-language news documents from Narasi TV using the Nazief-Adriani algorithm for stemming and the K-Nearest Neighbor (KNN) method for classification. The text mining process begins with preprocessing, which consists of case folding, tokenizing, stop-word filtering, and stemming. On a dataset of 500 news documents, a 90:10 data split gave an average accuracy of 93% and a highest accuracy of 100%; an 80:20 split gave an average of 89% and a highest of 93%; and a 70:30 split gave an average of 87% and a highest of 89%. In conclusion, combining the Nazief-Adriani algorithm with the KNN method, with a suitably chosen k and random state, achieved an average accuracy of up to 93% in classifying Indonesian-language news documents. These results demonstrate the significant potential of text mining and classification techniques for managing digital news.
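The pipeline described in the abstract (case folding, tokenizing, stop-word filtering, stemming, then KNN) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the study's implementation: the stop-word list is a tiny hypothetical sample, `naive_stem` is a placeholder suffix stripper and not the Nazief-Adriani algorithm, the four training sentences and their labels are invented, and cosine similarity over raw term counts is only one possible choice of KNN distance.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real Indonesian list is far larger.
STOP_WORDS = {"yang", "di", "dan", "ke", "dari", "untuk", "pada", "dengan", "ini", "itu"}


def naive_stem(token):
    """Placeholder stemmer that strips a few common suffixes.

    This is NOT Nazief-Adriani, which also removes prefixes and
    validates candidates against a root-word dictionary.
    """
    for suffix in ("kan", "nya", "an", "i"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    text = text.lower()                                   # case folding
    tokens = re.findall(r"[a-z]+", text)                  # tokenizing
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word filtering
    return [naive_stem(t) for t in tokens]                # stemming (placeholder)


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit(docs):
    """Turn (text, label) pairs into (term-count vector, label) pairs."""
    return [(Counter(preprocess(text)), label) for text, label in docs]


def knn_predict(model, text, k=3):
    """Majority vote among the k training documents most similar to text."""
    query = Counter(preprocess(text))
    ranked = sorted(model, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data; the labels are illustrative, not Narasi TV categories.
TOY_DOCS = [
    ("Timnas sepak bola menang di pertandingan final", "olahraga"),
    ("Pemain bola mencetak gol pada pertandingan liga", "olahraga"),
    ("Pemerintah menaikkan anggaran pendidikan nasional", "politik"),
    ("Presiden dan menteri membahas anggaran negara", "politik"),
]

model = fit(TOY_DOCS)
print(knn_predict(model, "Pertandingan bola berakhir dengan gol"))
print(knn_predict(model, "Menteri membahas anggaran pendidikan"))
```

In a fuller experiment along the lines the abstract describes, the labeled documents would be divided into training and test sets (for example 90:10), and accuracy would be averaged over several values of k and several random seeds for the split.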