The widespread use of social media platforms and the internet has increased the sharing of information, including both true and false news. Detecting fake news is challenging, and several studies have sought to automate the process for widely used languages such as English and Arabic. Far less research exists on detecting fake news in low-resource languages such as Kurdish. This study addresses that gap using a publicly available Kurdish fake news dataset (KDFND) comprising 100,962 news articles, of which 50,751 are labeled Real and 50,211 are labeled Fake. Three feature extraction techniques were applied to the news texts (word embedding, term frequency-inverse document frequency, and count vector), and three machine learning and deep learning classifiers (Random Forest, Support Vector Machine, and Convolutional Neural Network) were used to classify the articles. The results show that fake news can be identified from textual content, especially when a convolutional neural network (CNN) is used. In the experiments, the CNN outperformed the other models, with an F1-score of 95% and an accuracy above 91%. These findings indicate that machine learning methods can effectively detect fake news in low-resource languages like Kurdish, even in complex environments.
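To make the count vector and term frequency-inverse document frequency features concrete, the following is a minimal sketch in plain Python. It assumes whitespace tokenization and the textbook weighting tf × log(N/df); the study's actual tokenizer, weighting variant, and library are not stated here, and the function names `count_vectors` and `tfidf_vectors` and the toy documents are hypothetical, not drawn from KDFND.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Return (vocab, vectors), where each vector holds raw token counts.

    Assumption: tokens are separated by whitespace.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Return (vocab, vectors) weighted as tf * log(N / df).

    Assumption: textbook TF-IDF without smoothing or normalization.
    """
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, weighted


# Hypothetical toy documents for illustration only.
toy_docs = ["news a b", "news c"]
vocab, counts = count_vectors(toy_docs)
_, tfidf = tfidf_vectors(toy_docs)
```

In this sketch a term that occurs in every document, such as `news` in the toy input, receives a TF-IDF weight of zero, whereas the count vector keeps its raw frequency. Either vector could then be passed to a classifier such as a Random Forest or a Support Vector Machine.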