Classification of News Texts from Different Languages with Machine Learning Algorithms

Sidar Ağduk, Emrah Aydemir, Ayfer Polat

doi:10.55195/jscai.1311380


Open Access


Abstract

As a result of technological developments, the internet is now regarded as one of the most important sources of information. Although the internet makes it possible to access large amounts of data in a short time, analyzing this data correctly is critical. As more and more unstructured text accumulates in the digital environment, the need for text mining, which processes and analyzes such data and classifies it in a meaningful way, is growing steadily. In this study, news texts collected from online German, Spanish, English and Turkish news sites were sorted into four predetermined categories: world, sports, economy and politics. The data set of 4000 news texts was classified with 41 different machine learning algorithms in the Weka program. The highest classification success was obtained with the Naive Bayes Multinomial and Naive Bayes Multinomial Updateable algorithms: 93.5% for German news texts, 93.3% for English news texts, 82.8% for Spanish news texts and 88.8% for Turkish news texts.
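The abstract reports that the multinomial Naive Bayes variants performed best. As a rough illustration of how this classifier family assigns a category to a news text, here is a minimal from-scratch sketch in Python. It is not Weka's NaiveBayesMultinomial or NaiveBayesMultinomialUpdateable implementation. It assumes lowercased whitespace tokenization and Laplace smoothing with alpha = 1, and the class name `MultinomialNB`, its `update` method and the toy training sentences are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Sketch of multinomial Naive Bayes over bag-of-words counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive (Laplace) smoothing constant
        self.doc_counts = Counter()              # training documents seen per class
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        self.total_words = Counter()             # total tokens seen per class
        self.vocab = set()                       # all tokens seen in training

    def update(self, text, label):
        """Add one labelled document; counts grow incrementally, no retraining needed."""
        tokens = text.lower().split()            # assumed tokenizer: lowercase + whitespace split
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.total_words[label] += len(tokens)
        self.vocab.update(tokens)

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.update(text, label)
        return self

    def predict(self, text):
        """Return the class maximizing log P(class) + sum of log P(token | class)."""
        tokens = [t for t in text.lower().split() if t in self.vocab]  # ignore unseen tokens
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / n_docs)           # class prior
            denom = self.total_words[label] + self.alpha * v
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, for illustration only (not from the study's data set).
train_texts = [
    "the team won the match",
    "goal scored in the final match",
    "stock market prices fell",
    "the central bank raised interest rates",
]
train_labels = ["sports", "sports", "economy", "economy"]

clf = MultinomialNB().fit(train_texts, train_labels)
print(clf.predict("the match ended with a late goal"))  # sports
print(clf.predict("bank interest rates rose"))          # economy

# Incremental learning: one more labelled document is folded into the counts.
clf.update("inflation pushed interest rates higher", "economy")
```

Because the model consists only of per-class counts, `update` can absorb new documents one at a time, which is the general idea behind an "updateable" learner; a real pipeline would also need a proper tokenizer for German, Spanish, English and Turkish text.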
