Turkish Text Classification Research Articles

Today, enormous amounts of data are produced, a phenomenon commonly referred to as Big Data. A significant share of this data is textual, so text processing has grown in importance accordingly, especially with the development of word embeddings and other groundbreaking advances in the field. However, an examination of studies on text processing and word embeddings shows that, while many studies target world languages, English in particular, too few are specific to the Turkish language. Turkish was therefore chosen as the target language of the current study, and two Turkish datasets were created. Word vectors were trained using the Word2Vec method on a large unlabeled corpus of approximately 11 billion words. Using these word vectors, text classification was performed with deep neural networks on a second dataset of 1.5 million examples and 10 classes. The deep neural network architectures employed were the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), two further RNN-type architectures, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), and their variations. The performance of the word embedding methods used in this study, their effect on accuracy, and the success of the deep neural network architectures were then analyzed in detail. The experimental results showed that the GRU and LSTM methods were more successful than the other deep neural network models used in this study. The results also showed that using pre-trained word vectors (PWVs) improved the accuracy of the deep neural networks by approximately 5% and 7%. The datasets and word vectors of the current study will be shared in order to contribute to the Turkish language literature in this field.
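
As a minimal illustration of how pre-trained word vectors can seed the embedding layer of a deep network, the Python sketch below copies known vectors into an embedding matrix and falls back to small random values for unseen words. The function name `build_embedding_matrix`, the toy vocabulary, and the 3-dimensional vectors are assumptions made for illustration; they are not taken from the study, which trained Word2Vec vectors on a far larger corpus.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one embedding row per vocabulary word.

    Words found in `pretrained` copy their vector; all other words get
    small random values, as a randomly initialised layer would.
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        vector = pretrained.get(word)
        if vector is None:
            vector = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix.append(list(vector))
    return matrix


# Hypothetical toy vectors, not values from the study.
pretrained = {
    "ekonomi": [0.9, 0.1, 0.0],
    "spor": [0.0, 0.8, 0.2],
}
vocab = ["<pad>", "ekonomi", "spor", "galatasaray"]
embedding_matrix = build_embedding_matrix(vocab, pretrained, dim=3)
```

In a real pipeline the resulting matrix would typically initialise the first layer of a CNN, LSTM, or GRU classifier, which is where the comparison between pre-trained and randomly initialised vectors becomes measurable.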

The Effect of Ensemble Learning Models on Turkish Text Classification

Due to the rapid development of the Internet and related technologies, the amount of text-based content generated through Internet applications is increasing from day to day. Since text-based content is unstructured, accessing and managing this data is almost impossible. Consequently, there is a need for an automatic text classification process. Text mining, a discipline within the data mining field, offers algorithms for performing text classification. The main objective of text classification is to build a learning model from a training data set with pre-defined categories and then to place data with unknown categories into the correct categories. Different text classification algorithms exist in the literature, such as decision trees, Bayesian classifiers, rule-based classifiers, neural networks, the k-nearest neighbor classifier, support vector machines, and ensemble learning methods. In this study, the effect of ensemble learning models on Turkish text classification was evaluated. A publicly available data set named TTC-3600, which consists of 3,600 news articles collected from 6 news portals, was selected. Text classification was performed on the TTC-3600 data set using four base classification algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and J48 decision tree) and their Boosting, Bagging, and Rotation Forest ensemble learning models. The experimental results show that ensemble learning models generally give more accurate results by increasing the success of the base classifiers.
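
The idea behind one of the ensemble methods named above, Bagging, can be sketched in a few lines of Python: several copies of a base classifier are trained on bootstrap resamples of the training set, and their predictions are combined by majority vote. Everything in this sketch is an assumption made for illustration, including the word-overlap nearest-neighbour base classifier, the function names, and the four toy documents; it does not reproduce the study's setup or the TTC-3600 data.

```python
import random
from collections import Counter


def train_knn(samples):
    """'Training' a 1-nearest-neighbour model just stores tokenised samples."""
    return [(set(text.lower().split()), label) for text, label in samples]


def predict_knn(model, text):
    """Return the label of the stored document sharing the most words."""
    tokens = set(text.lower().split())
    best = max(model, key=lambda item: len(tokens & item[0]))
    return best[1]


def majority_vote(labels):
    """Return the most frequent label among the given predictions."""
    return Counter(labels).most_common(1)[0][0]


def bagging_fit(samples, n_models=5, seed=0):
    """Train one base model per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_knn(bootstrap))
    return models


def bagging_predict(models, text):
    """Combine the base models' predictions by majority vote."""
    return majority_vote(predict_knn(m, text) for m in models)


# Hypothetical toy documents, not drawn from TTC-3600.
train = [
    ("borsa faiz dolar", "ekonomi"),
    ("enflasyon faiz banka", "ekonomi"),
    ("maç gol takım", "spor"),
    ("lig gol hakem", "spor"),
]
models = bagging_fit(train, n_models=5)
prediction = bagging_predict(models, "faiz dolar")
```

With a data set this small, individual bootstrap samples can miss a class entirely, so the sketch only shows the mechanics; any accuracy gain from bagging depends on the base classifier and on having enough data.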

Articles published on Turkish Text Classification

Turkish Text Classification Based On Wrapper Feature Selection Using Particle Swarm Optimization

An Analysis of Intelligent Turkish Text Classification Models for Routing Calls in Call Centers: A Case Study on the Republic of Turkiye Ministry of Trade Call Center

Relational Turkish Text Classification Using Distant Supervised Entities and Relations

Unified benchmark for zero-shot Turkish text classification

Improving automated Turkish text classification with learning‐based algorithms

MooDetecTR: Kelime Vektörleri Vasıtasıyla Türkçe Şarkı Sözleri için Ruh Hali Tespiti [MooDetecTR: Mood Detection for Turkish Song Lyrics via Word Vectors]

Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification

Nitelik Çıkarımı Yöntemlerinin Türkçe Metinlerin Sınıflandırılmasına Etkisi [The Effect of Feature Extraction Methods on the Classification of Turkish Texts]

The Effect of Ensemble Learning Models on Turkish Text Classification
