Comparison of different model's performances in task of document classification

Kristijan Spirovski, Zaneta Popeska, Andrea Kulakov, Goran Velinov, Evgenija Stevanoska

doi:10.1145/3227609.3227668

Abstract

Although very few additional resources exist in Macedonian for solving information retrieval problems (or Natural Language Processing problems in general), some models are general enough that they need no additional knowledge about the language. This paper presents a document classification model that does not rely on any language-specific additional resources. The model is trained and tested on a set of news articles extracted from Macedonian websites, and each document is labeled with a class representing one of the twelve category sections from which the documents were extracted. The goal of this paper is to test different methods for feature selection and choice of vocabulary. Furthermore, we choose the model that gives the best accuracy on the document classification task and perform a sensitivity analysis of its architecture in order to further improve its performance. Although similar research already exists, this paper aims to combine different experiments and test them on Macedonian-language documents. The models used in this paper are Random Forest (RF), Support Vector Machines (SVM) and Neural Network (NN). The experiments showed that the best accuracy is achieved when each document is represented as a tf-idf vector, the vocabulary contains an equal number of representative words from each class, and a simple Neural Network with 3 hidden layers is used as the model. The main conclusion is that a language-independent model for document classification can be successfully built for the Macedonian language, achieving around 80% accuracy on the test set.
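The document representation described in the abstract (a tf-idf vector over a vocabulary that takes an equal number of words from each class) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the rule of ranking words by in-class document frequency, the plain log(N/df) idf variant, the function names, and the toy English documents. The classifier itself (RF, SVM or NN) is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase + whitespace split only: no language-specific resources.
    return text.lower().split()


def balanced_vocabulary(docs, labels, words_per_class):
    """Take the same number of words from every class.

    Assumed ranking rule (hypothetical): words are ordered by how many
    documents of the class contain them, ties broken alphabetically.
    """
    per_class = defaultdict(Counter)
    for text, label in zip(docs, labels):
        per_class[label].update(set(tokenize(text)))

    vocab, seen = [], set()
    for label in sorted(per_class):
        ranked = sorted(per_class[label].items(), key=lambda kv: (-kv[1], kv[0]))
        picked = 0
        for word, _ in ranked:
            if picked == words_per_class:
                break
            if word not in seen:  # skip words already claimed by another class
                seen.add(word)
                vocab.append(word)
                picked += 1
    return vocab


def tfidf_vectors(docs, vocab):
    """Represent each document as a tf-idf vector over `vocab`.

    tf = term count / document length, idf = log(N / df); this is one
    common variant, assumed here for simplicity.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab_set = set(vocab)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens) & vocab_set)
    idf = {w: math.log(len(docs) / df[w]) for w in vocab if df[w]}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([counts[w] / total * idf.get(w, 0.0) for w in vocab])
    return vectors


# Toy data (hypothetical): two classes, two documents each.
docs = [
    "economy market bank",
    "market bank growth",
    "match goal team",
    "team goal league",
]
labels = ["economy", "economy", "sport", "sport"]

vocab = balanced_vocabulary(docs, labels, words_per_class=2)
vectors = tfidf_vectors(docs, vocab)
```

The resulting vectors could then be passed to any of the three model families named in the abstract. Balancing the vocabulary per class keeps a class with many documents from dominating the feature set.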
