Application of Naïve Bayes, Decision Tree, and K-Nearest Neighbors for Automated Text Classification

Jafar Ababneh

doi:10.5539/mas.v13n11p31

Abstract

Many applications that use large volumes of data have been developed as a result of the Internet of Things. These applications are translated into different languages and require automated text classification (ATC). The ATC process assigns a text to one or more predefined classes on the basis of its content. However, this process is problematic for the Arabic translation of the data. This study aims to address this issue by investigating the performance of three classification algorithms, namely, k-nearest neighbor (KNN), decision tree (DT), and naïve Bayes (NB) classifiers, on Saudi Press Agency datasets. Results showed that the NB algorithm outperformed the DT and KNN algorithms in terms of precision, recall, and F1. In future work, a new algorithm that can improve the handling of the ATC problem will be developed.

Highlights

Given the increasing global utilization of the Internet of Things, relevant data are being translated into different languages (e.g., English, French, and Arabic).
The main goal of this study is to present and investigate the results achieved on Arabic text collections by using the naïve Bayes (NB), k-nearest neighbor (KNN), and decision tree algorithms.
Three well-known data mining algorithms, namely, decision tree, KNN, and NB, are used to classify 1562 Arabic articles collected from the Saudi Press Agency (SPA) (Al-Harbi, Almuhareb & Al-Thubaity, 2008). The SPA datasets are categorized into six classes: Culture news (اخبار ثقافية), Sport news (اخبار رياضية), Social news (اخبار إجتماعية), Economics news (اخبار إقتصادية), Political news (اخبار سياسية), and General news (اخبار عامة).
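As a rough illustration of how an NB classifier assigns an article to one of several predefined classes, the following is a minimal sketch of multinomial naïve Bayes with add-one (Laplace) smoothing. It is not the implementation used in the study: the function names `train_nb` and `predict_nb`, the smoothing choice, and the toy English tokens and labels are all assumptions made for illustration, and real Arabic text would first need tokenization and normalization.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class frequencies, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior for a token list."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents that carry this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (class, word) pairs above zero.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus; the study itself uses 1562 Arabic SPA articles.
train_docs = [
    ["match", "goal", "team"],
    ["team", "league", "goal"],
    ["market", "stocks", "bank"],
    ["bank", "prices", "market"],
]
train_labels = ["sport", "sport", "economics", "economics"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["goal", "league"]))
print(predict_nb(model, ["bank", "stocks", "unknown"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which matters for full-length news articles.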

Summary

Introduction

Given the increasing global utilization of the Internet of Things, relevant data are being translated into different languages (e.g., English, French, and Arabic). The results obtained through the Rocchio and KNN algorithms are similar, and both algorithms outperform the C4.5 algorithm in terms of recall and precision. Sallam, Mousa, and Hussein (2016) proposed an automated Arabic text classification approach that uses the frequency ratio accumulation method (FRAM) and evaluated it on three different Arabic datasets. Thabtah et al. (2011) tested and evaluated three associative classification prediction methods, namely, full match rule, dominant class label, and average confidence per class, by using the Reuters and Saudi Press Agency (SPA) datasets. They compared the three methods with the SVM, KNN, MCAR, NB, and C4.5 algorithms. The comparison results indicated that the SVM classifier outperformed the NB classifier in terms of recall, precision, and F1.
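The recall, precision, and F1 measures used throughout these comparisons can be computed per class from true-positive, false-positive, and false-negative counts. The sketch below shows one way to do so in plain Python. The helper names `per_class_scores` and `macro_average`, the toy labels, and the choice of macro-averaging are assumptions for illustration; the source does not state which averaging scheme was used.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label observed."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Guard against division by zero when a class is never predicted
        # or never occurs in the reference labels.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores

def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))

# Hypothetical reference labels and predictions for six articles.
y_true = ["sport", "sport", "politics", "politics", "culture", "culture"]
y_pred = ["sport", "politics", "politics", "politics", "culture", "sport"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("macro:", macro_average(scores))
```

Macro-averaging weights every class equally, so a classifier that performs poorly on a small class is penalized as much as one that performs poorly on a large class.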

Proposed Algorithms

Decision Tree Algorithm

Experiments Results

Conclusions


Journal: Modern Applied Science
Publication Date: Oct 6, 2019
Citations: 14
License type: CC BY 4.0


