Abstract

Many applications that process large volumes of data have been developed as a result of the Internet of Things. These applications are translated into different languages and require automated text classification (ATC). ATC assigns a text to one or more predefined classes on the basis of its content. However, this process is problematic for the Arabic translation of the data. This study addresses the issue by investigating the performance of three classification algorithms, namely, k-nearest neighbor (KNN), decision tree (DT), and naïve Bayes (NB) classifiers, on Saudi Press Agency datasets. Results showed that the NB algorithm outperformed the DT and KNN algorithms in terms of precision, recall, and F1. In future work, a new algorithm that can improve the handling of the ATC problem will be developed.

Highlights

  • Given the increasing global utilization of the Internet of Things, relevant data are being translated into different languages (e.g., English, French, and Arabic)

  • The main goal of this study is to present and investigate results achieved against Arabic text collections using naïve Bayes (NB), k-nearest neighbor (KNN), and decision tree algorithms

  • Three well-known data mining algorithms, namely decision tree, KNN, and NB, are used to classify 1562 Arabic articles collected from the Saudi Press Agency (SPA) (Al-Harbi, Almuhareb & Al-Thubaity, 2008). The SPA dataset is categorized into six classes: Culture news ("اخبار ثقافية"), Sport news ("اخبار رياضية"), Social news ("اخبار إجتماعية"), Economics news ("اخبار إقتصادية"), Political news ("اخبار سياسية"), and General news ("اخبار عامة")

Summary

Introduction

Given the increasing global utilization of the Internet of Things, relevant data are being translated into different languages (e.g., English, French, and Arabic). The results obtained through the Rocchio and KNN algorithms are similar, and both algorithms outperform the C4.5 algorithm in terms of recall and precision. Sallam, Mousa, and Hussein (2016) proposed an automated Arabic text classification approach that uses the frequency ratio accumulation method (FRAM) and evaluated it on three different Arabic datasets. Three associative classification prediction methods, namely, full match rule, dominant class label, and average confidence per class, were tested and evaluated by Thabtah et al. (2011) using the Reuters and Saudi Press Agency (SPA) datasets. They compared the three methods with the SVM, KNN, MCAR, NB, and C4.5 algorithms. The comparison results indicated that the SVM classifier outperformed the NB classifier in terms of recall, precision, and F1.
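The comparisons above are reported in terms of precision, recall, and F1. As a reminder of how these measures are obtained for a multi-class task such as news categorization, the following is a minimal sketch in plain Python. The function names, the toy labels, and the use of macro-averaging (an unweighted mean over classes) are assumptions made for illustration; the text does not state which averaging scheme the cited studies used.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels (hypothetical, not taken from the SPA dataset).
y_true = ["sport", "sport", "culture", "culture", "economics", "economics"]
y_pred = ["sport", "culture", "culture", "culture", "economics", "sport"]

scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(scores)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4))
```

In this toy example the "culture" class has perfect recall but a precision of 2/3, because one "sport" article was assigned to it. This is the kind of per-class trade-off that F1 summarizes in a single number.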

Proposed Algorithms
Decision Tree Algorithm
Experiments Results
Conclusions
