Abstract
Many applications that handle large volumes of data have been developed as a result of the Internet of Things. These applications are translated into different languages and require automated text classification (ATC). The ATC process assigns a text to one or more predefined classes on the basis of its content. However, this process is problematic for Arabic translations of the data. This study addresses this issue by investigating the performance of three classification algorithms, namely k-nearest neighbor (KNN), decision tree (DT), and naïve Bayes (NB) classifiers, on Saudi Press Agency datasets. The results show that the NB algorithm outperformed the DT and KNN algorithms in terms of precision, recall, and F1. In future work, a new algorithm that can improve the handling of the ATC problem will be developed.
Highlights
Given the increasing global utilization of the Internet of Things, relevant data are being translated into different languages (e.g., English, French, and Arabic)
The main goal of this study is to present and investigate results achieved against Arabic text collections using naïve Bayes (NB), k-nearest neighbor (KNN), and decision tree algorithms
Three well-known data mining algorithms, namely decision tree, KNN, and NB, are used to classify 1562 Arabic articles collected from the Saudi Press Agency (SPA) (Al-Harbi, Almuhareb & Al-Thubaity, 2008). The SPA datasets are categorized into six classes: Culture news (اخبار ثقافية), Sport news (اخبار رياضية), Social news (اخبار إجتماعية), Economics news (اخبار إقتصادية), Political news (اخبار سياسية), and General news (اخبار عامة).
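As a rough illustration of how an NB classifier assigns an article to one of several predefined classes, the following is a minimal sketch of a multinomial naïve Bayes model with Laplace smoothing. It is not the implementation used in the study: the function names (`train_nb`, `predict_nb`), the smoothing parameter `alpha`, and the toy English token lists are all assumptions made for illustration, and real SPA articles would first need Arabic tokenization and preprocessing.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with Laplace (add-alpha) smoothing.

    docs   -- list of token lists (hypothetical, already tokenized)
    labels -- list of class names, one per document
    """
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the highest posterior log-probability.

    Tokens that never appeared in training are skipped.
    """
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"][w] for w in tokens if w in params["log_like"]
        )
    return max(model, key=score)


# Toy, made-up training data standing in for two of the six news classes.
train_docs = [
    ["match", "goal", "team"],
    ["team", "league", "goal"],
    ["market", "stocks", "bank"],
    ["bank", "budget", "market"],
]
train_labels = ["sport", "sport", "economics", "economics"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["goal", "league"]))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied together, and the add-alpha term keeps a class from being ruled out by a single token it did not contain in training.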
Summary
Given the increasing global utilization of the Internet of Things, relevant data are being translated into different languages (e.g., English, French, and Arabic). The results obtained through the Rocchio and KNN algorithms are similar. Both algorithms outperform the C4.5 algorithm in terms of recall and precision measures. Sallam, Mousa, and Hussein (2016) proposed an automated Arabic text classification approach that uses the frequency ratio accumulation method (FRAM) and evaluated it on three different Arabic datasets. Three associative classification prediction methods, namely full match rule, dominant class label, and average confidence per class, were tested and evaluated by Thabtah et al. (2011) using the Reuters and Saudi Press Agency (SPA) datasets. They compared the three methods with the SVM, KNN, MCAR, NB, and C4.5 algorithms. The comparison results indicated that the SVM classifier outperformed the NB classifier in terms of recall, precision, and F1.
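The recall, precision, and F1 measures used in these comparisons can be computed per class from true and predicted labels. The sketch below shows one common way to do so, followed by a macro average over classes. The function names and the toy label lists are hypothetical, and whether the cited studies report macro-averaged, micro-averaged, or per-class figures is not stated here, so the averaging choice is an assumption.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly assigned to c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly assigned to c
        fn = sum(t == c and p != c for t, p in pairs)  # missed members of c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy, made-up labels for illustration only.
y_true = ["sport", "sport", "economics", "economics", "culture"]
y_pred = ["sport", "economics", "economics", "economics", "culture"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("macro:", macro_average(scores))
```

Macro averaging weights every class equally, which matters when classes such as the six SPA news categories are not the same size. A micro average would instead weight every article equally.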