Breaking news: Unveiling a new dataset for Portuguese news classification and comparative analysis of approaches.

Klaifer Garcia, Lilian Berton, Pedro Shiguihara

doi:10.1371/journal.pone.0296929

Abstract

Every day, thousands of news articles are published on the web, and filtering tools can be used to extract knowledge on specific topics. The categorization of news into a predefined set of topics is widely studied in the literature; however, most works are restricted to documents in English. In this work, we make two contributions. First, we introduce a Portuguese news dataset collected from WikiNews, an open-source media outlet that provides news from different sources. Since datasets for Portuguese are scarce, and an existing one comes from a single news channel, we aim to introduce a dataset drawn from different news channels. The availability of comprehensive datasets plays a key role in advancing research. Second, we compare different architectures for Portuguese news classification, exploring different text representations (BoW, TF-IDF, Embedding) and classification techniques (SVM, CNN, DJINN, BERT) for documents in Portuguese, covering both classical methods and current technologies. We show the trade-off between accuracy and training time for this application. We aim to show the capabilities of available algorithms and the challenges faced in the area.
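The abstract names BoW and TF-IDF as two of the compared text representations but does not say which weighting variant was used. As a hedged illustration only, the sketch below assumes naive whitespace tokenization and the unsmoothed weighting tf × log(N/df) (libraries such as scikit-learn default to a smoothed idf instead). The helper names and sample sentences are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption: real Portuguese
    # preprocessing would also handle punctuation, accents and stopwords).
    return text.lower().split()


def bag_of_words(docs: list[str]) -> list[Counter]:
    # BoW: each document becomes a mapping from term to raw count.
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    # TF-IDF (unsmoothed variant, assumed): count * log(N / df).
    # A term present in every document gets idf = log(1) = 0.
    bows = bag_of_words(docs)
    n_docs = len(docs)
    df = Counter()
    for bow in bows:
        df.update(bow.keys())  # each term counted once per document
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bow.items()}
        for bow in bows
    ]


# Hypothetical toy headlines, not drawn from the WikiNews dataset.
docs = [
    "o governo anuncia novo orçamento",
    "o time vence o campeonato",
    "o governo e a oposição debatem o orçamento",
]
bow = bag_of_words(docs)
weights = tf_idf(docs)

print(bow[1]["o"])                       # 2
print(weights[1]["o"])                   # 0.0
print(round(weights[0]["novo"], 3))      # 1.099
```

Terms that occur in every document (here "o") receive zero TF-IDF weight, while rarer terms such as "novo" are up-weighted. That down-weighting of ubiquitous words is the main difference from raw BoW counts before either vector is handed to a classifier such as an SVM.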


Journal: PloS one
Publication Date: Jan 26, 2024
License type: CC BY 4.0

Similar Papers

X-News dataset for online news categorization
Samia Nawaz Yousafzai ... Robertas Damaševičius
International Journal of Intelligent Computing and Cybernetics | VOL. 17
13 Aug 2024

How to Improve Text Summarization and Classification by Mutual Cooperation on an Integrated Framework
Hyoungil Jeong ... Jungyun Seo
Expert Systems With Applications | VOL. 60
10 May 2016

Survey on supervised machine learning techniques for automatic text classification
Ammar Ismael Kadhim
Artificial Intelligence Review | VOL. 52
19 Jan 2019

Research on Text Classification based on Deep Learning
Bo He ... Huanli Zhang
Scientific Journal of Technology | VOL. 4
20 Jul 2022
