News Article Text Classification in Indonesian Language

Rini Wongso, Ferdinand Ariandy Luwinda, Brandon Christian Trisnajaya, Olivia Rusli, Rudy Rudy

doi:10.1016/j.procs.2017.10.039


Open Access


Journal: Procedia Computer Science
Publication Date: Jan 1, 2017
Citations: 48
License type: CC BY-NC-ND

Affiliation: Binus University

Abstract

This research aims to find an appropriate algorithm for automatically classifying news articles written in Indonesian. Our dataset was collected by web crawling www.cnnindonesia.com. Each document first undergoes text preprocessing, namely lemmatization and stopword removal, to reduce noise in the document. Next, we apply feature selection to separate important words from less important ones. After feature selection, the document is passed to the classifier. We compare the TF-IDF and SVD algorithms for feature selection, and the Multinomial Naïve Bayes, Multivariate Bernoulli Naïve Bayes, and Support Vector Machine algorithms as classifiers. Based on the test results, the combination of TF-IDF and the Multinomial Naïve Bayes classifier gives the best result among the combinations tested, with a precision of 0.9841519 and a recall of 0.9840000. This result outperforms a previous similar study on classifying Indonesian news articles, which obtained 85% accuracy.
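The pipeline described in the abstract (preprocessing, feature weighting, classification, evaluation) can be sketched in plain Python for the best-performing combination, TF-IDF with Multinomial Naïve Bayes. This is a minimal sketch under stated assumptions, not the authors' implementation. The stopword list and the toy sports/economy articles are hypothetical. Lemmatization, SVD, and the other two classifiers are omitted. The IDF uses one common smoothed formula. Precision and recall are macro-averaged, since the abstract does not say how they were averaged.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list for illustration only; a real system would use a
# complete Indonesian stopword list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "pada"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (lemmatization omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; terms not seen in training are dropped."""
    counts = Counter(doc)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

class MultinomialNB:
    """Multinomial Naive Bayes over TF-IDF weights with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Total TF-IDF weight of each term within each class.
        weight = {c: defaultdict(float) for c in self.classes}
        for v, y in zip(vectors, labels):
            for t, w in v.items():
                weight[y][t] += w
        self.log_lik = {}
        for c in self.classes:
            total = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((weight[c][t] + self.alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, v):
        # Log prior plus weighted log likelihood of each known term.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in v.items() if t in self.vocab
            )
        return max(self.classes, key=score)

def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

# Made-up toy articles (not from the paper's dataset); labels are hypothetical.
train = [
    ("Timnas menang dalam pertandingan sepak bola di stadion", "olahraga"),
    ("Pemain sepak bola mencetak gol pada pertandingan final", "olahraga"),
    ("Harga saham naik dan rupiah menguat di pasar", "ekonomi"),
    ("Bank menaikkan suku bunga untuk menahan inflasi", "ekonomi"),
]
test = [
    ("Gol di menit akhir memenangkan pertandingan", "olahraga"),
    ("Inflasi membuat harga saham turun", "ekonomi"),
]

train_docs = [preprocess(text) for text, _ in train]
idf = fit_idf(train_docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in train_docs], [y for _, y in train])

y_true = [y for _, y in test]
predictions = [model.predict(tfidf(preprocess(text), idf)) for text, _ in test]
precision, recall = macro_precision_recall(y_true, predictions)
print(predictions, precision, recall)
```

Both toy test sentences receive their expected labels, so the script prints `['olahraga', 'ekonomi'] 1.0 1.0`. Feeding fractional TF-IDF weights into a multinomial model instead of raw counts is a common practical variant. A Bernoulli model or an SVM could be swapped in with the same fit/predict structure.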


Similar Papers

A document recommendation system of stemming and stopword removal impact: A web-based application
W G S Parwita
Journal of Physics: Conference Series | VOL. 1469
01 Feb 2020

Implementation of The Naïve Bayes Algorithm with Feature Selection using Genetic Algorithm for Sentiment Review Analysis of Fashion Online Companies
Siti Ernawati ... Eka Rini Yulia
01 Aug 2018

Study of hoax news detection using naïve bayes classifier in Indonesian language
Inggrid Yanuar Risca Pratiwi ... Rosa Andrie Asmara
01 Oct 2017

Identification of 10 Regional Indonesian Languages Using Machine Learning
Azhar Baihaqi Nugraha ... Ade Romadhony
sinkron | VOL. 8
01 Oct 2023
