Abstract

Online and offline newspaper articles have become an integral part of our society. News articles have a significant impact on our personal and social activities, but picking an appropriate article from the ocean of sources is a challenging task for users. Recommending the appropriate news category helps readers find the articles they want, but categorizing news articles manually is laborious, slow and expensive. The task becomes more difficult for a resource-insufficient language like Bengali, which is the fourth most spoken language of the world. Very few approaches have been proposed for categorizing Bangla news articles, and those applied only a few machine learning algorithms with limited resources. In this paper, we apply multiple machine learning approaches, including a neural network, to categorize Bangla news articles for two different datasets. Dataset I was built from news articles collected from the popular Bengali newspaper Prothom Alo, and Dataset II was gathered from the machine learning competition platform Kaggle. We develop a modified stop-word set and apply it in the preprocessing stage, which leads to a significant improvement in performance. Our results show that the multi-layer neural network, Naïve Bayes and support vector machine (SVM) provide better performance. Accuracies of 94.99%, 94.60% and 95.50% have been achieved for SVM, logistic regression and the multi-layer dense neural network, respectively.
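The stop-word filtering step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the stop-word list is a small hypothetical sample (the paper's modified stop-word set is not reproduced), and the tokenizer is a simple split on whitespace and punctuation rather than the authors' actual preprocessing.

```python
import re

# Hypothetical stop words for illustration only; the paper's modified
# stop-word set is not given in this text.
STOP_WORDS = {"এবং", "ও", "এই", "যে", "থেকে"}

# Split on whitespace and common punctuation, including the Bengali danda (।).
TOKEN_SPLIT = re.compile(r"[\s।,;:!?\"'()\-]+")


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Tokenize a Bangla string and drop tokens found in stop_words."""
    tokens = [t for t in TOKEN_SPLIT.split(text) if t]
    return [t for t in tokens if t not in stop_words]


sample = "বাংলাদেশ ও ভারত এই ম্যাচে খেলবে।"
filtered = remove_stop_words(sample)
print(filtered)
```

The filtered token list would then be passed on to feature extraction and a classifier; those later stages are not sketched here.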

Highlights

  • A newspaper is known as a powerhouse of information

  • Several approaches have been proposed for news categorization in different languages, e.g. Indonesian [4], Hindi [5], Arabic [6][11] and Spanish [7]; these approaches are mainly based on traditional machine learning algorithms such as Naïve Bayes, decision trees and K-Nearest Neighbors

  • The multi-layer dense neural network achieved an accuracy of 92.63% on Dataset I and 95.50% on Dataset II


Summary

INTRODUCTION

A newspaper is known as a powerhouse of information. People get the latest information about their desired content through online or offline newspapers. Online news websites provide subject categories and sub-categories [1], which vary significantly from newspaper to newspaper. Although frameworks are available to notify readers about news in their desired categories, manually categorizing thousands of online Bangla news articles is challenging. Appropriate categorization of Bangla news articles based on their content is essential for readers, and an automated system for this purpose is a pressing need. Several approaches have been proposed for news categorization in different languages, e.g. Indonesian [4], Hindi [5], Arabic [6][11] and Spanish [7]; these approaches are mainly based on traditional machine learning algorithms such as Naïve Bayes, decision trees and K-Nearest Neighbors.

RELATED WORK
METHODOLOGY
Data Collection
Data Pre-processing
Feature Selection and Extraction
Splitting Dataset into Training and Testing Set
Building and Fitting Models
Accuracy of Model
Comparison of Algorithms
Findings
CONCLUSION AND FUTURE WORK