Abstract

The internet offers a vast number of news sources, and users spend considerable time searching for informative news that matches their interests. Clustering news articles can therefore have a strong effect on how well user preferences are served. Unsupervised learning techniques, namely K-means clustering and spectral clustering, are proposed to categorize news articles by extracting discriminant features, helping users find informative news without wasting time. Experiments are performed on the BBC news articles dataset, which consists of 2225 news articles. The TF-IDF feature extraction technique is used with K-means clustering and spectral clustering to group the most similar articles into their respective domains: sports, tech, entertainment, politics, and business. The clustering algorithms are evaluated using the adjusted Rand index, V-measure, homogeneity score, completeness score, and Fowlkes-Mallows score. The experimental results show that K-means clustering performs better than spectral clustering with the TF-IDF feature extraction approach. To improve the results further, canopy centroid selection is combined with grid search optimization to tune K-means; the resulting method is named K-Means using Grid Search based on Canopy (KMGC-Search). The experimental results show that the proposed approach is a viable method for categorizing news articles.
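The TF-IDF plus K-means pipeline described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. The smoothed IDF formula, the farthest-first initialisation (a deterministic stand-in, not the canopy centroid selection the paper uses), and the helper names `tfidf_vectors` and `kmeans` are all assumptions made for illustration, and the four toy documents are invented rather than taken from the BBC dataset.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenised documents to L2-normalised sparse TF-IDF vectors (dicts)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF, log((1 + N) / (1 + df)) + 1: one common variant (assumption).
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Sparse dot product of two dict vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return dot(a, a) - 2.0 * dot(a, b) + dot(b, b)


def mean_vector(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, max_iter=50):
    """Lloyd's K-means on sparse vectors; returns one cluster label per vector."""
    # Farthest-first initialisation: an illustrative, deterministic choice,
    # NOT the canopy-based selection described in the paper.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean_vector(members)
    return labels


# Toy corpus (invented): two sports-like and two business-like documents.
docs = [
    "team goal match win".split(),
    "match team coach goal".split(),
    "market shares profit bank".split(),
    "bank profit market growth".split(),
]
labels = kmeans(tfidf_vectors(docs), k=2)
```

On this toy corpus the two sports-like documents end up in one cluster and the two business-like documents in the other. For a corpus the size of the BBC dataset, a library implementation would be the more practical choice.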

Highlights

  • News has a great impact [1] on people's thinking because it presents things about the world that are otherwise hidden from local people [2]

  • The highest completeness score (CS) of 84% is obtained with five clusters using TF-IDF features with the K-means clustering model, as presented in Figure 4; spectral clustering obtains a CS of 79%

  • Categorizing news articles with a clustering algorithm, the Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction technique, and hyperparameter tuning helps the user get desirable, valuable, and informative news
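The completeness score quoted above, together with the homogeneity score and V-measure named in the abstract, can be computed from label entropies. The sketch below uses the standard entropy-based definitions with natural logarithms. The function names are hypothetical, and the source does not say which implementation produced the reported figures.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels), natural logarithm."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """Conditional entropy H(targets | given)."""
    n = len(targets)
    joint = Counter(zip(targets, given))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (_, g), c in joint.items())


def homogeneity_completeness_v(true_labels, cluster_labels):
    """Return (homogeneity, completeness, V-measure) for one clustering."""
    h_classes = entropy(true_labels)
    h_clusters = entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = (1.0 if h_classes == 0 else
                   1.0 - conditional_entropy(true_labels, cluster_labels) / h_classes)
    # Completeness: all members of a class fall into the same cluster.
    completeness = (1.0 if h_clusters == 0 else
                    1.0 - conditional_entropy(cluster_labels, true_labels) / h_clusters)
    total = homogeneity + completeness
    # V-measure: harmonic mean of homogeneity and completeness.
    v_measure = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


# Invented toy labels: a perfect clustering versus a single catch-all cluster.
truth = ["sport", "sport", "business", "business"]
perfect = homogeneity_completeness_v(truth, [0, 0, 1, 1])
one_cluster = homogeneity_completeness_v(truth, [0, 0, 0, 0])
```

The catch-all clustering shows why completeness alone can mislead: putting every article in one cluster is trivially complete but not homogeneous, which is why the two scores are usually reported together with V-measure.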


Summary

Introduction

News has a great impact [1] on people's thinking because it presents things about the world that are otherwise hidden from local people [2]. In the past, a local user's circle was not broad enough [4] to receive important updates about the world. Today, a local user is surrounded by mobile electronic devices [5] and digital media such as the internet, which holds a vast amount of data across different domains [6], including events, politics, sports, business, and technology. News organizations feel the impact of internet news on users' perspectives, as it attracts users more than television broadcasts or newspapers.

