Clustering Generalised Instances Set Approaches for Text Classification

Hassan Najadat, Rasha Obeidat, Ismail Hmeidi

doi:10.1142/s0219649211002857

Abstract

This paper introduces three new text classification methods: Clustering-Based Generalised Instances Set (CB-GIS), Multilevel Clustering-Based Generalised Instances Set (MLC-GIS) and Multilevel Clustering-Based k Nearest Neighbours (MLC-kNN). These methods aim to combine the strengths and overcome the drawbacks of three similarity-based text classification methods, namely kNN, centroid-based and GIS. They use a clustering technique called spherical K-means to represent each class by a set of generalised instances that is later used in classification. CB-GIS applies flat clustering, while MLC-GIS and MLC-kNN apply multilevel clustering. Extensive experiments were conducted to evaluate the new methods and compare them with the kNN, centroid-based and GIS classifiers on the Reuters-21578(10) benchmark dataset, in terms of both classification performance and classification efficiency. The experimental results show that the top-performing method is the MLC-kNN classifier, followed by the MLC-GIS and CB-GIS classifiers. According to the best micro-averaged F1 scores, the new methods (CB-GIS, MLC-GIS, MLC-kNN) improve by 4.48%, 4.65% and 4.76% over kNN, by 1.84%, 1.92% and 2.12% over the centroid-based classifier, and by 5.26%, 5.34% and 5.45% over GIS, respectively. With respect to the best macro-averaged F1 scores, they improve by 10.29%, 10.19% and 10.45% over kNN, by 0.1%, 0.03% and 0.29% over the centroid-based classifier, and by 3.75%, 3.68% and 3.94% over GIS, respectively.
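The idea of representing each class by a small set of cluster centroids can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the number of clusters, the term weighting, or the multilevel procedure, so the code below assumes flat per-class spherical K-means (roughly in the spirit of CB-GIS), toy term-count vectors, and hypothetical helper names (`spherical_kmeans`, `fit`, `predict`). A document is assigned to the class whose closest centroid has the highest cosine similarity.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster documents by cosine similarity and return unit-length centroids."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    centroids = rng.sample(docs, min(k, len(docs)))
    for _ in range(iters):
        # Assign each document to its most similar centroid.
        groups = [[] for _ in centroids]
        for d in docs:
            best = max(range(len(centroids)), key=lambda j: dot(d, centroids[j]))
            groups[best].append(d)
        # Recompute each centroid as the normalised sum of its members.
        updated = []
        for j, members in enumerate(groups):
            if members:
                updated.append(normalize([sum(col) for col in zip(*members)]))
            else:
                updated.append(centroids[j])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids

def fit(train, k):
    """train maps a class label to its list of document vectors (assumed format)."""
    return {label: spherical_kmeans(vectors, k) for label, vectors in train.items()}

def predict(model, doc):
    """Return the class whose closest centroid is most similar to the document."""
    doc = normalize(doc)
    return max(model, key=lambda label: max(dot(doc, c) for c in model[label]))

# Toy term-count vectors over a four-term vocabulary (illustrative data only).
train = {
    "grain": [[3, 1, 0, 0], [2, 2, 0, 0], [0, 3, 1, 0]],
    "trade": [[0, 0, 2, 3], [0, 0, 3, 1], [0, 1, 0, 4]],
}
model = fit(train, k=2)
print(predict(model, [4, 1, 0, 0]))
print(predict(model, [0, 0, 1, 5]))
```

Keeping several centroids per class, rather than one as in a plain centroid-based classifier or every training document as in kNN, is the trade-off the abstract describes; the value of `k` here is an arbitrary choice for the toy data.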


Journal: Journal of Information & Knowledge Management
Publication Date: Mar 1, 2011
Citations: 11

