Efficient text feature extraction by integrating the average linkage and K-medoids clustering

Dasong Sun

doi:10.1142/s0217984921501517

Abstract

Clustering feature words both reduces the dimensionality of the feature subset and eliminates redundancy among the features. For a feature set with very many dimensions, however, it is difficult to estimate the value of K accurately for the traditional K-medoids algorithm. Moreover, the clusters produced by the average linkage (AL) algorithm cannot be divided again, and the AL algorithm cannot be used directly for text classification. To overcome the limitations of AL and K-medoids, in this paper we combine the two algorithms so that they complement each other. In particular, to serve the purpose of text classification, we improve the AL algorithm and propose the [Formula: see text] testing statistics to obtain the approximate number of clusters. Finally, the central feature words are preserved and the other feature words are deleted. The experimental results show that the new algorithm largely eliminates feature redundancy. Compared with the traditional TF-IDF algorithms, the new algorithm improves text classification performance.
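
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: average linkage suggests the number of clusters and initial medoids, K-medoids then refines the partition, and only the medoid (central) words are kept. The fixed distance `threshold` is a hypothetical stand-in for the paper's testing statistics, and the words and one-dimensional positions are toy data invented for illustration, not results from the paper.

```python
from itertools import combinations

def average_linkage(dist, threshold):
    """Agglomerative clustering; stop when the closest pair of clusters
    is farther apart (on average) than `threshold`.
    NOTE: the threshold is an assumption, not the paper's stopping rule."""
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), d = min(
            (((p, q), avg(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters

def medoid(cluster, dist):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))

def assign(dist, medoids):
    """Assign every point to its nearest medoid."""
    groups = {m: [] for m in medoids}
    for i in range(len(dist)):
        groups[min(medoids, key=lambda m: dist[i][m])].append(i)
    return [g for g in groups.values() if g]

def k_medoids(dist, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids are stable."""
    for _ in range(max_iter):
        new = [medoid(g, dist) for g in assign(dist, medoids)]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return medoids, assign(dist, medoids)

# Toy data (hypothetical): each word gets a 1-D position; distance = |a - b|.
words = ["car", "auto", "vehicle", "apple", "pear", "banana"]
pos = [0, 1, 3, 50, 52, 51]
dist = [[abs(a - b) for b in pos] for a in pos]

al_clusters = average_linkage(dist, threshold=10)   # suggests K
k = len(al_clusters)
init = [medoid(c, dist) for c in al_clusters]       # AL seeds K-medoids
medoids, groups = k_medoids(dist, init)
kept = [words[m] for m in medoids]                  # central feature words
print(k, kept)  # 2 ['auto', 'banana']
```

Seeding K-medoids from the average-linkage clusters avoids guessing K up front, which is the complementarity the abstract describes; in practice the distance matrix would come from some word-similarity measure rather than toy positions.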


Journal: Modern physics letters. B, Condensed matter physics, statistical physics, applied physics
Publication Date: Feb 17, 2021
Citations: 3

Similar Papers

Portfolio Selection Based on Hierarchical Clustering and Inverse-Variance Weighting
Andrés Arévalo ... German Hernandez
01 Jan 2019

Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling
Md Habibur Rahman ... M A Matin
Theoretical and applied climatology | VOL. 134
01 Dec 2017

Seismic facies classification using 2‐D and 3‐D multi‐attribute hierarchical clustering algorithms
Hamid Sabeti ... Babak Nadjar
01 Jan 2010

A Lost Motion Vector Recovery Method Using a Modified ALA Clustering Algorithm
Nam-Rye Son ... Guee-Sang Lee
The KIPS Transactions:PartB | VOL. 12B
01 Dec 2005
