Automatic Keyword Extraction Algorithm for Chinese Text based on Word Clustering

Rui Pan

doi:10.1145/3592793

Abstract

Automatic keyword extraction from Chinese text suffers from large feature extraction errors, low precision of the extracted keywords, and poor real-time performance. To address these problems, an automatic keyword extraction algorithm for Chinese text based on word clustering is designed. First, keyword frequency, document frequency, and inverse document frequency features are computed with statistical methods, the degree of interdependence between keywords is measured with pointwise mutual information, and a keyword-feature quantification matrix is built from the vector space model that maps keywords to feature items; this completes keyword feature quantification and realizes keyword feature extraction for Chinese text. Next, the average semantic similarity between keywords is calculated to determine the similarity of keyword features, and highly similar keyword features are eliminated. A comprehensive feature value is set to measure the importance of single-character words in the text, single-character words of low importance are removed, and a Bayesian framework is used to reduce the dimensionality of the high-dimensional keyword feature data, which completes preprocessing. The mapping results of the keyword vector space model are then determined by a word clustering algorithm, the text clusters of the keyword space clustering results are calculated by the clustering algorithm, and the keywords are classified by the DBN method. On this basis, an automatic keyword extraction model for Chinese text is designed to realize automatic keyword extraction. Experimental results show that the designed algorithm effectively reduces feature extraction error and improves extraction efficiency.
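
The abstract does not give the formulas behind its statistical features, so the following is only a minimal sketch of the first step under stated assumptions: a smoothed inverse document frequency, pointwise mutual information estimated from document-level co-occurrence, and a toy corpus that is already segmented into words. The function names `tf_idf` and `pmi` and the sample documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return document frequency, IDF, and per-document TF-IDF weights.

    `docs` is a list of token lists (text already segmented into words).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; this variant is an assumption, not the paper's formula.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency is the count normalized by document length.
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return df, idf, weights


def pmi(docs):
    """Pointwise mutual information for each pair of co-occurring terms.

    Probabilities are estimated as fractions of documents, which is one
    possible choice of co-occurrence window (an assumption).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    pair_df = Counter()
    for doc in docs:
        # Sort so that each unordered pair has a single canonical key.
        for a, b in combinations(sorted(set(doc)), 2):
            pair_df[(a, b)] += 1
    return {
        (a, b): math.log((c / n_docs) / ((df[a] / n_docs) * (df[b] / n_docs)))
        for (a, b), c in pair_df.items()
    }


# Toy pre-segmented corpus (hypothetical data for illustration only).
docs = [
    ["关键词", "提取", "算法"],
    ["关键词", "聚类"],
    ["文本", "聚类", "算法"],
]
df, idf, weights = tf_idf(docs)
scores = pmi(docs)
```

Each dictionary in `weights` can serve as one row of a keyword-by-feature matrix, and `scores` gives a symmetric dependence measure between keyword pairs. The later stages described in the abstract (similarity filtering, dimensionality reduction, clustering, and DBN classification) are not covered by this sketch.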


Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Apr 22, 2023
Citations: 2

