TextRank Keyword Extraction Algorithm Using Word Vector Clustering Based on Rough Data-Deduction.

Ning Zhou,Wenqian Shi,Renyu Liang,Na Zhong,Syed Hassan Ahmed

doi:10.1155/2022/5649994


Open Access

https://doi.org/10.1155/2022/5649994


Abstract

When the graph-based TextRank algorithm constructs the associative edges of its graph, the co-occurrence window rule considers only the relationships between local terms, so it makes limited use of the information in the document itself. To address this problem, an improved TextRank keyword extraction algorithm, RDD-WRank, is proposed, which combines rough data reasoning with word vector clustering. First, the algorithm uses rough data reasoning to mine the associations between candidate keywords, which expands the search scope and makes the results more comprehensive. Then, using the open online Wikipedia knowledge base, Word2Vec word embeddings are integrated into the improved algorithm, and the word vectors of the nodes in the TextRank lexical graph are clustered to adjust the voting importance of the nodes in each cluster. Experimental results show that, compared with the traditional TextRank algorithm and with TextRank combined with Word2Vec, the improved algorithm significantly improves extraction accuracy, which demonstrates that rough data reasoning can effectively improve keyword extraction performance.
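The abstract does not give the formula that RDD-WRank uses to adjust voting importance, so the following is only a minimal sketch of one possible reading: word vectors are clustered with a tiny k-means, and each node's outgoing vote is scaled by how close its vector lies to its cluster centroid. The functions `kmeans`, `voting_weights`, and `weighted_rank`, the weighting rule `1 / (1 + distance)`, and the toy 2-D vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Tiny deterministic k-means: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def voting_weights(words, vectors, k):
    """Hypothetical rule: words nearer their cluster centroid vote more strongly."""
    labels, centroids = kmeans(vectors, k)
    return {
        word: 1.0 / (1.0 + math.dist(vec, centroids[lab]))
        for word, vec, lab in zip(words, vectors, labels)
    }


def weighted_rank(graph, weights, damping=0.85, iterations=50):
    """TextRank-style iteration in which each neighbour's vote is scaled by its weight."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                weights[nb] * scores[nb] / len(graph[nb]) for nb in graph[node]
            )
            for node in graph
        }
    return scores


# Toy data: made-up 2-D "embeddings", not real Word2Vec vectors.
words = ["cat", "car", "dog", "bus", "kitten"]
vectors = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (5.0, 6.0), (0.0, 0.5)]
graph = {
    "cat": {"dog", "kitten"},
    "dog": {"cat"},
    "kitten": {"cat"},
    "car": {"bus"},
    "bus": {"car"},
}
weights = voting_weights(words, vectors, k=2)
scores = weighted_rank(graph, weights)
```

In a real pipeline the vectors would come from a trained Word2Vec model and the graph from the document's candidate keywords; the weighting rule here is a placeholder for whatever adjustment the paper defines.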

Highlights

In this information age, people’s lives are full of information.
Improved Algorithm Using Word Vector Based on Rough Data-Deduction. The classic TextRank algorithm constructs the graph model of candidate keywords through the co-occurrence relationship and iteratively calculates the weight of each node through the average transition probability matrix until it converges. This approach is relatively simple and effective, but it has certain limitations. The rule of co-occurrence window only considers the correlation between local words, so some words that are locally associated with certain keywords may be extracted
Experimental Data. The experiment selected the Wikipedia Chinese corpus released in February 2020, “zhwiki-20200201-pages-articles-multistream.xml.bz2”, to train Chinese word vectors [43, 44], which contains a main file of 1.9 GB
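The classic TextRank procedure described in the highlights (a co-occurrence graph followed by iterative score updates until convergence) can be sketched roughly as follows. This is a minimal illustration of the baseline and not the paper's implementation; the function names, the window size, the damping factor of 0.85, and the toy token list are assumptions.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link each token to the distinct tokens among the next `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, tol=1e-6, max_iter=100):
    """Iterate the TextRank update until the largest score change drops below `tol`."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    for _ in range(max_iter):
        new_scores = {}
        for node in graph:
            # Each neighbour splits its score evenly among its own neighbours.
            vote = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * vote
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy input: a pre-tokenised, pre-filtered list of candidate keywords.
tokens = ["graph", "model", "keyword", "graph", "ranking", "keyword", "graph"]
scores = textrank(build_cooccurrence_graph(tokens))
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

Because edges are created only inside the sliding window, two words that never appear near each other receive no edge, which is the locality limitation the highlight points out.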

Summary

Introduction

People’s lives are full of information. Faced with such a huge amount of data, it is important to quickly and accurately obtain the content that is valuable and of interest. To further improve the keyword extraction performance of the TextRank algorithm, Literature [18] proposed PositionRank, an unsupervised model for extracting keywords from academic documents that uses the information from all positions where a word appears to bias PageRank. Literature [28] proposed a supervised hybrid clustering algorithm combining cuckoo search and k-means, which divides data samples into clusters so as to provide training subsets with high diversity. Literature [29] merged the Word2Vec model into the traditional TextRank algorithm by using word embedding technology to improve the accuracy of keyword extraction
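To make the idea of biasing PageRank by word positions concrete, here is a minimal sketch. It assumes the commonly described PositionRank weighting, in which each word's bias is the sum of the inverse positions of its occurrences, normalised into a restart distribution. The function names, the unweighted graph, and the toy tokens are illustrative assumptions and are not details taken from this summary.

```python
from collections import defaultdict


def position_bias(tokens):
    """Sum the inverse 1-based positions of each token, then normalise to sum to 1."""
    raw = defaultdict(float)
    for pos, word in enumerate(tokens, start=1):
        raw[word] += 1.0 / pos
    total = sum(raw.values())
    return {word: value / total for word, value in raw.items()}


def biased_pagerank(graph, bias, damping=0.85, iterations=100):
    """PageRank in which the restart mass goes to each node in proportion to its bias."""
    scores = {node: 1.0 / len(graph) for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) * bias[node]
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    return scores


# Toy input: "keyword" appears at positions 1 and 4, so it receives the largest bias.
tokens = ["keyword", "extraction", "graph", "keyword"]
graph = {
    "keyword": {"extraction", "graph"},
    "extraction": {"keyword", "graph"},
    "graph": {"keyword", "extraction"},
}
bias = position_bias(tokens)
scores = biased_pagerank(graph, bias)
```

Words that occur early and often accumulate a larger share of the restart probability, so they rank higher even when the graph structure alone would treat all nodes alike, as in this fully connected toy graph.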

Research Theory

Rough Data-Deduction

Experimental Data and Evaluation Criteria

Experimental Results and Analysis


Journal: Computational Intelligence and Neuroscience	Publication Date: Jan 25, 2022
Citations: 9	License type: CC BY 4.0
