Improved TF-IDF for We Media Article Keywords Extraction

Xinxin Guan, Hechen Gong, Yeli Li

doi:10.1088/1742-6596/1302/3/032003


Open Access

https://doi.org/10.1088/1742-6596/1302/3/032003


Abstract

Keyword extraction is one of the tasks of text topic mining, and it is also a basis for text analysis and public opinion analysis. The traditional TF-IDF algorithm scores keywords mainly by word frequency. It does not consider the importance of feature words that occur less often, nor the readers’ comments below the article. To address these problems, this paper improves the traditional TF-IDF algorithm by adding part of speech and reader comments as impact factors and recalculating the TF-IDF weight, which improves the accuracy of the algorithm. The paper uses Python to crawl We Media articles and to implement the improved algorithm. Experiments show that the improved TF-IDF algorithm performs significantly better than traditional TF-IDF in terms of accuracy, recall, F1, MacAvg_P, MacAvg_R and MacAvg_F1.
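The general idea of reweighting TF-IDF by part of speech and reader comments can be illustrated with a minimal Python sketch. The abstract does not state the actual formula or factor values, so everything specific here is an assumption for illustration only: the names `POS_WEIGHT`, `COMMENT_BOOST`, `tf_idf` and `improved_tf_idf` are hypothetical, the weights are placeholders, and the multiplicative combination is one plausible choice rather than the authors' method.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (n = noun, v = verb, a = adjective).
# The abstract does not give the real values; these are placeholders.
POS_WEIGHT = {"n": 1.5, "v": 1.0, "a": 0.8}
DEFAULT_POS_WEIGHT = 0.5  # assumed weight for any other tag
COMMENT_BOOST = 0.5       # assumed strength of the reader-comment factor


def tf_idf(doc_tokens, corpus):
    """Traditional TF-IDF for one document.

    doc_tokens: list of words in the document.
    corpus: list of sets, one set of words per document.
    Uses the smoothed IDF ln((1 + N) / (1 + df)) + 1, so it is never negative.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / total) * idf
    return scores


def improved_tf_idf(doc_tagged, corpus, comment_tokens):
    """Rescale TF-IDF by a part-of-speech factor and a comment factor.

    doc_tagged: list of (word, pos_tag) pairs for the article.
    comment_tokens: list of words taken from the readers' comments.
    """
    base = tf_idf([word for word, _ in doc_tagged], corpus)
    pos_of = dict(doc_tagged)  # last tag seen wins for repeated words
    comment_counts = Counter(comment_tokens)
    n_comment_tokens = max(len(comment_tokens), 1)
    scores = {}
    for word, score in base.items():
        pos_factor = POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
        # Words that readers repeat in comments get a proportional boost.
        comment_share = comment_counts[word] / n_comment_tokens
        comment_factor = 1 + COMMENT_BOOST * comment_share
        scores[word] = score * pos_factor * comment_factor
    return scores
```

In this sketch a word that never appears in the comments keeps a comment factor of 1, so its score changes only through its part-of-speech weight. For Chinese We Media text, the `(word, pos_tag)` pairs would have to come from a segmenter with part-of-speech tagging, which is outside the scope of this example.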
