Abstract

Keyword extraction is a core task in text topic mining and a basis for text analysis and public opinion analysis. The traditional TF-IDF algorithm scores keywords mainly by word frequency, so it overlooks both important feature words that occur only a few times and the comments that readers leave below an article. To address these problems, this paper improves the traditional TF-IDF algorithm by adding part of speech and reader comments as impact factors and recalculating the TF-IDF weight, which improves the accuracy of the algorithm. Python is used both to crawl the media articles and to implement the improved algorithm. Experiments show that the improved TF-IDF algorithm performs significantly better than the traditional TF-IDF in terms of accuracy, recall rate, F1, MacAvg_P, MacAvg_R and MacAvg_F1.
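The idea of reweighting TF-IDF can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formula, so the multiplicative combination, the part-of-speech weights in `POS_WEIGHT`, the `COMMENT_BOOST` constant, the smoothed IDF variant, and the function names are all assumptions chosen only for illustration. The sketch also assumes the text has already been tokenized and tagged.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (assumed values, not taken from the paper).
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "adj": 0.4}
DEFAULT_POS_WEIGHT = 0.2  # assumed weight for untagged or other words
COMMENT_BOOST = 0.5       # assumed strength of the reader-comment factor


def tf_idf(doc_tokens, corpus):
    """Classic TF-IDF for one tokenized document against a tokenized corpus."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for d in corpus if word in d)        # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0   # smoothed IDF (one common variant)
        scores[word] = (count / total) * idf
    return scores


def weighted_tf_idf(doc_tokens, corpus, pos_tags, comment_tokens):
    """Scale each TF-IDF score by a part-of-speech factor and a comment factor."""
    base = tf_idf(doc_tokens, corpus)
    comment_counts = Counter(comment_tokens)
    n_comment = max(len(comment_tokens), 1)  # avoid division by zero with no comments
    weighted = {}
    for word, score in base.items():
        pos_w = POS_WEIGHT.get(pos_tags.get(word), DEFAULT_POS_WEIGHT)
        # Words that readers repeat in comments get a factor above 1.
        comment_w = 1.0 + COMMENT_BOOST * comment_counts[word] / n_comment
        weighted[word] = score * pos_w * comment_w
    return weighted


# Toy data, invented for this example.
corpus = [
    ["battery", "lasts", "long", "battery"],
    ["screen", "is", "bright"],
]
doc = corpus[0]
pos_tags = {"battery": "noun", "lasts": "verb", "long": "adj"}
comments = ["battery", "great", "battery"]

scores = weighted_tf_idf(doc, corpus, pos_tags, comments)
top = max(scores, key=scores.get)
```

In this toy setup, "lasts" and "long" have identical plain TF-IDF scores, and only the assumed part-of-speech weights separate them. A word mentioned in the comments is scaled up relative to its plain score, which is the kind of effect the abstract describes.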
