Abstract

To address the incomplete coverage of traditional keyword extraction, this paper proposes a keyword extraction method based on multi-feature fusion for Chinese web pages. First, the part-of-speech and position information of candidate words is incorporated into an improved TF-IDF algorithm. Second, mutual information with the web page title is taken into account when calculating the weight of candidate words. Third, the two measures are fused through a linear combination of the improved TF-IDF weight and the mutual information weight, and keywords are extracted using the fused weight. Comparative experiments show that the keywords extracted by the proposed method achieve higher precision and recall than those produced by the classical TF-IDF algorithm.
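
The three steps can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas or parameter values, so the part-of-speech and position weights, the smoothed IDF, the sentence-level pointwise mutual information, the max-normalisation, and the mixing coefficient `alpha` are all assumptions made for illustration. All function and variable names are hypothetical, and the input is assumed to be already segmented and part-of-speech tagged.

```python
import math
from collections import Counter

# Assumed illustrative weights; the abstract does not state any values.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}       # noun, verb, adjective
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}


def improved_tfidf(doc, corpus):
    """Score candidates with TF-IDF scaled by POS and position weights.

    doc: list of (word, pos_tag, position) triples for one page.
    corpus: list of sets of words, one set per reference document.
    """
    tf = Counter(word for word, _, _ in doc)
    n_docs = len(corpus)
    scores = {}
    for word, pos, position in doc:
        df = sum(1 for d in corpus if word in d)
        # Smoothed IDF (an assumption) so unseen words do not divide by zero.
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        boost = POS_WEIGHT.get(pos, 0.2) * POSITION_WEIGHT.get(position, 1.0)
        score = tf[word] / len(doc) * idf * boost
        # A word seen in several positions keeps its best-scoring occurrence.
        scores[word] = max(scores.get(word, 0.0), score)
    return scores


def title_mutual_information(word, title_words, sentences):
    """Average positive pointwise MI between a word and the title words.

    Probabilities are estimated from sentence-level co-occurrence (assumed).
    """
    n = len(sentences)

    def prob(*words):
        return sum(1 for s in sentences if all(w in s for w in words)) / n

    if not title_words or n == 0:
        return 0.0
    total = 0.0
    for t in title_words:
        joint = prob(word, t)
        if joint > 0:
            total += max(0.0, math.log(joint / (prob(word) * prob(t))))
    return total / len(title_words)


def fuse(tfidf, mi, alpha=0.7):
    """Linear combination of the two max-normalised scores."""
    def normalise(scores):
        top = max(scores.values(), default=0.0)
        return {w: (v / top if top > 0 else 0.0) for w, v in scores.items()}

    t, m = normalise(tfidf), normalise(mi)
    return {w: alpha * t[w] + (1 - alpha) * m.get(w, 0.0) for w in t}


def extract_keywords(doc, title_words, sentences, corpus, k=5, alpha=0.7):
    """Return the k candidates with the highest fused weight."""
    tfidf = improved_tfidf(doc, corpus)
    mi = {w: title_mutual_information(w, title_words, sentences) for w in tfidf}
    fused = fuse(tfidf, mi, alpha)
    return sorted(fused, key=fused.get, reverse=True)[:k]
```

In this sketch, `alpha` controls how much the fused weight leans on the improved TF-IDF score versus the title mutual information. Normalising both scores before combining them keeps either term from dominating purely because of its scale.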
