Abstract

With the development of information technology, Web news has become the main channel of information dissemination. Web news topic discovery helps users find valuable information quickly, and research on it continues to advance. Traditional topic discovery is based on the vector space model, which suffers from high dimensionality and data sparsity. Latent semantic analysis addresses these defects by mapping the high-dimensional, sparse word space to a k-dimensional semantic space, which raises the similarity between news items on the same topic through the semantic correlation between words. This paper studies Web news topic discovery. First, the set of Web news texts is vectorized and the weight of each feature in the texts is calculated with an improved TF-IDF. The original text vector set is then analysed by latent semantic analysis to exploit the semantic relations between texts and words, and the news topics are extracted with a clustering approach. Sub-topics are extracted and displayed through word co-occurrence: in essence, each sub-topic vector is built from co-occurring words. The experimental results show that the proposed method can effectively capture the current hot topics of Web news and their related sub-topics. The method is relevant to information retrieval and data mining.
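
The pipeline described above (term weighting, latent semantic analysis, clustering, word co-occurrence) can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation, and several parts are assumptions: it uses a plain TF-IDF rather than the improved variant, which the abstract does not specify; it approximates the truncated SVD behind latent semantic analysis by power iteration on the document Gram matrix; and it uses single-pass threshold clustering as a stand-in, since the abstract does not name the clustering algorithm. All function names, the threshold value, and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def tfidf_matrix(docs):
    """Return (vocab, rows) with one plain TF-IDF row per tokenized document.

    Assumption: idf = log(1 + N / df); the paper's improved weighting is unknown.
    """
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        size = max(len(d), 1)  # guard against empty documents
        rows.append([tf[w] / size * math.log(1 + n / df[w]) for w in vocab])
    return vocab, rows


def lsa_doc_vectors(rows, k, iters=200):
    """Project documents into a k-dimensional latent space.

    Runs power iteration with deflation on G = A A^T. The eigenvectors of G
    are the left singular vectors of the document-term matrix A, so each
    document's coordinates are the corresponding rows of U_k * Sigma_k.
    """
    n = len(rows)
    gram = [[dot(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for comp in range(k):
        # Fixed, non-uniform start vector keeps the sketch deterministic.
        v = [1.0 + 0.1 * ((i + comp) % n) for i in range(n)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [dot(gram[i], v) for i in range(n)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = dot(v, [dot(gram[i], v) for i in range(n)])
        sigma = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i].append(v[i] * sigma)
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigval * v[i] * v[j]
    return coords


def single_pass_clusters(vectors, threshold=0.8):
    """Assign each document to the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(v, vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cooccurrence(docs):
    """Count how often each unordered word pair shares a document."""
    pairs = Counter()
    for d in docs:
        pairs.update(combinations(sorted(set(d)), 2))
    return pairs


# Toy corpus invented for illustration: two finance items, two sports items.
DOCS = [
    "stock market shares fall".split(),
    "stock market shares rise investors".split(),
    "football match team wins".split(),
    "football team coach match".split(),
]

vocab, rows = tfidf_matrix(DOCS)
coords = lsa_doc_vectors(rows, k=2)
clusters = single_pass_clusters(coords)
top_pairs = cooccurrence(DOCS).most_common(3)
```

In this sketch, `clusters` groups document indices into topics, and the most frequent word pairs in `top_pairs` play the role of the co-occurring words from which a sub-topic vector would be built. In practice the co-occurrence counts would be computed per topic cluster rather than over the whole corpus.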

