Abstract

Text similarity is an effective metric for estimating the degree of matching between two or more texts. The Vector Space Model (VSM) is used to calculate text similarity in most cases. However, it is ill-suited to complex tasks because of its high dimensionality and computational complexity. For news in particular, it is crucial to calculate the similarity of two news texts in order to ascertain whether two reports describe the same event or the same type of information. An analysis of news reports shows that five basic factors, “when”, “where”, “what”, “why”, and “who”, distinguish one news report from another. Based on these features, this study proposes a method for calculating the similarity of news texts. The proposed method fully integrates the influence of the five news feature words into the evaluation of text similarity, which largely avoids the problems of text interference and low computational efficiency. The method is carried out in four steps: extraction of the news elements, classification of these elements, calculation of the similarity, and comparison with methods from the existing literature. Experimental results suggest that our proposal outperforms the vector space cosine coefficient method, the Jaccard coefficient method, and the entropy method in terms of time complexity and computational accuracy.
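
The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the five news elements have already been extracted and classified into token lists, and it combines them with an equally weighted mean of per-element Jaccard overlaps. The function names, the `weights` parameter, and the sample reports are all hypothetical. The cosine and Jaccard functions correspond to two of the baselines named above, applied to whole-text tokens.

```python
from collections import Counter
from math import sqrt

# The five news elements named in the abstract.
FIVE_W = ("when", "where", "what", "why", "who")


def jaccard(a, b):
    """Jaccard coefficient of two token collections (a baseline named in the abstract)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # assumption: two empty element sets contribute no similarity
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a, b):
    """Cosine coefficient over term-frequency vectors (the VSM baseline)."""
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[t] * count_b[t] for t in count_a.keys() & count_b.keys())
    norm = sqrt(sum(v * v for v in count_a.values())) * sqrt(
        sum(v * v for v in count_b.values())
    )
    return dot / norm if norm else 0.0


def five_w_similarity(report_a, report_b, weights=None):
    """Hypothetical combination: weighted mean of per-element Jaccard overlaps.

    Each report is a dict mapping an element name in FIVE_W to a list of tokens.
    Equal weights are an assumption; the paper may weight elements differently.
    """
    weights = weights or {name: 1.0 for name in FIVE_W}
    total = sum(weights[name] for name in FIVE_W)
    score = sum(
        weights[name] * jaccard(report_a.get(name, ()), report_b.get(name, ()))
        for name in FIVE_W
    )
    return score / total


# Made-up reports that agree on four of the five elements.
report_a = {"when": ["2020-05-01"], "where": ["paris"], "what": ["flood"],
            "why": ["rain"], "who": ["mayor"]}
report_b = {"when": ["2020-05-01"], "where": ["paris"], "what": ["flood"],
            "why": ["rain"], "who": ["governor"]}

similarity = five_w_similarity(report_a, report_b)
```

Because each comparison touches only a handful of element tokens rather than a full vocabulary-sized vector, a scheme of this shape works in a much lower dimension than VSM, which is the efficiency argument the abstract makes.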
