Abstract

Text similarity is an important technique in text data analysis and is often used in text clustering and classification. Social media is a popular class of online application that contains a great deal of valuable information. Short texts are common in social media, and short text similarity is often used in social media data mining. Because a short text contains few words, it offers few features, and similarity calculations on it have low accuracy; a common improvement is therefore to compute short text similarity from word semantic similarity. This paper puts forward a short text semantic similarity calculation method that combines a knowledge-based method and a corpus-based method. The method builds on an improved word semantic similarity calculation method and a general short text semantic similarity calculation method. The word similarity method combines two word semantic similarity measures through a set of strategies. It draws on the advantages of both measures to overcome the disadvantages of either one alone, finds more semantic associations among the words in texts, and improves the accuracy of word similarity calculation. This paper uses a large amount of corpus data to compare and analyze several word and text semantic similarity algorithms; the improved method yields results closer to human ratings than the other methods in both word and text similarity.
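
The abstract does not spell out the combination strategies, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes the two word-level scores are merged by taking their maximum, and that text similarity is the symmetric average of each word's best match in the other text. The score tables, the function names (`lookup_sim`, `combined_word_sim`, `text_sim`), and all numeric values are hypothetical stand-ins for a real knowledge base and a real corpus model.

```python
from typing import Callable, Dict, FrozenSet, List

WordSim = Callable[[str, str], float]

# Hypothetical toy scores; a real system would query a lexical
# knowledge base and a corpus-trained model instead.
KNOWLEDGE_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 1.0,
    frozenset({"car", "bus"}): 0.6,
}
CORPUS_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.8,
    frozenset({"bus", "traffic"}): 0.7,
}


def lookup_sim(table: Dict[FrozenSet[str], float]) -> WordSim:
    """Build a symmetric word similarity function from a score table."""
    def sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        # Unknown pairs score 0.0 (no evidence of association).
        return table.get(frozenset({a, b}), 0.0)
    return sim


def combined_word_sim(a: str, b: str, kb: WordSim, corpus: WordSim) -> float:
    """Assumed strategy: keep the stronger of the two scores, so either
    source can supply an association that the other one misses."""
    return max(kb(a, b), corpus(a, b))


def text_sim(t1: List[str], t2: List[str], word_sim: WordSim) -> float:
    """Symmetric average of each word's best match in the other text."""
    if not t1 or not t2:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(t1, t2) + directed(t2, t1))


kb_sim = lookup_sim(KNOWLEDGE_SCORES)
corpus_sim = lookup_sim(CORPUS_SCORES)


def hybrid_sim(a: str, b: str) -> float:
    return combined_word_sim(a, b, kb_sim, corpus_sim)


text_a = ["car", "traffic"]
text_b = ["automobile", "bus"]
print("knowledge only:", text_sim(text_a, text_b, kb_sim))
print("hybrid:", text_sim(text_a, text_b, hybrid_sim))
```

In this toy data the knowledge table has no entry linking "traffic" to anything, so the knowledge-only score is pulled down; the hybrid score recovers the "bus"/"traffic" association from the corpus table. Taking the maximum is only one possible strategy; a weighted average or a fallback rule would fit the same interface.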
