A Text Similarity Measurement Employs Semantic Dictionary-based Sentiment Analysis

Turdi Tohti, Shasha Li, Askar Hamdulla

doi:10.1109/ialp54817.2021.9675226

Abstract

In the age of big data, finding the information one needs is becoming increasingly difficult because new text keeps appearing and the volume of duplicate text keeps growing. This makes text similarity algorithms an urgent tool for addressing the problem. Classification methods based on word features are often used to compute sentence similarity, but because they ignore emotional factors, they struggle to produce accurate results. Sentence-level similarity computation and sentiment analysis, on their own, also cannot reliably measure the similarity between texts. This work therefore considers both text similarity based on word features and text similarity based on emotional tendency. First, word vectors and a Chinese thesaurus are used to measure text similarity at the word level. Next, sentiment analysis is performed at the sentence and paragraph levels using a sentiment lexicon. Finally, the two kinds of similarity are combined to decide whether the texts are similar. In the experiments, several mainstream similarity methods are tested and compared with the proposed method to verify its accuracy, stability and efficiency.
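The three-step pipeline summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions and is not the authors' implementation. The word vectors, thesaurus groups and sentiment lexicon are tiny hypothetical stand-ins. They use English tokens for readability, although the paper works with Chinese text. Texts are assumed to be segmented into sentences and tokens already. The best-match averaging, the weight `alpha` and the decision `threshold` are illustrative choices, not values taken from the paper.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy resources; a real system would load pretrained word
# vectors, a Chinese thesaurus and a sentiment lexicon instead.
WORD_VECTORS: Dict[str, List[float]] = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.2, 0.05],
    "bad": [-0.8, 0.1, 0.0],
    "movie": [0.0, 0.9, 0.3],
    "film": [0.05, 0.85, 0.35],
}
THESAURUS: Dict[str, Set[str]] = {"movie": {"G1"}, "film": {"G1"}}  # word -> synonym-group ids
SENTIMENT_LEXICON: Dict[str, float] = {"good": 1.0, "great": 1.0, "bad": -1.0}


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_similarity(w1: str, w2: str) -> float:
    """Word level: identical words or a shared thesaurus group score 1.0;
    otherwise fall back to the word-vector cosine, clipped at 0."""
    if w1 == w2 or THESAURUS.get(w1, set()) & THESAURUS.get(w2, set()):
        return 1.0
    if w1 in WORD_VECTORS and w2 in WORD_VECTORS:
        return max(0.0, cosine(WORD_VECTORS[w1], WORD_VECTORS[w2]))
    return 0.0


def text_word_similarity(t1: List[str], t2: List[str]) -> float:
    """Symmetric best-match average (an assumed aggregation scheme)."""
    if not t1 or not t2:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        return sum(max(word_similarity(x, y) for y in b) for x in a) / len(a)

    return (directed(t1, t2) + directed(t2, t1)) / 2


def sentence_polarity(tokens: List[str]) -> float:
    """Mean lexicon score of the sentiment-bearing words, in [-1, 1]."""
    scores = [SENTIMENT_LEXICON[t] for t in tokens if t in SENTIMENT_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def paragraph_polarity(sentences: List[List[str]]) -> float:
    """Paragraph level: average of the sentence polarities."""
    if not sentences:
        return 0.0
    return sum(sentence_polarity(s) for s in sentences) / len(sentences)


def combined_similarity(p1: List[List[str]], p2: List[List[str]], alpha: float = 0.7) -> float:
    """Weighted mix of word-level and sentiment similarity (hypothetical weighting)."""
    words1 = [w for s in p1 for w in s]
    words2 = [w for s in p2 for w in s]
    sim_words = text_word_similarity(words1, words2)
    # Polarities lie in [-1, 1], so their gap lies in [0, 2]; map it to [0, 1].
    sim_sentiment = 1.0 - abs(paragraph_polarity(p1) - paragraph_polarity(p2)) / 2.0
    return alpha * sim_words + (1 - alpha) * sim_sentiment


def is_similar(p1: List[List[str]], p2: List[List[str]], threshold: float = 0.6) -> bool:
    """Final decision; the threshold is an illustrative assumption."""
    return combined_similarity(p1, p2) >= threshold


if __name__ == "__main__":
    p1 = [["good", "movie"]]
    p2 = [["great", "film"]]
    p3 = [["bad", "film"]]
    print(round(combined_similarity(p1, p2), 3), is_similar(p1, p2))
    print(round(combined_similarity(p1, p3), 3), is_similar(p1, p3))
```

In this toy example, `p1` and `p3` share a topic word ("movie"/"film" fall in the same thesaurus group), yet their polarities are opposite. The sentiment term pulls their combined score below the threshold. This is the kind of case that motivates adding emotional tendency to word-feature similarity.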
