Short Text Semantic Similarity Research Articles

Measuring sentence similarity is an essential problem in natural language processing, closely related to the search for semantic meaning in natural language. The task is to determine how semantically similar two sentences are, no matter how their words are arranged, and it is important to do so accurately. Existing methods for computing sentence similarity have been adapted from approaches designed for large texts. Because these methods work in very high-dimensional spaces, they are inefficient, require human input, and are not flexible enough for some applications. In this study, we propose a hybrid method (HydMethod) that considers not only semantic information, drawn from lexical databases, word embeddings, and corpus statistics, but also implied word order information. With lexical databases, our method models human common sense knowledge, and incorporating corpus statistics allows that knowledge to be adapted to different domains. The methodology is therefore applicable across several domains. To demonstrate the efficacy of the proposed method, we ran experiments on two standard datasets: the Pilot Short Text Semantic Similarity Benchmark and MS paraphrase. On these two datasets, the proposed method outperforms the existing approaches, giving the highest correlation value for both word and sentence similarity. Moreover, it achieves an improvement of up to 32% over methods that use only word vectors or only WordNet. On the Rubenstein and Goodenough word and sentence pairs, our algorithm's similarity measure shows a high Pearson correlation coefficient of 0.8953.
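The idea of combining a semantic score with a word order score can be illustrated with a minimal sketch. This is not the HydMethod itself, whose details the abstract does not give. It assumes whitespace tokenization, a hypothetical `word_sim` function that stands in for a lexical-database or embedding similarity (here it only recognizes exact matches), and an assumed weight `delta` between the two scores.

```python
import math


def word_sim(a, b):
    # Hypothetical stand-in for a WordNet or embedding similarity:
    # it only recognizes exact matches.
    return 1.0 if a == b else 0.0


def semantic_vector(tokens, joint):
    # For each word in the joint word set, the best similarity to any
    # word of the sentence.
    return [max(word_sim(w, t) for t in tokens) for w in joint]


def order_vector(tokens, joint):
    # 1-based position of each joint word in the sentence, 0 if absent.
    return [tokens.index(w) + 1 if w in tokens else 0 for w in joint]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def order_similarity(r1, r2):
    # 1 - |r1 - r2| / |r1 + r2|: equals 1 when the shared words appear
    # in the same positions in both sentences.
    total = norm([x + y for x, y in zip(r1, r2)])
    if total == 0:
        return 0.0
    return 1.0 - norm([x - y for x, y in zip(r1, r2)]) / total


def hybrid_similarity(s1, s2, delta=0.85):
    # delta is an assumed weight: semantic score vs. word order score.
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(t1 + t2))  # joint word set, order preserved
    sem = cosine(semantic_vector(t1, joint), semantic_vector(t2, joint))
    order = order_similarity(order_vector(t1, joint), order_vector(t2, joint))
    return delta * sem + (1.0 - delta) * order


if __name__ == "__main__":
    print(hybrid_similarity("dog bites man", "dog bites man"))
    print(hybrid_similarity("dog bites man", "man bites dog"))
```

With the same bag of words in a different order, the semantic score stays at its maximum while the word order term lowers the combined score, which is the effect the word order information is meant to capture.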

In order to improve the accuracy of short text similarity calculation, this paper proposes using the history of short text messages to construct a semantic feature space, representing each short text as a vector in that space with semantic extension, and finally calculating the similarity between short texts from their corresponding vectors in the semantic feature space. This method represents the semantic information of short text messages thoroughly and so improves the accuracy of similarity calculation. We selected a large number of problem test sets for experiments. The results show that the proposed method is reasonable and effective.

I. INTRODUCTION

With the wide application of short text similarity calculation in information retrieval, question-answering systems, text mining, and other natural language processing fields, improving the calculation of short text similarity has become an important research topic. Research has found many differences between calculating short text similarity and calculating document similarity. Because a document contains a large amount of word information, most document similarity methods are based on word statistics. A short text, however, contains little word information, sometimes only a single word, so the information in the short text itself is not sufficient to judge the similarity between short texts accurately. To improve the accuracy of short text similarity calculation, we therefore need to solve two key problems. The first is how to fully express and reflect the information in a short text, including word frequency, word meaning, and so on. The second is how to calculate the similarity between short texts. To solve these two problems, this paper presents a method for calculating the semantic similarity of Chinese short texts based on the semantic feature space.

II. CONSTRUCTION METHOD OF SEMANTIC FEATURE SPACE

We take the intelligent service system as the research background. The main short texts in the system are advisory information (namely interrogative sentences) and response short texts. In the intelligent service system, many users ask for advice every day, which inevitably produces a massive amount of consultation information. We can use this historical advisory information, namely the short text sets, to construct the semantic feature space, then model a user's new consultation or question short text in that space, and finally calculate the similarity between the new short text and the historical short texts. The construction process of the semantic feature space is similar to that of an ordinary vector space and consists of two main steps: feature selection and feature dimension reduction.
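The pipeline described above (build a feature space from historical short texts, map a new short text into it, compare vectors) can be sketched as follows. This is only an illustration under several assumptions, not the paper's construction: tokens are split on whitespace (Chinese text would first need word segmentation), feature selection is a minimum document frequency threshold `min_df`, a cap of `max_features` terms is a crude stand-in for dimension reduction, terms are weighted by a smoothed TF-IDF, and the paper's semantic extension step is omitted. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def build_feature_space(history, min_df=2, max_features=1000):
    # Feature selection: keep terms that occur in at least min_df
    # historical texts. The max_features cap is only a crude stand-in
    # for a real dimension reduction step.
    docs = [set(text.lower().split()) for text in history]
    df = Counter(term for doc in docs for term in doc)
    selected = [t for t, n in df.most_common() if n >= min_df][:max_features]
    n_docs = len(history)
    # Smoothed inverse document frequency for each selected feature.
    return {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in selected}


def to_vector(text, idf):
    # Sparse TF-IDF vector restricted to the selected features.
    counts = Counter(text.lower().split())
    return {t: counts[t] * w for t, w in idf.items() if t in counts}


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query, history, idf):
    # Historical short text closest to the new short text in the space.
    q = to_vector(query, idf)
    return max(history, key=lambda h: cosine(q, to_vector(h, idf)))


if __name__ == "__main__":
    history = [
        "how do I reset my password",
        "how do I change my password",
        "where is my order",
        "track my order status",
    ]
    idf = build_feature_space(history)
    print(most_similar("forgot my password", history, idf))
```

Terms that appear in only one historical text are dropped from the space, so a new short text is compared only on features the history supports.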

Articles published on Short Text Semantic Similarity

Short-Text Semantic Similarity (STSS): Techniques, Challenges and Future Perspectives

A novel hybrid methodology for computing semantic similarity between sentences through various word senses

Research on Semantic Similarity of Short Text Based on Bert and Time Warping Distance

A survey on the techniques, applications, and performance of short text semantic similarity

A Semantic and Syntactic Similarity Measure for Political Tweets

Learning short-text semantic similarity with word embeddings and external knowledge sources

A study of using syntactic cues in short-text similarity measure

A Review on WordNet and Vector Space Analysis for Short-text Semantic Similarity

Integrating a semantic-based retrieval agent into case-based reasoning systems: A case study of an online bookstore

Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space

Using part-of-speech tags as deep-syntax indicators in determining short-text semantic similarity

An empirical study of the textual similarity between source code and source code summaries

Heuristic Inventive Design Problem Solving Based on Semantic Relatedness

Evaluation and classification of syntax usage in determining short-text semantic similarity

A new benchmark dataset with production methodology for short text semantic similarity algorithms

An ontology-based approach for inventive problem solving

SyMSS: A syntax-based measure for short-text semantic similarity

Benchmarking short text semantic similarity
