Abstract

Text representation is an important topic in natural language processing, as it can effectively transfer knowledge to downstream tasks. To extract semantic information from text with unsupervised methods, this paper proposes a quantum language-inspired, tree-structured text representation model that studies the correlations between words at variable distances for semantic analysis. By combining the different semantic contributions of associated words in different syntax trees, a syntax tree-based attention mechanism is established that highlights the semantic contributions of non-adjacent associated words and weakens the semantic weight of adjacent non-associated words. Moreover, the tree-based attention mechanism captures not only the global information of entangled words in the dictionary but also the local grammatical structure of word combinations in individual sentences. Experimental results on semantic textual similarity tasks show that the proposed method significantly outperforms state-of-the-art sentence embeddings.

Highlights

  • The parallelism of quantum computing has attracted growing attention across fields

  • This paper proposes a quantum-entangled word representation based on syntax trees for sentence representation

  • By combining a dependency-tree-based attention mechanism with the quantum entanglement coefficient, the entanglement coefficient between two words depends on their PoS combination, their distribution in the dictionary, and the modification relationship between them (see the sketch after this list)
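As a rough illustration of how such an entanglement coefficient might be composed, the sketch below multiplies three hypothetical factors: a PoS-pair weight, an IDF-like dictionary-distribution factor, and a modification-relation indicator. Every name, table value, and the combination rule are illustrative assumptions rather than the paper's actual formulation.

```python
# Illustrative sketch of an entanglement coefficient between two words.
# All factor names, table values, and the combination rule are
# assumptions for illustration, not the paper's formulation.
import math

# Hypothetical PoS-combination weights: pairs with a strong modification
# relationship (e.g., adjective-noun) are weighted higher.
POS_PAIR_WEIGHT = {
    ("ADJ", "NOUN"): 1.0,
    ("ADV", "VERB"): 0.9,
    ("NOUN", "VERB"): 0.8,
    ("DET", "NOUN"): 0.2,
}

def entanglement_coefficient(pos_i, pos_j, freq_i, freq_j,
                             total_tokens, has_dependency):
    """Combine a PoS-pair weight, a dictionary-distribution factor,
    and a modification-relation indicator into one coefficient."""
    pos_weight = POS_PAIR_WEIGHT.get((pos_i, pos_j), 0.5)
    # IDF-like distribution factor: rarer dictionary words weigh more.
    dist = math.sqrt(math.log(total_tokens / (1 + freq_i))
                     * math.log(total_tokens / (1 + freq_j)))
    # A direct modification (dependency) relation strengthens entanglement.
    relation = 1.0 if has_dependency else 0.3
    return pos_weight * dist * relation

print(entanglement_coefficient("ADJ", "NOUN", freq_i=120, freq_j=950,
                               total_tokens=1_000_000, has_dependency=True))
```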


Summary

Introduction

The parallelism of quantum computing has attracted growing attention across fields. Based on the associations between relation entities in a sentence's dependency parse, the dependency-parse-based text representation combines the word vector tensors of two related words to establish semantic entanglement between long-distance dependent words, so that distant words with a direct modification relationship become semantically related. Based on the varying degrees of modification between words in the constituency parse, the constituency-parse-based text representation combines a semantic correlation coefficient with the distribution characteristics of adjacent words to establish their semantic association. (1) A quantum language-inspired text representation model based on relation entities and the constituency parser is established, covering both long-range and short-range semantic associations between words. BERT [30] and its Transformer-based variants [31,32] divide pretraining approaches into feature-based and fine-tuning methods [33].
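As a rough illustration of the long-range case, the sketch below entangles the vectors of dependency-related word pairs via an outer (tensor) product. The use of spaCy, the selected dependency labels, and the outer product itself are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: entangle word vectors of dependency-related word pairs
# via tensor (outer) products. spaCy, the chosen dependency labels, and
# the outer product are illustrative choices, not the paper's method.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # a model that ships word vectors

def dependency_entangled_pairs(sentence):
    """Yield (head, dependent, tensor) for each modification relation."""
    doc = nlp(sentence)
    for token in doc:
        if token.dep_ in ("amod", "advmod", "nsubj", "dobj"):
            # The outer product joins the two vectors into one tensor,
            # so distant words with a direct modification relationship
            # become coupled in a single representation.
            yield (token.head.text, token.text,
                   np.outer(token.head.vector, token.vector))

for head, dep, tensor in dependency_entangled_pairs(
        "The quick brown fox easily jumps over the lazy dog"):
    print(head, dep, tensor.shape)
```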

Dependency Tree
Quantum-Based NLP
Entanglement between Words with Short-Range Modification Relationships
Embedding of Entangled Words
Two adjacent words are entangled together in order, forming the arrays (a minimal sketch follows this outline).
Attention Mechanism
Sentence Similarity
Optimize the Sentence Embedding
Entanglement between Words with Long-Range Modification Relationships
Sentence Embedding Based on Constituency Parser and Relation Entity
Reduce Sentence Embedding Dimensions
Datasets
Experimental Settings
Influence of PoS Combination Weight T_{I,J}
On STS-Benchmark
Summary
Conclusions and Future Works
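For the short-range case noted in the outline above ("Embedding of Entangled Words"), here is a minimal sketch of entangling adjacent words in order into an array of pairwise tensors. The outer product stands in for the paper's entanglement operation and is an assumption, as are the toy embedding dimensions.

```python
# Illustrative sketch: entangle adjacent words in order, forming an
# array of pairwise tensors (one per neighbouring word pair). The outer
# product is an assumed stand-in for the paper's entanglement operation.
import numpy as np

def adjacent_entangled_arrays(word_vectors):
    """word_vectors: list of 1-D arrays, one per word, in sentence order."""
    return [np.outer(v1, v2)
            for v1, v2 in zip(word_vectors, word_vectors[1:])]

vecs = [np.random.rand(50) for _ in range(4)]  # toy 50-d embeddings
arrays = adjacent_entangled_arrays(vecs)
print(len(arrays), arrays[0].shape)  # 3 pairs, each a 50x50 tensor
```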

