Abstract

Distributed sentence representations have shown great power in a wide range of natural language processing (NLP) tasks. In particular, a contextualized sentence representation, Sentence BERT (SBERT), achieves excellent performance on many NLP tasks. However, SBERT is typically learned in Euclidean space, and the geometric structure of sentence representations and their relations to the representations of sentence contexts have not been carefully studied. In this paper, we propose a new sentence representation method, named Refined SBERT, which uses manifold learning to refine SBERT by re-embedding sentence vectors from the original embedding space into a new, refined semantic space. To map sentences to the manifold space, we use neighborhood preserving embedding to construct the local manifold structure of the sentences. Our method discovers the local geometric structure and obtains a compact SBERT subspace that best captures the essential semantic structure. We conduct comprehensive experiments on various sentence embedding tasks, including semantic textual similarity, text classification, and document clustering; the results show that the proposed model achieves promising results compared to several popular sentence representations.
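
Since the abstract only sketches the refinement step, the following is a minimal sketch of neighborhood preserving embedding (NPE) applied to sentence vectors, under stated assumptions. The function name npe_refine, the hyperparameters (n_neighbors, n_components, reg), and the synthetic stand-in vectors are illustrative; the paper's own implementation and settings are not given in the abstract.

```python
# Illustrative sketch: NPE refinement of sentence vectors (not the
# authors' reference implementation). In practice X would hold SBERT
# outputs, e.g. from SentenceTransformer(...).encode(sentences).
import numpy as np
from scipy.linalg import eigh

def npe_refine(X, n_neighbors=10, n_components=32, reg=1e-3):
    """Project row vectors X (n x d) onto an NPE subspace (n x n_components)."""
    n, d = X.shape
    # 1. k-nearest neighbors by Euclidean distance (excluding self).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nbrs = np.argsort(dists, axis=1)[:, :n_neighbors]

    # 2. Reconstruction weights: express each point as an affine
    #    combination of its neighbors (same step as in LLE).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                         # centered neighbors
        G = Z @ Z.T                                   # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)  # regularize
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                   # weights sum to one

    # 3. Linear projection A from the generalized eigenproblem
    #    X^T M X a = lambda X^T X a, with M = (I - W)^T (I - W);
    #    keep the eigenvectors with the smallest eigenvalues.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    A_mat = X.T @ M @ X
    B_mat = X.T @ X + reg * np.eye(d)   # regularized for numerical stability
    _, evecs = eigh(A_mat, B_mat)       # eigenvalues in ascending order
    A = evecs[:, :n_components]
    return X @ A                        # refined, lower-dimensional embeddings

# Synthetic stand-in for 768-dimensional SBERT vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))
Y = npe_refine(X, n_neighbors=10, n_components=32)
print(Y.shape)  # (200, 32)
```

Because NPE learns an explicit linear map A rather than per-point coordinates, unseen sentence vectors can be refined by the same projection X_new @ A, which is what makes it usable for downstream similarity, classification, and clustering tasks.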
