Abstract

Semantic representation of sentences is a challenging task across many fields of natural language processing. A common approach represents a sentence through the distributional semantic representations of its words, using all of each word's features. These features play a major role in tasks such as the semantic representation of sentences for measuring semantic textual similarity (STS). In this paper, we show that not all word features are necessary for the semantic representation of a sentence when estimating STS. Pre-trained word-embedding and BERT (Bidirectional Encoder Representations from Transformers) models are employed to obtain the semantic distributions of the words. An automated threshold is then applied to the word feature vectors to derive the sentence representation. Finally, cosine similarity is computed on the sentence vectors to measure their STS. To validate the performance of the proposed approach, a wide range of experiments is carried out on four benchmark STS datasets. The results show that the proposed approach produces effective semantic representations of sentences and outperforms several state-of-the-art methods.
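
As a rough illustration of the pipeline described above, the following Python sketch pools pre-trained word vectors into a sentence vector, zeroes out low-magnitude feature dimensions with a simple automated threshold, and compares two sentences with cosine similarity. The pooling rule (mean pooling) and the threshold rule (mean absolute feature value) are illustrative assumptions; the abstract does not specify the paper's exact thresholding scheme.

```python
import numpy as np

def sentence_vector(word_vectors, threshold=None):
    """Pool word vectors into one sentence vector and suppress weak features.

    Mean pooling and the mean-absolute-value threshold below are
    illustrative assumptions, not necessarily the paper's exact rules.
    """
    m = np.vstack(word_vectors)           # shape: (num_words, embedding_dim)
    v = m.mean(axis=0)                    # average over the words
    if threshold is None:
        threshold = np.abs(v).mean()      # hypothetical automated threshold
    return np.where(np.abs(v) >= threshold, v, 0.0)  # drop low-magnitude features

def cosine_similarity(a, b):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy usage with random stand-ins for pre-trained word embeddings.
rng = np.random.default_rng(0)
s1 = sentence_vector([rng.standard_normal(300) for _ in range(5)])
s2 = sentence_vector([rng.standard_normal(300) for _ in range(7)])
print(cosine_similarity(s1, s2))
```

In practice, the random vectors above would be replaced by embeddings looked up from a pre-trained word-embedding model or by contextual token vectors extracted from BERT.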
