Abstract

Semantic Textual Similarity (STS) evaluation assesses the degree to which two pieces of text are similar in meaning. In this paper, we describe three models submitted to the SemEval 2017 STS task. Given two English texts, each of the proposed methods outputs an assessment of their semantic similarity. We propose an approach for computing monolingual semantic textual similarity based on an ensemble of three distinct methods. Our model combines recursive neural network (RNN) text auto-encoders with a supervised model of vectorized sentences using reduced part-of-speech (PoS) weighted word embeddings, as well as an unsupervised method based on word coverage (TakeLab). Additionally, we enrich the model with features that allow the ensemble to select among its member methods based on their efficiency. We used a Multi-Layer Perceptron as an ensemble classifier operating on the estimates of trained Gradient Boosting Regressors. Our results show that such an ensemble leads to higher accuracy, since each member algorithm tends to specialize in a particular type of sentence. A simple model based on PoS-weighted Word2Vec word embeddings appears to improve the performance of the more complex RNN-based auto-encoders in the ensemble. In the monolingual English-English STS subtask, our ensemble-based model achieved a mean Pearson correlation of .785 with human annotators.
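The PoS-weighted sentence vectorization described above can be sketched as a weighted average of word embeddings, where each word's weight depends on its (reduced) part-of-speech tag. The weight values and tag set below are illustrative assumptions, not the paper's tuned parameters, and `sentence_vector` / `cosine` are hypothetical helper names:

```python
import numpy as np

# Illustrative PoS weights (assumption): content words contribute more
# than function words. The paper's actual weighting scheme may differ.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "ADV": 0.4, "OTHER": 0.1}

def sentence_vector(tagged_tokens, embeddings, dim=50):
    """Weighted average of word vectors; weights come from reduced PoS tags.

    tagged_tokens: list of (word, pos_tag) pairs
    embeddings: dict mapping word -> np.ndarray of shape (dim,)
    """
    acc, total = np.zeros(dim), 0.0
    for word, pos in tagged_tokens:
        vec = embeddings.get(word)
        if vec is None:  # out-of-vocabulary words are skipped
            continue
        w = POS_WEIGHTS.get(pos, POS_WEIGHTS["OTHER"])
        acc += w * vec
        total += w
    return acc / total if total > 0 else acc

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A similarity score for a sentence pair would then be `cosine(sentence_vector(s1, emb), sentence_vector(s2, emb))`, which the ensemble could consume as one feature among others.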

Highlights

  • The objective of a system for evaluating semantic textual similarity is to produce a value which serves as a rating of the semantic similarity between a pair of text samples

  • The purpose of the Semantic Textual Similarity (STS) task is to assess the semantic similarity of two sentences

  • We have presented the OPI-JSA system submitted by our team for SemEval 2017, Task 1, Subtask 5


Summary

Introduction

The objective of a system for evaluating semantic textual similarity is to produce a value which serves as a rating of the semantic similarity between a pair of text samples. Such a task certainly cannot be regarded as a toy problem; its results can be used to solve multiple real-world problems, e.g. plagiarism detection. We used the described methods in the STS task of the SemEval 2017 competition (Bethard et al., 2017).

Methods
Models
TakeLab
Run 2: Skip Thoughts Vectors
Run 3: Ensemble
Results
Method
Conclusion
