Abstract

Most traditional distributional similarity models fail to capture syntagmatic patterns that group together multiple word features within the same joint context. In this work we introduce a novel generic distributional similarity scheme under which the power of probabilistic models can be leveraged to effectively model joint contexts. Based on this scheme, we implement a concrete model which utilizes probabilistic n-gram language models. Our evaluations suggest that this model is particularly well-suited for measuring similarity for verbs, which are known to exhibit richer syntagmatic patterns, while maintaining comparable or better performance with respect to competitive baselines for nouns. Following this, we propose our scheme as a framework for future semantic similarity models leveraging the substantial body of work that exists in probabilistic language modeling.
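To make the idea of scoring joint contexts concrete, the following is a minimal, hypothetical sketch of how an n-gram language model can rate how well a candidate word fits an entire context window. The toy corpus, bigram order, and add-one smoothing are illustrative assumptions, not the paper's actual configuration.

```python
from collections import defaultdict

# Toy corpus; an actual model would be trained on a large corpus
# (corpus, bigram order, and smoothing here are illustrative assumptions).
corpus = ("children like cookies and milk . "
          "children enjoy cookies and milk . "
          "dogs like bones .").split()

# Unigram and bigram counts for an add-one-smoothed bigram model.
unigrams = defaultdict(int)
bigrams = defaultdict(int)
for w in corpus:
    unigrams[w] += 1
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[(w1, w2)] += 1
V = len(unigrams)  # vocabulary size, used for smoothing

def bigram_prob(w1, w2):
    """P(w2 | w1) with add-one smoothing."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def joint_context_score(window):
    """Probability of an entire word window under the language model:
    the joint context is scored as a whole, rather than each
    co-occurring word being treated as an independent feature."""
    score = 1.0
    for w1, w2 in zip(window, window[1:]):
        score *= bigram_prob(w1, w2)
    return score

# Compare how well two candidate words fit the same joint context.
context = ["children", None, "cookies", "and", "milk"]
for target in ["like", "enjoy"]:
    window = [target if w is None else w for w in context]
    print(target, joint_context_score(window))
```

A similarity measure between two words can then, for instance, be derived by comparing how the two words distribute over such jointly scored contexts.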

Highlights

  • The Distributional Hypothesis is commonly phrased as “words which are similar in meaning occur in similar contexts” (Rubenstein and Goodenough, 1965)

  • It was suggested that the word feature vector approach misses valuable information, which is embedded in the collocation and inter-relations of words within the same context (Ruiz-Casado et al., 2005)

  • Our evaluations suggest that our model is advantageous for measuring semantic similarity for verbs, while maintaining comparable or better performance with respect to competitive baselines for nouns

Introduction

The Distributional Hypothesis is commonly phrased as “words which are similar in meaning occur in similar contexts” (Rubenstein and Goodenough, 1965). It was suggested that the word feature vector approach misses valuable information, which is embedded in the collocation and inter-relations of words (e.g. word order) within the same context (Ruiz-Casado et al., 2005). Following this motivation, Ruiz-Casado et al. (2005) proposed an alternative composite-feature model, later adopted in (Agirre et al., 2009). This model adopts a richer context representation by considering entire word window contexts as features, while keeping the same computational vector-based model. For example, a single feature that could be retrieved this way for the target word like is the entire window “Children ___ cookies and milk”, with the target slot left open. They showed good results on detecting synonyms in the 80-question multiple-choice TOEFL test. We are not aware of additional work following this approach of using entire word windows as features.
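As an illustration of this composite-feature idea, here is a minimal sketch in which each feature is an entire word window around the target word. The tokenization, the window size of two, and the toy corpus are assumptions for illustration, not the exact setup of Ruiz-Casado et al. (2005).

```python
from collections import Counter
import math

def window_features(tokens, target, size=2):
    """Collect the entire word window around each occurrence of `target`
    as a single composite feature (with the target slot left open),
    rather than counting each co-occurring word separately."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            feats[tuple(left) + ("___",) + tuple(right)] += 1
    return feats

def cosine(u, v):
    """Standard cosine similarity between two sparse count vectors."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

tokens = ("kids say children like cookies and milk . "
          "kids say children enjoy cookies and milk .").split()

v_like = window_features(tokens, "like")
v_enjoy = window_features(tokens, "enjoy")
print(cosine(v_like, v_enjoy))  # 1.0: the two words share an identical window feature
```

Note that two such features match only if the windows are identical end to end, which preserves word order and collocation information but makes composite features far sparser than single-word features.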
