Abstract

Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE), are a natural way to model transitive relational data (e.g. entailment graphs). However, OE learns a deterministic knowledge base, limiting expressiveness of queries and the ability to use uncertainty for both prediction and learning (e.g. learning from expectations). Probabilistic extensions of OE have provided the ability to somewhat calibrate these denotational probabilities while retaining the consistency and inductive bias of ordered models, but lack the ability to model the negative correlations found in real-world knowledge. In this work we show that a broad class of models that assign probability measures to OE can never capture negative correlation, which motivates our construction of a novel box lattice and accompanying probability measure to capture anti-correlation and even disjoint concepts, while still providing the benefits of probabilistic modeling, such as the ability to perform rich joint and conditional queries over arbitrary sets of concepts, and both learning from and predicting calibrated uncertainty. We show improvements over previous approaches in modeling the Flickr and WordNet entailment graphs, and investigate the power of the model.

Highlights

  • Structured embeddings based on regions, densities, and orderings have gained popularity in recent years for their inductive bias towards the essential asymmetries inherent in problems such as image captioning (Vendrov et al, 2016), lexical and textual entailment (Erk, 2009; Vilnis and McCallum, 2015; Lai and Hockenmaier, 2017; Athiwaratkun and Wilson, 2018), and knowledge graph completion and reasoning (He et al, 2015; Nickel and Kiela, 2017; Li et al, 2017)

  • While the structured prediction analogy applies best to Order Embeddings (OE), which embeds consistent partial orders, other region- and density-based representations have been proposed for the express purpose of inducing a bias towards asymmetric relationships

  • We achieve a new state of the art in denotational probability modeling on the Flickr entailment dataset (Lai and Hockenmaier, 2017), and a matching state-of-the-art on WordNet hypernymy (Vendrov et al, 2016; Miller, 1995) with the concurrent work on thresholded Gaussian embedding of Athiwaratkun and Wilson (2018), achieving our best results by training on additional co-occurrence expectations aggregated from leaf types

Read more

Summary

Introduction

Structured embeddings based on regions, densities, and orderings have gained popularity in recent years for their inductive bias towards the essential asymmetries inherent in problems such as image captioning (Vendrov et al, 2016), lexical and textual entailment (Erk, 2009; Vilnis and McCallum, 2015; Lai and Hockenmaier, 2017; Athiwaratkun and Wilson, 2018), and knowledge graph completion and reasoning (He et al, 2015; Nickel and Kiela, 2017; Li et al, 2017). Models that encode asymmetry, and related properties such as transitivity (the two components of commonplace relations such as partially ordered sets and lattices), have great utility in these applications, leaving less to be learned from the data than arbitrary relational models At their best, they resemble a hybrid between embedding models and structured prediction. Probabilistic models are especially compelling for modeling ontologies, entailment graphs, and knowledge graphs Their desirable properties include an ability to remain consistent in the presence of noisy data, suitability towards semisupervised training using the expectations and uncertain labels present in these large-scale applications, the naturality of representing the inherent uncertainty of knowledge they store, and the ability to answer complex queries involving more than 2 variables. We find that the strong empirical performance of probabilistic ordering models, and our box lattice model in particular, and their endowment of new forms of training and querying, make them a promising avenue for future research in representing structured knowledge

Related Work
Background
Probabilistic Asymmetric Transitive
Correlations from Cone Measures
Box Lattices
Limitations
Learning
Warmup
WordNet
Findings
Conclusion and Future Work
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call