Abstract

Capsule networks (see Hinton et al., 2018) aim to encode knowledge of, and reason about, the relationship between an object and its parts. In this letter, we specify a generative model for such data and derive a variational algorithm for inferring the transformation of each model object in a scene and the assignments of observed parts to the objects. We derive a learning algorithm for the object models based on variational expectation maximization (Jordan et al., 1999). We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to data generated from multiple geometric objects such as squares and triangles ("constellations") and to data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) used amortized inference via stacked capsule autoencoders to tackle this problem; our results show that our methods significantly outperform theirs where comparisons are possible (on the constellations data).
