A Probabilistic Generative Approach to Invariant Visual Inference and Learning

Zhenwen Dai1* and Jörg Lücke1

1 Goethe-Universität Frankfurt am Main, FIAS, Germany

Abstract

Inference and learning from visual data is a challenging task because of noise and the data's ambiguity. The most advanced vision systems to date are the sensory visual circuitries of higher vertebrates. Although artificial approaches make continuous progress, for the majority of applications they are so far no match for such biological systems. To understand and to rebuild biological systems, they have been modeled using approaches from artificial intelligence, artificial neural networks, and probabilistic models. In terms of how the problem of invariant recognition is approached, these models can coarsely be grouped into two classes: models that treat invariances passively (e.g., [1,2]) and models that actively address the typical transformation invariances of object identities (e.g., [3-6]). The former approaches are often feed-forward while the latter are usually recurrent.

In this work we study a probabilistic generative approach that explicitly addresses the translation invariance of objects in visual data. Object location is modeled using an explicit hidden variable, while the object itself is encoded by a specific spatial combination of features. The hidden variable for object position corresponds in this respect to neural control units that have been suggested as crucial computational elements in biological networks (e.g., [3,5,6]). The investigated generative model autonomously learns object identity and position from unlabeled data. The data always contain the same object type at a random position, and the background of each data point is different. Note that such learning is challenging because it is initially difficult to infer the object position without knowledge of the object features, and without known object positions these features are in turn difficult to infer. By using a probabilistic generative approach, we can nevertheless show that an object's feature combination can reliably be learned based on a maximum likelihood approach. We use expectation maximization (EM) to increase the data likelihood, which is similar, e.g., to the approaches of Olshausen et al. [3] or Williams and Titsias [7]. However, in contrast to these earlier studies, we use vectorial features for image encoding and can robustly handle changing backgrounds. We demonstrate the algorithm on artificial and more realistic visual data.
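To make the learning scheme concrete, the following is a minimal sketch of EM for a translation-invariant generative model of the kind described above. It deliberately simplifies the model in the abstract: object positions lie on a discrete pixel grid, pixels are treated as scalar values rather than the vectorial features used here, the likelihood is an isotropic Gaussian around an object template, and the changing background is absorbed into the noise term. All names (em_translation_invariant, template, resp) are illustrative and not taken from the paper.

    import numpy as np

    def em_translation_invariant(images, tpl_h, tpl_w, n_iter=20, sigma=1.0):
        """Toy EM for a translation-invariant generative model (sketch).

        Latent variable: the object position (i, j) on a discrete grid.
        Parameter: an object template of shape (tpl_h, tpl_w).
        Likelihood: Gaussian noise around the template inside the object
        window; the changing background is simply treated as noise here,
        a simplification of the model described in the abstract.
        """
        n, H, W = images.shape
        positions = [(i, j) for i in range(H - tpl_h + 1)
                            for j in range(W - tpl_w + 1)]
        rng = np.random.default_rng(0)
        template = rng.normal(size=(tpl_h, tpl_w))   # random initialization

        for _ in range(n_iter):
            # E-step: posterior over positions for each image, i.e. a
            # softmax over negative squared errors between each image
            # window and the current template.
            resp = np.zeros((n, len(positions)))
            for k, (i, j) in enumerate(positions):
                win = images[:, i:i + tpl_h, j:j + tpl_w]
                resp[:, k] = -((win - template) ** 2).sum(axis=(1, 2)) / (2 * sigma ** 2)
            resp = np.exp(resp - resp.max(axis=1, keepdims=True))
            resp /= resp.sum(axis=1, keepdims=True)

            # M-step: the template becomes the responsibility-weighted
            # average of all image windows (each image contributes total
            # weight 1, so normalizing by n gives the weighted mean).
            num = np.zeros_like(template)
            for k, (i, j) in enumerate(positions):
                win = images[:, i:i + tpl_h, j:j + tpl_w]
                num += (resp[:, k, None, None] * win).sum(axis=0)
            template = num / n
        return template

    # Usage: recover a bright 5x5 square placed at random positions over
    # a different noise background in each image (synthetic toy data).
    rng = np.random.default_rng(1)
    data = rng.normal(0.0, 0.3, size=(200, 16, 16))
    for img in data:
        i, j = rng.integers(0, 12, size=2)
        img[i:i + 5, j:j + 5] += 1.0                  # hidden object
    learned = em_translation_invariant(data, 5, 5)

The sketch illustrates the chicken-and-egg structure noted above: the E-step infers positions given the current feature estimate, and the M-step re-estimates the features given the inferred positions, with the data likelihood increasing across iterations.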
