Event Abstract

Learning visual motion and structure as latent explanations from huge motion data sets

Rudolf Mester1*, Alvaro Guevara1, Christian Conrad1 and Holger Friedrich1

1 Goethe-Universität Frankfurt, Bernstein Focus Neurotechnology (BFNT), Germany

The term ‘motion’ denotes a simple explanation for a possibly complicated change of the illumination pattern sensed in an eye or a camera. Motion explains these changes by transformations of the brightness pattern such as shift, rotation, and scaling. Other important components of these ‘explanations’ are segmentation (= grouping into spatially connected objects) and scene depth. In that sense, ‘motion’ is, in an information-theoretic sense, a ‘cheap’ description of the second and subsequent images in a sequence, conditioned on the first image being given. The question is on the basis of which principles perception, relying purely on long-term observation, is capable of distilling concepts such as motion, depth, and segmentation from the continuous stream of visual input data, without a priori access to mathematical models of the world or to models of the observed signals that already employ these ‘explanatory entities’.

Optical flow is the representation of motion in the image plane, and ‘technical’ equations such as the brightness constancy constraint equation (BCCE) state an explicit relation between the observable entities (spatio-temporal derivatives of the image signal) and the unknown explanatory variable, i.e. motion. However: how can such a relation be learnt, instead of being constructed on the basis of an already available ‘higher insight’? We suggest that the emergence of concepts such as motion in a visual perception system is largely supported by (ego)motoric information, and by discovering statistical correlations (not necessarily only linear ones) between motoric data and the instantaneous spatio-temporal characteristics of the visual motion field. Both local motion (as it appears in the optical flow equations) and global motion (e.g. parametric descriptions of the complete visual motion field) are claimed to be informative ‘latent variables’ that emerge from a statistical analysis of observable sensory information, in particular the spatio-temporal image signal and motoric signals.

We are currently exploring this hypothesis on the basis of a large-scale experiment in which an autonomous robot continuously ‘explores’ an indoor environment (using standard methods) and collects huge multi-channel video data streams that will be subjected to an in-depth analysis in the spirit of the approach sketched above. In contrast to [1], we specifically aim to learn (= identify) the latent variables themselves, not primarily the distribution of the parameters. This approach is more in the spirit of [2], where transformations are also learnt.
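For reference, the BCCE named above can be stated in its standard textbook form; the notation below (image signal f, flow vector (u, v)) is the conventional one and is not taken from the abstract itself:

```latex
% Brightness constancy: a moving pattern keeps its grey value,
%   f(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = f(x, y, t).
% A first-order Taylor expansion yields the BCCE, which links the
% observable spatio-temporal derivatives of f to the unknown flow (u, v):
\frac{\partial f}{\partial x}\, u
  \;+\; \frac{\partial f}{\partial y}\, v
  \;+\; \frac{\partial f}{\partial t}
  \;=\; 0,
\qquad \text{i.e.} \qquad
(\nabla f)^{\top} \mathbf{v} + f_t = 0 .
```

This is exactly the kind of relation the abstract proposes to learn from data rather than derive from a pre-given model.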
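The correlation-discovery step lends itself to a compact illustration. The sketch below is not the authors' method: it uses plain linear canonical correlation analysis (CCA) to pair motoric signals with global descriptors of the flow field. All data shapes, variable names, and the choice of CCA itself are assumptions made for illustration; the abstract explicitly allows for nonlinear dependencies, which this linear sketch does not capture.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical data: T time steps of
#   M: motoric readings (e.g. wheel odometry), shape (T, d_motor)
#   F: per-frame summary of the visual motion field (e.g. coefficients
#      of a parametric fit to the optical flow), shape (T, d_flow)
T, d_motor, d_flow = 10_000, 4, 6
rng = np.random.default_rng(0)
M = rng.standard_normal((T, d_motor))
# Synthetic coupling so the demo has something to find: the flow
# descriptors depend linearly on the motoric state, plus noise.
W = rng.standard_normal((d_motor, d_flow))
F = M @ W + 0.1 * rng.standard_normal((T, d_flow))

# CCA finds paired projections of M and F with maximal correlation.
# Strongly correlated canonical pairs are candidate 'latent variables'
# shared by the motoric and visual streams (e.g. ego-motion parameters).
cca = CCA(n_components=2)
M_c, F_c = cca.fit_transform(M, F)
for k in range(2):
    r = np.corrcoef(M_c[:, k], F_c[:, k])[0, 1]
    print(f"canonical pair {k}: correlation = {r:.3f}")
```

A kernelized or otherwise nonlinear variant of this analysis would be needed to discover the non-linear correlations the abstract alludes to; the linear case merely shows the shape of the computation.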