We address the problem of perceptual grouping from motion cues by formulating it as the inference of motion layers from a sparse and noisy point set in a 4D space. Our approach is based on a layered 4D representation of the data and a voting scheme for token communication within a tensor voting computational framework. Given two sparse sets of point tokens, the image position and potential velocity of each token are encoded into a 4D tensor. By enforcing the smoothness of motion through a voting process, the correct velocity is selected for each input point as the most salient token. An additional dense voting step allows for the inference of a dense representation in terms of pixel velocities, motion regions, and boundaries. Using a 4D space for this tensor voting approach is essential, since it allows for a spatial separation of the points according to both their velocities and their image coordinates. Unlike most other methods, which optimize certain objective functions, our approach is noniterative and therefore does not suffer from local optima or poor convergence. We demonstrate our method on synthetic and real images by analyzing several difficult cases: opaque and transparent motion, rigid and nonrigid motion, and curves and surfaces in motion.
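To make the 4D token encoding and saliency-based velocity selection concrete, the sketch below illustrates the general idea under simplifying assumptions. It is not the authors' implementation: the full tensor voting fields are replaced here by an isotropic Gaussian ball-vote decay, and the function names and the `sigma` parameter are hypothetical.

```python
import numpy as np

def encode_tokens(matches):
    """Encode each candidate match as a 4D point (x, y, vx, vy):
    image position plus potential velocity."""
    return np.array([(x, y, vx, vy) for (x, y), (vx, vy) in matches], dtype=float)

def ball_vote_saliency(tokens, sigma=10.0):
    """Accumulate votes for each token from its 4D neighbours.
    Simplified: vote strength decays as a Gaussian of the 4D distance
    (a stand-in for the actual tensor voting fields)."""
    diff = tokens[:, None, :] - tokens[None, :, :]   # pairwise 4D differences
    d2 = np.sum(diff ** 2, axis=-1)                  # squared 4D distances
    weights = np.exp(-d2 / (sigma ** 2))             # decayed vote strengths
    np.fill_diagonal(weights, 0.0)                   # no self-votes
    return weights.sum(axis=1)                       # scalar saliency per token

def select_velocities(tokens, saliency):
    """For each image position, keep the candidate velocity whose
    token received the highest saliency (the most supported motion)."""
    best = {}
    for (x, y, vx, vy), s in zip(tokens, saliency):
        key = (x, y)
        if key not in best or s > best[key][1]:
            best[key] = ((vx, vy), s)
    return {pos: vel for pos, (vel, _) in best.items()}
```

In this toy version, the spatial separation provided by the 4D space is what lets tokens with similar positions but different candidate velocities receive different amounts of support: only candidates lying on a smooth motion layer accumulate strong votes from their neighbours.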