Abstract

The problem of modelling the semantics of visual events without segmentation or computation of object-centred trajectories is addressed. Two examples are presented. The first illustrates the detection of autonomous visual events without segmentation. The second shows how high-level semantics can be extracted without spatio-temporal tracking or modelling of object trajectories. We wish to infer the semantics of human behavioural patterns for autonomous visual event recognition in dynamic scenes. This is achieved by learning to model the temporal structures of pixel-wise change energy histories using CONDENSATION. The performance of a pixel-energy-history based event model is compared with that of an adaptive Gaussian mixture based scene model. Given low-level autonomous visual events, grouping and high-level reasoning are required both to infer associations between these events and to give meaning to those associations. We present an approach for modelling the semantics of interactive human behaviours for the association of a moving head and two hands under self-occlusion and intersection from a single camera view. For associating and tracking the movements of multiple intersecting body parts, we compare the effectiveness of spatio-temporal dynamics based prediction with that of reasoning about body-part associations based on modelling semantics using Bayesian belief networks. © 2002 Elsevier Science B.V. All rights reserved.
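The abstract only names the first technique, so a minimal sketch may help fix ideas: per-pixel change-energy histories are accumulated from frame differences, and a CONDENSATION-style (factored sampling) iteration propagates a weighted particle set through select, predict, and measure steps. All names here (`change_energy_history`, `condensation_step`, the random-walk dynamics, and the window length) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def change_energy_history(frames, window=10):
    """Per-pixel change-energy history: absolute temporal differences
    summed over a sliding window. The paper's exact energy definition
    may differ (e.g. filtered or normalised); this is a sketch."""
    frames = np.asarray(frames, dtype=np.float64)      # (T, H, W) grey-level
    diffs = np.abs(np.diff(frames, axis=0))            # |I_t - I_{t-1}|
    return np.stack([diffs[max(0, t - window + 1):t + 1].sum(axis=0)
                     for t in range(diffs.shape[0])])  # (T-1, H, W)

def condensation_step(particles, weights, likelihood, noise=0.05):
    """One CONDENSATION iteration: select (resample in proportion to
    weight), predict (diffuse through a dynamical model; a random walk
    stands in here), and measure (reweight by observation likelihood)."""
    n = len(particles)
    particles = particles[rng.choice(n, size=n, p=weights)]  # select
    particles = particles + rng.normal(0.0, noise, size=n)   # predict
    weights = likelihood(particles)                          # measure
    return particles, weights / weights.sum()
```

In the paper, it is the learned temporal structure of each pixel's energy history that defines an event model; the state above is left as a generic one-dimensional quantity for brevity.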
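For the second study, the abstract contrasts dynamics-based prediction with semantic reasoning via Bayesian belief networks when associating a head and two hands under occlusion. As a purely illustrative sketch (the network topology, variables, and conditional probability tables below are invented for exposition and are not taken from the paper), a small discrete belief network can be evaluated by enumerating the joint distribution:

```python
import itertools

# Toy network: BodyPart -> Observation <- Occlusion.
# The posterior over body-part identity is obtained by summing the
# joint distribution over the unobserved occlusion variable.
P_part = {"head": 0.4, "hand": 0.6}   # hypothetical prior over identity
P_occl = {True: 0.3, False: 0.7}      # hypothetical prior over occlusion
# P(blob_is_small | part, occluded) -- hypothetical CPT.
P_small = {("head", True): 0.7, ("head", False): 0.2,
           ("hand", True): 0.9, ("hand", False): 0.6}

def posterior_part(obs_small=True):
    """P(part | observation), marginalising out the occlusion state."""
    joint = {}
    for part, occl in itertools.product(P_part, P_occl):
        p_obs = P_small[(part, occl)] if obs_small else 1 - P_small[(part, occl)]
        joint[part] = joint.get(part, 0.0) + P_part[part] * P_occl[occl] * p_obs
    z = sum(joint.values())
    return {part: p / z for part, p in joint.items()}

print(posterior_part(obs_small=True))  # e.g. {'head': 0.25..., 'hand': 0.74...}
```

The design point the comparison turns on is that such a network encodes semantic constraints (identity, occlusion state) explicitly, rather than relying on extrapolated spatio-temporal dynamics alone.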
