Visual tracking consists of locating, or determining the configuration of, a known object in each frame of a video sequence. Describing the whole scene usually involves multiple targets, their movements and interactions, and the particular features of the scenario. This paper presents a visual tracking system framework designed to provide a “near natural language” description of the actions of the targets involved in the scene. Our prototype focuses on the detection, tracking and feature extraction of a dynamic number of targets in a scenario over time. The design of any visual tracking system usually requires the injection of human knowledge at each level of the description transformation in order to produce a linguistic scene summary from raw video. The main aim of this work is to make explicit the knowledge injection needed to link the low-level representations (associated with signals) to the high-level semantics (related to knowledge) in the visual tracking problem. As a result, the semantics that emerge at the two transformation levels are analysed and presented. We concentrate on the representation spaces for the memetic algorithm particle filter applied to multiple object tracking in annotated scenarios, oriented towards video-based surveillance applications. Finally, some example applications in different surveillance scenarios are presented and discussed.
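The abstract names a memetic algorithm particle filter as the tracking core but gives no implementation details. Below is a minimal, hypothetical sketch of that family of methods for a single target in 2D: a standard predict/update/resample particle filter whose particles are additionally refined by a hill-climbing local search (the “memetic” step). All names (`track_step`, `local_refine`), the Gaussian likelihood, and the parameter values are illustrative assumptions, not the authors’ method.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(particles, observation, sigma=1.0):
    # Gaussian likelihood of each particle state given the observed position.
    d2 = np.sum((particles - observation) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def local_refine(particles, observation, step=0.5, iters=3):
    # "Memetic" component: perturb each particle and keep the move
    # only when it improves the likelihood (stochastic hill climbing).
    for _ in range(iters):
        candidates = particles + rng.normal(0, step, particles.shape)
        better = likelihood(candidates, observation) > likelihood(particles, observation)
        particles[better] = candidates[better]
    return particles

def track_step(particles, weights, observation, motion_noise=0.8):
    # 1. Predict: propagate particles through a random-walk motion model.
    particles = particles + rng.normal(0, motion_noise, particles.shape)
    # 2. Memetic refinement of the predicted particles.
    particles = local_refine(particles, observation)
    # 3. Update: reweight particles by observation likelihood.
    weights = weights * likelihood(particles, observation)
    weights /= weights.sum()
    # 4. Resample when the effective sample size degenerates.
    if 1.0 / np.sum(weights ** 2) < len(weights) / 2:
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles, weights = particles[idx], np.full(len(weights), 1.0 / len(weights))
    return particles, weights

# Toy run: track a target drifting diagonally across 2D image coordinates.
n = 200
particles = rng.normal(0, 5, (n, 2))
weights = np.full(n, 1.0 / n)
for t in range(20):
    observation = np.array([t, t]) + rng.normal(0, 0.5, 2)
    particles, weights = track_step(particles, weights, observation)
    estimate = np.average(particles, weights=weights, axis=0)
print("final estimate:", estimate)
```

In a multiple-target setting such as the one the paper addresses, one filter of this kind would typically be run per target, with the extracted per-target trajectories and features then feeding the higher-level linguistic description; that coupling is exactly the knowledge-injection step the abstract highlights.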