Multiple Image Objects Detection, Tracking, and Classification using Human Articulated Visual Perception Capability

HeungKyu Lee

doi:10.5772/6040

Abstract

This chapter examines a method for detecting, tracking, and classifying multiple objects in consecutive image sequences using human articulated visual perception capability. The described artificial vision system mimics the characteristics of human visual perception. It is well known that a human being first detects and focuses on the motion energy of a scene, and then analyzes only the detailed color information of that focused region, using memory stored in the brain. From this observation, a spatio-temporal mechanism is derived to detect and track multiple objects in consecutive image sequences. This mechanism provides an efficient basis for more complex analysis, using data association within a spatially attentive window and at a predicted temporal location. In addition, the occlusion problem between multiple moving objects is considered. When multiple objects move through, or occlude one another in, the visual field, simultaneous detection and tracking of those objects tends to fail. This is because incompletely estimated feature vectors, such as the location, color, velocity, and acceleration of a target, provide ambiguous or missing information. Moreover, partial information cannot be completed unless temporal consistency is taken into account when objects occlude one another or are hidden behind obstacles. To cope with these issues, a spatially and temporally aware mechanism is used that combines occlusion activity detection with object association based on a partial probability model. Furthermore, the detected moving targets are tracked simultaneously and reliably using an extended joint probabilistic data association (JPDA) filter. Finally, target classification is performed by decision fusion of shape and motion information within a Bayesian framework. Reliable and stable classification requires multiple invariant feature vectors that discriminate between targets with greater certainty.
To this end, shape and motion information are extracted using Fourier descriptors, gradients, and motion feature variation on spatial and temporal images, and a local decision is then made for each. A global decision is reached by decision fusion within the Bayesian framework. Experimental evaluations on real image sequences demonstrate the performance and usefulness of the introduced algorithms. Figure 1 shows the system block diagram for multi-target detection, tracking, and classification.
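
As a minimal sketch of the decision fusion step, the following Python function combines the local class posteriors produced by a shape classifier and a motion classifier into a single global decision. It assumes that the local decisions are conditionally independent given the class, which the abstract does not state. The function name, class labels, and probability values are hypothetical and chosen only for illustration; this is not the chapter's implementation.

```python
def fuse_posteriors(local_posteriors, priors):
    """Fuse local class posteriors into a global posterior.

    Assumes conditional independence of the local features given the class,
    so that P(c | x_1..x_n) is proportional to P(c) * prod_i P(c | x_i) / P(c).
    `local_posteriors` is a list of dicts mapping class label -> P(c | x_i).
    `priors` is a dict mapping class label -> P(c), with every P(c) > 0.
    """
    fused = {}
    for label, prior in priors.items():
        score = prior
        for posterior in local_posteriors:
            # Each local decision contributes its likelihood ratio P(c | x_i) / P(c).
            score *= posterior[label] / prior
        fused[label] = score
    # Normalize so the fused scores form a probability distribution.
    total = sum(fused.values())
    return {label: score / total for label, score in fused.items()}


# Hypothetical example values (not taken from the chapter).
priors = {"human": 0.5, "vehicle": 0.5}
shape_decision = {"human": 0.7, "vehicle": 0.3}   # local decision from shape features
motion_decision = {"human": 0.6, "vehicle": 0.4}  # local decision from motion features

global_posterior = fuse_posteriors([shape_decision, motion_decision], priors)
global_decision = max(global_posterior, key=global_posterior.get)
print(global_decision, global_posterior)
```

With these illustrative inputs both local decisions favor the same class, so the fused posterior for that class is higher than either local posterior alone. Other fusion rules (for example, weighting each local decision by its reliability) are possible and may be closer to what the chapter uses.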

Full Text