Video target tracking is the process of estimating the current state, and predicting the future state, of a target from a sequence of video sensor measurements. Multitarget video tracking is complicated by the fact that targets can occlude one another, affecting video feature measurements in a highly non-linear and difficult-to-model fashion. In this paper, we apply a multisensory fusion approach to the problem of multitarget video tracking with occlusion. The approach is based on a data-driven method, combinatorial fusion analysis (CFA), for selecting the features and fusion operations that improve a performance criterion. Each sensory cue is treated as a scoring system whose behavior is characterized by a rank–score function. A diversity measure, based on the variation among rank–score functions, is used to dynamically select the scoring systems and fusion operations that yield the best tracking performance. The relationship between the diversity measure and the tracking accuracy of two fusion operations, a linear score combination and an average rank combination, is evaluated on a set of 12 video sequences. The results demonstrate that using the rank–score characteristic as a diversity measure is an effective way to dynamically select scoring systems and fusion operations that improve the performance of multitarget video tracking with occlusions.
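To make the fusion machinery concrete, the following is a minimal sketch of the quantities named in the abstract: a rank–score function for a scoring system, a diversity measure between two systems, and the two fusion operations (linear score combination and average rank combination). The Euclidean distance between rank–score functions is an illustrative choice of diversity measure, not necessarily the paper's exact definition, and the function names are hypothetical.

```python
import numpy as np

def rank_score_function(scores):
    # f(r) = score of the rank-r candidate, i.e. scores sorted descending.
    return np.sort(scores)[::-1]

def diversity(scores_a, scores_b):
    # Variation between two rank-score functions; Euclidean distance is
    # one plausible choice (an assumption, not the paper's definition).
    fa = rank_score_function(scores_a)
    fb = rank_score_function(scores_b)
    return float(np.linalg.norm(fa - fb))

def score_combination(scores_a, scores_b):
    # Linear score combination: average of two normalized scoring systems.
    return (scores_a + scores_b) / 2.0

def rank_combination(scores_a, scores_b):
    # Average rank combination: convert each system's scores to ranks
    # (1 = best), then average the ranks per candidate.
    def ranks(s):
        order = np.argsort(-s)               # candidate indices, best to worst
        r = np.empty_like(order)
        r[order] = np.arange(1, len(s) + 1)  # rank assigned to each candidate
        return r
    return (ranks(scores_a) + ranks(scores_b)) / 2.0
```

For example, with two cues scoring three candidate targets as `[0.9, 0.5, 0.1]` and `[0.2, 0.8, 0.4]`, the rank combination yields `[2.0, 1.5, 2.5]`, so the second candidate wins under rank fusion even though the first has the highest single score.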