Structured Learning for Multiple Object Tracking

Wang Yan, Xiaoye Han, Vladimir Pavlović

doi:10.5244/c.26.48

Abstract

Adaptive tracking-by-detection methods use previous tracking results to generate a new training set for object appearance, and update the current model to predict the object location in subsequent frames. Such approaches are typically bootstrapped by manual or semi-automatic initialization in the first several frames. However, most adaptive tracking-by-detection methods focus on tracking a single object or multiple unrelated objects. Although one can trivially run several single-object trackers to track multiple objects, such a solution is frequently suboptimal because it does not utilize the inter-object constraints or the object layout information [2]. We propose in this paper an adaptive tracking-by-detection method for multiple objects, inspired by recent work in [1] and [2]. The constraints for the structured Support Vector Machine (SVM) in [1] are modified to localize multiple objects simultaneously using both appearance and layout information. Moreover, additional binary constraints are introduced to detect the presence of each object and to prevent possible model drift. Thus the method can handle frequent occlusion in multiple object tracking, as well as objects entering or leaving the scene. Those binary constraints make the optimization problem significantly different from the original structured SVM [3]. The inter-object constraints, embedded in a linear programming technique similar to [2] for optimal position assignment, are applied to reduce false detections.

In the single object tracking case, given a set of frames {x_1, x_2, ..., x_n} indexed by time and the corresponding set of labels, i.e. bounding boxes, {y_1, y_2, ..., y_n}, the structured SVM tries to find a model f(x, y) such that predicting the object location in a test frame x reduces to maximizing, over candidate locations y,

f(x, y) = ⟨w, Ψ(x|y)⟩. (1)
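The prediction rule in Eq. (1) can be illustrated with a minimal sketch: score every candidate bounding box y with the inner product of a weight vector w and a joint feature vector Ψ(x|y), and return the highest-scoring box. The names `score`, `predict_box`, and `mean_intensity_feature`, the explicit candidate list, and the toy mean-intensity feature are all assumptions made for illustration; they are not the features or the search strategy used in the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical box convention for this sketch: (left, top, width, height).
Box = Tuple[int, int, int, int]
Frame = Sequence[Sequence[float]]


def score(w: Sequence[float], psi: Sequence[float]) -> float:
    """Inner product <w, psi>, i.e. f(x, y) in Eq. (1)."""
    return sum(wi * pi for wi, pi in zip(w, psi))


def predict_box(
    w: Sequence[float],
    frame: Frame,
    candidates: Sequence[Box],
    joint_feature: Callable[[Frame, Box], List[float]],
) -> Box:
    """Return the candidate box y that maximizes <w, Psi(x|y)>."""
    return max(candidates, key=lambda y: score(w, joint_feature(frame, y)))


def mean_intensity_feature(frame: Frame, box: Box) -> List[float]:
    """Toy joint feature (an assumption): mean pixel value in the box, plus a bias term."""
    left, top, width, height = box
    pixels = [
        frame[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    ]
    return [sum(pixels) / len(pixels), 1.0]
```

For example, on a 4x4 frame whose bottom-right 2x2 patch is bright, calling `predict_box` with w = [1.0, 0.0] and a handful of 2x2 candidate boxes selects the box covering that patch. In practice the candidate set would come from a search around the previous location, and w would be learned from earlier frames rather than fixed by hand.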
