Determining who is where is a common task in many computer vision applications. Most of the literature focuses on one of two approaches: determining who a detected person is (appearance-based re-identification) and collating positions into a list, or determining the motion of a person (spatio-temporal tracking) and assigning identity labels based on the tracks formed. This paper presents a model fusion approach that combines both sources of information and uses re-ranking to increase the accuracy of identity classification for detected people. First, a Sequential k-Means re-identification approach is presented, followed by a Kalman filter-based spatio-temporal tracking approach. The outputs of these models are then fused by linear weighting, with the weights adjusted by a decay function and a rule-based system to reflect the strengths and weaknesses of each model under different conditions. Preliminary experimental results with two different person detection algorithms on an indoor person tracking dataset show that fusing the appearance and spatio-temporal models significantly increases the overall classification accuracy.
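The linear weighting with a decay function described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `fuse_scores`, the parameters `base_weight` and `decay_rate`, the exponential form of the decay, and the idea of decaying the spatio-temporal weight with the number of frames since a track was last observed are all assumptions made for illustration. The rule-based adjustments mentioned in the abstract are not modelled.

```python
import math


def fuse_scores(appearance, spatiotemporal, frames_since_seen,
                base_weight=0.5, decay_rate=0.1):
    """Linearly fuse two per-identity score dicts and re-rank the identities.

    All names and defaults here are hypothetical; the abstract does not
    specify the weights or the form of the decay function.
    """
    # Assumed decay: trust the motion model less the longer a track has
    # gone unobserved (exponential decay is an illustrative choice).
    w_st = base_weight * math.exp(-decay_rate * frames_since_seen)
    w_app = 1.0 - w_st  # weights sum to one

    # Weighted sum per identity; identities the motion model has no
    # score for contribute zero from that model.
    fused = {
        identity: w_app * score + w_st * spatiotemporal.get(identity, 0.0)
        for identity, score in appearance.items()
    }

    # Re-rank: highest fused score first.
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, fused


if __name__ == "__main__":
    # Hypothetical scores for two identities from each model.
    appearance = {"alice": 0.55, "bob": 0.45}
    spatiotemporal = {"alice": 0.2, "bob": 0.8}
    print(fuse_scores(appearance, spatiotemporal, frames_since_seen=0))
    print(fuse_scores(appearance, spatiotemporal, frames_since_seen=50))
```

With these made-up scores, a freshly observed track lets the spatio-temporal model override a marginal appearance match, while a long-unobserved track falls back almost entirely on appearance.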