Abstract

Cross-camera image data association is essential for many multi-camera computer vision tasks, such as multi-camera pedestrian detection, multi-camera multi-target tracking, and 3D pose estimation. This association task is typically modeled as a bipartite graph matching problem and often solved with minimum-cost flow techniques, which may be computationally demanding for large amounts of data. Furthermore, cameras are usually treated in pairs, yielding local solutions, rather than obtaining a single global solution across all cameras at once. Another key issue is the affinity function: non-learnable pre-defined distances, such as the Euclidean and cosine distances, are widely used. This paper proposes an effective approach for cross-camera data association that focuses on a global solution, instead of processing cameras in pairs. To avoid fixed distances and thresholds, we leverage the connectivity of Graph Neural Networks, previously unused in this scope, using a Message Passing Network to jointly learn features and similarity functions. We validate the proposal for pedestrian cross-camera association, reporting results on the EPFL multi-camera pedestrian dataset. Our approach considerably outperforms the data association techniques in the literature, without requiring training in the same scenario in which it is tested. Our code is available at https://www-vpu.eps.uam.es/publications/gnn
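For concreteness, the pairwise baseline that the abstract contrasts against can be sketched as follows: detections from two cameras are associated by computing a fixed (cosine) distance between appearance features and solving the resulting bipartite matching with the Hungarian algorithm. This is a minimal illustration, not the paper's learned GNN method; the feature vectors below are toy values invented for the example.

```python
# Pairwise cross-camera association baseline: fixed cosine distance
# plus bipartite matching (Hungarian algorithm). Toy features only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_cost(a, b):
    # Pairwise cosine-distance matrix between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

# Hypothetical appearance features for 3 pedestrians in cameras A and B.
cam_a = np.array([[1.0, 0.1, 0.0],
                  [0.0, 1.0, 0.2],
                  [0.1, 0.0, 1.0]])
cam_b = np.array([[0.0, 0.9, 0.1],   # closest to row 1 of cam_a
                  [0.2, 0.0, 1.1],   # closest to row 2 of cam_a
                  [1.1, 0.0, 0.1]])  # closest to row 0 of cam_a

cost = cosine_cost(cam_a, cam_b)
rows, cols = linear_sum_assignment(cost)  # minimum-cost bipartite matching
matches = list(zip(rows.tolist(), cols.tolist()))
print(matches)  # pedestrian i in camera A <-> matches[i][1] in camera B
```

Note the two limitations the paper targets: the cosine distance is fixed rather than learned, and the matching is computed per camera pair, so a global multi-camera solution requires further (inconsistent) merging of pairwise results.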
