Abstract

In public security systems, demand for visual instance retrieval is growing rapidly, especially for large-scale image and video databases. Given its wide range of applications in surveillance scenarios, this paper focuses on retrieval tasks centered on ‘vehicle’ and ‘pedestrian’ targets. Many previous CNN-based methods do not exploit the ensemble capability of different models and therefore achieve limited accuracy, since any single deep architecture is not comprehensive. In addition, some features in the original deep representation are useless for retrieval, whereas an attention-aware compact representation is much more efficient and effective. To address these problems, we propose a Selective Deep Ensemble (SDE) framework, inspired by the attention mechanism, that combines various models and features in a complementary way. We demonstrate that a large improvement can be obtained with only a slight increase in computation cost. Finally, we evaluate the framework on three public instance-retrieval datasets, VehicleID, VeRi and Market-1501, where it outperforms state-of-the-art methods by a large margin.
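
As a rough illustration of the general idea of weighting and fusing features from several models before retrieval, the following minimal sketch may help. It is not the SDE framework itself: the abstract does not specify how selection or attention is computed, so the function names, the fixed per-model weights, and the toy feature vectors are all assumptions made for illustration only.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def selective_ensemble(features, weights):
    """Fuse per-model features into one descriptor.

    Hypothetical stand-in for an attention-style selection: each model's
    normalized feature is scaled by a fixed weight, then all are concatenated.
    """
    fused = []
    for feat, w in zip(features, weights):
        fused.extend(w * x for x in l2_normalize(feat))
    return l2_normalize(fused)

def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [(cosine(query, g), idx) for idx, g in enumerate(gallery)]
    return [idx for _, idx in sorted(scores, key=lambda s: -s[0])]

# Toy example: two models, each producing a 2-D feature per image.
weights = [0.7, 0.3]  # assumed weights; a learned attention module would supply these
query = selective_ensemble([[1.0, 0.0], [0.0, 1.0]], weights)
gallery = [
    selective_ensemble([[1.0, 0.0], [1.0, 0.0]], weights),  # agrees with model 1
    selective_ensemble([[0.0, 1.0], [0.0, 1.0]], weights),  # agrees with model 2
]
ranking = rank_gallery(query, gallery)
```

In this toy setup the first gallery item ranks higher because it agrees with the query on the more heavily weighted model. In a real system the weights would be learned rather than fixed by hand.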
