Abstract

This paper presents the Spectral Filter Array (SFA)-guided Mosaic Transformer (SMT), an end-to-end tracker for small objects in mosaic spectral videos captured by snapshot spectral cameras. Tracking small objects in complex scenes is particularly challenging because their appearance varies and they offer limited feature representation. Spectral imaging, which uses both spectral and spatial information to characterize the material properties of objects, discriminates object features better than conventional visual imaging and is therefore well suited to this task. Existing spectral tracking techniques, however, perform poorly on small objects because they either disrupt spatial-spectral aliasing correlations or disregard small-object characteristics. The proposed SMT therefore uses SFA guidance to model and aggregate multi-layer features. Comprising an SFA-guided Mosaic Backbone (SMB), a Multi-layer Feature Aggregation (MFA) module, and a Prediction Head, SMT extracts hierarchical features directly from mosaic spectral images, aggregates the interdependencies between shallow-layer detail and deep-layer semantic information, and precisely locates small objects. Experimental results on our curated, fully annotated mosaic spectral small-object tracking dataset and on a public normal-sized object tracking dataset show that SMT tracks small objects effectively in challenging scenarios such as occlusion and drift. Specifically, SMT improves on the second-ranked trackers by 0.3% to 5.3% in average precision rate and by 2.0% to 5.1% in average success rate across various challenging attributes. The dataset and code are available at: https://github.com/Chenlulu1993/SMT.
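
To make the notion of a mosaic spectral image concrete, the sketch below builds the per-pixel band-index map implied by a periodic SFA: each raw pixel records a single spectral band, determined by its position within the repeating filter pattern. This is only an illustration under stated assumptions. The 4x4 pattern, the function name `sfa_band_index_map`, and the band numbering are hypothetical and are not taken from the paper, and the abstract does not specify how SMT actually consumes the SFA layout.

```python
import numpy as np


def sfa_band_index_map(height, width, pattern):
    """Tile a periodic SFA pattern over a height x width sensor.

    `pattern` is a small 2-D array whose entries are band indices
    (hypothetical layout; real cameras define their own ordering).
    Returns an integer array of shape (height, width) giving the band
    sampled at each raw pixel.
    """
    pattern = np.asarray(pattern)
    ph, pw = pattern.shape
    rows = np.arange(height) % ph   # row position inside the pattern
    cols = np.arange(width) % pw    # column position inside the pattern
    # Broadcast the two index vectors to look up every pixel at once.
    return pattern[rows[:, None], cols[None, :]]


# Assumed 4x4 SFA with 16 bands numbered 0..15 in row-major order.
example_pattern = np.arange(16).reshape(4, 4)
band_map = sfa_band_index_map(8, 8, example_pattern)

# Boolean mask of the pixels that sampled one particular band.
band5_mask = band_map == 5
```

A map like this leaves the raw mosaic untouched, so neighbouring pixels of different bands stay adjacent; whether and how SMT uses such a map as guidance is described in the full paper, not here.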
