Abstract

Few-shot object detection (FSOD) aims at learning a novel class object detector with abundant base class samples and a limited number of novel class samples. Some recent methods assume that base class images contain unlabeled novel class instances and mine these instances to address the data scarcity of novel classes. Despite achieving competitive results, these methods face two issues: First, mined instances are sampled around the provided few-shot instances, establishing biased data distributions to represent novel classes. Second, the rich novel class semantics contained in base class instances are largely ignored. Benefiting from vision-language models (VLMs), we propose a VLM-guided Explicit-Implicit Complementary novel class semantic learning (VEIC) method for FSOD. VEIC incorporates a VLM-cooperated sample mining module that leverages the powerful semantic alignment capability of VLMs to mine representative unlabeled explicit novel instances. In addition, for implicit novel instances (ambiguous base instances containing novel class semantics), we employ a spatial-wise feature re-expression module to capture prominent novel class local patches. A dynamic class-transition label assignment strategy is then proposed to learn from implicit novel instances by exploiting deep network memorization: implicit novel instances are treated more as novel classes during early training and more as base classes later. Experimentally, learning novel class semantics from the two complementary types of instances achieves the best performance. Furthermore, learning exclusively from implicit novel instances extends the pseudo-labeling method to a practical scenario where base class images contain no unlabeled explicit novel instances.
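The class-transition idea above, shifting an implicit novel instance's soft label from the novel class toward the base class as training proceeds, can be illustrated with a minimal sketch. The linear schedule, function name, and signature below are assumptions for illustration; the paper does not specify the exact transition schedule.

```python
def transition_weights(step: int, total_steps: int) -> tuple[float, float]:
    """Hypothetical linear schedule for dynamic class-transition label assignment.

    Early in training the implicit novel instance's soft label leans toward
    the novel class; later it leans toward the base class.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    w_novel = 1.0 - t  # early training: regarded more as the novel class
    w_base = t         # late training: regarded more as the base class
    return w_novel, w_base
```

For example, at the start of training the soft label is entirely novel-class weight, and at the midpoint the two class weights are equal; any monotone schedule (e.g. cosine) could replace the linear one.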
