HDNet: Human-like discrimination with visual key for few-shot cross-domain object detection

Maozhen Liu, Xiaoguang Di, Wenzhuang Wang

doi:10.1016/j.knosys.2024.111772

Abstract

Most existing few-shot object detection (FSD) methods implicitly assume that the few-sample target-domain data follow the same statistical distribution as the source-domain data. This assumption is impractical, especially in unconstrained scenarios. Moreover, the decline of fine-grained hidden samples caused by scene switching, significant morphological changes, and similar factors poses great challenges to object detection. To address these few-shot cross-domain detection (FS-CDD) problems, we propose a novel and flexible Human-like Discrimination Network (HDNet) composed of four modules. First, Multi-level Key Generation (MKG) mines a comprehensive multi-level abstract representation, obtaining knowledge that spans diverse low and high levels through three parallel branches. Second, after the hidden space is decoupled, the rich patches from each pair of target and source domains are closely matched in the Embedded Space Implicit Association (ESIA) using degree description. Third, to enhance the model's discriminative capability with few samples, the instances are additionally encoded and corrected at the classification end using the Prediction Head with Instance Embedding (PHIE). Finally, the Adaptive Reweighted Module (ARM) is redesigned to determine the coefficients of the multiple loss functions, which avoids setting them manually based on experience. Extensive experiments demonstrate that, despite the dual challenges of limited samples and cross-domain scenarios, the proposed HDNet exhibits remarkable performance.

Full Text