Abstract

Labeling fine‐grained objects manually is extremely challenging, as it is not only labor‐intensive but also requires professional knowledge. Accordingly, robust learning methods for fine‐grained recognition with web images collected from the Internet of Things have drawn significant attention. However, training deep fine‐grained models directly on untrusted web images faces two primary obstacles: (1) label noise in web images and (2) domain variance between the online sources and test datasets. In this study, we focus on these two problems associated with untrusted web images. Specifically, we introduce an end‐to‐end network that addresses both jointly while separating trusted data from untrusted web images. To validate the efficacy of our proposed model, we first collect untrusted web images using the text category labels found in fine‐grained datasets. We then employ the designed deep model to eliminate label noise and ameliorate domain mismatch, and use the selected trusted web data for model training. Comprehensive experiments and ablation studies show that our method consistently surpasses other state‐of‐the‐art approaches for fine‐grained recognition in real‐world scenarios, with improvements of 2.51% on CUB200‐2011 and 2.92% on Stanford Dogs. The source code and models are available at: https://github.com/Codeczh/FGVC‐IoT.
