Indoor scenes are crucial components of urban spaces, and logos carry vital information within these environments. Accurate logo perception is essential for mobile robots operating in indoor environments and supports many higher-level applications. With the rapid development of neural networks, numerous deep-learning-based object-detection methods have been applied to logo detection. However, most of these methods depend on large labeled datasets. Because logos in indoor scenes change quickly, reliable detection is difficult to achieve either with existing large labeled datasets or with only a limited number of labeled logos. In this article, we propose MobileNetV2-YOLOv4-UP, a logo-detection method that integrates unsupervised learning with few-shot learning. We develop an autoencoder and pre-train it on a public unlabeled logo dataset to obtain latent feature representations of logos. We then construct a lightweight logo-detection network and embed the pre-trained encoder weights in it as prior information. Finally, the weights of the logo-detection network are updated by training on a small dataset of labeled indoor-scene logos. Experimental results on the public logo625 dataset and our self-collected LOGO2000 dataset demonstrate that our method outperforms classic object-detection methods, achieving a mean average precision (mAP) of 83.8%. Notably, our unsupervised pre-training strategy (UP) proves effective, yielding a 15.4% improvement.
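The pre-train-then-transfer pipeline summarized above can be illustrated with a deliberately tiny sketch. It is not the authors' MobileNetV2-YOLOv4-UP code: it assumes a single linear layer in place of the convolutional autoencoder and MobileNetV2 backbone, uses synthetic vectors in place of logo images, and represents the "detector" as a hypothetical dict holding a backbone matrix and a linear head. All function names are illustrative. It only shows the mechanics: encoder weights are learned from unlabeled data by minimizing reconstruction error, then copied in as the detector's initial backbone weights before any labeled training.

```python
import numpy as np


def pretrain_linear_autoencoder(x, latent_dim, lr=0.01, epochs=3000, seed=0):
    """Fit a linear autoencoder x -> x @ w_enc -> (x @ w_enc) @ w_dec.

    Assumption: a linear layer stands in for the paper's convolutional encoder.
    Full-batch gradient descent on the per-sample squared reconstruction
    error; no labels are used. Returns (w_enc, w_dec).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    for _ in range(epochs):
        z = x @ w_enc                      # latent codes, shape (n, latent_dim)
        err = z @ w_dec - x                # reconstruction residual, shape (n, d)
        grad_out = 2.0 * err / n           # dLoss/dReconstruction
        grad_dec = z.T @ grad_out          # dLoss/dw_dec
        grad_enc = x.T @ (grad_out @ w_dec.T)  # dLoss/dw_enc via the chain rule
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec


def reconstruction_error(x, w_enc, w_dec):
    """Mean over samples of the squared reconstruction error."""
    err = x @ w_enc @ w_dec - x
    return float(np.sum(err ** 2) / x.shape[0])


def init_detector_from_encoder(w_enc, num_outputs, seed=1):
    """Hypothetical detector: the pre-trained encoder weights become the
    backbone's initial weights (the prior); the head starts small and random."""
    rng = np.random.default_rng(seed)
    return {
        "backbone": w_enc.copy(),  # copy so later training leaves w_enc intact
        "head": rng.normal(scale=0.01, size=(w_enc.shape[1], num_outputs)),
    }


def detector_forward(model, x):
    """Backbone features followed by a linear head (a stand-in for detection)."""
    return (x @ model["backbone"]) @ model["head"]


# Usage on synthetic "unlabeled" data lying in a 2-D subspace of R^8,
# standardized per column.
rng = np.random.default_rng(0)
x_unlabeled = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
x_unlabeled = (x_unlabeled - x_unlabeled.mean(axis=0)) / x_unlabeled.std(axis=0)

w_enc, w_dec = pretrain_linear_autoencoder(x_unlabeled, latent_dim=2)
print("reconstruction error:", reconstruction_error(x_unlabeled, w_enc, w_dec))
model = init_detector_from_encoder(w_enc, num_outputs=5)
print("detector output shape:", detector_forward(model, x_unlabeled[:4]).shape)
```

Copying, rather than sharing, the encoder array keeps the pre-trained prior unchanged while the detector's weights are subsequently updated on the small labeled set.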