Abstract

The human visual system recognizes object categories accurately and efficiently and is robust to complex textures and noise. In this article, to mimic the analogy-detail dual-pathway mechanism of human visual cognition revealed in recent cognitive science studies, we propose a novel convolutional neural network (CNN) architecture named analogy-detail networks (ADNets) for accurate object recognition. ADNets disentangle the visual information and process it separately in two pathways: the analogy pathway extracts coarse, global features representing the gist of the object (i.e., shape and topology), while the detail pathway extracts fine, local features representing its details (i.e., texture and edges) for determining object categories. We modularize the architecture by encapsulating the two pathways in an analogy-detail block, which serves as the CNN building block for constructing ADNets. For implementation, we propose a general principle that transmutes typical CNN structures into the ADNet architecture and apply this transmutation to representative baseline CNNs. Extensive experiments on the CIFAR10, CIFAR100, Street View House Numbers (SVHN), and ImageNet data sets demonstrate that ADNets reduce the test error rates of the baseline CNNs by up to 5.76% and outperform other state-of-the-art architectures. Comprehensive analysis and visualizations further demonstrate that ADNets are interpretable and achieve a better shape-texture tradeoff when recognizing objects with complex textures.
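The coarse/fine separation described above can be illustrated with a toy sketch. This is not the ADNet block itself: the abstract does not specify how the two pathways are built, so the block-averaging used here, and the names `coarse_view`, `detail_view`, and `split_pathways`, are assumptions made purely for illustration. The sketch splits a small grayscale grid into a coarse, global component and a fine, local residual.

```python
from typing import List, Tuple

Grid = List[List[float]]


def coarse_view(image: Grid, factor: int) -> Grid:
    """Block-average the image, then repeat each average back to full size.

    Assumption: average pooling stands in for the "coarse and global"
    features; the paper's analogy pathway may work differently.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by factor")
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, factor):
        for left in range(0, width, factor):
            block = [image[r][c]
                     for r in range(top, top + factor)
                     for c in range(left, left + factor)]
            mean = sum(block) / len(block)
            # Write the block mean back to every pixel of the block.
            for r in range(top, top + factor):
                for c in range(left, left + factor):
                    out[r][c] = mean
    return out


def detail_view(image: Grid, coarse: Grid) -> Grid:
    """Residual after removing the coarse view: the fine, local structure."""
    return [[p - q for p, q in zip(row_i, row_c)]
            for row_i, row_c in zip(image, coarse)]


def split_pathways(image: Grid, factor: int = 2) -> Tuple[Grid, Grid]:
    """Return (coarse, detail); their element-wise sum recovers the image."""
    coarse = coarse_view(image, factor)
    return coarse, detail_view(image, coarse)


image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 6, 6],
         [2, 2, 6, 6]]
coarse, detail = split_pathways(image, factor=2)
```

In a real network each component would feed its own stack of learned convolutions before the results are merged; here the two views are only computed, to show that the split loses no information.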
