Abstract

Deep convolutional neural networks (CNNs) are the dominant technology in computer vision today. Much of the recent computer vision literature can be thought of as a competition to find the best architecture for vision within the deep convolutional framework. Despite all the effort invested in developing sophisticated convolutional architectures, however, it is not clear how different from each other the best CNNs really are. This paper measures the similarity between two well-known CNNs, Inception and ResNet, in terms of the properties they extract from images. We find that the properties extracted by Inception are very similar to the properties extracted by ResNet, in the sense that either feature set can be well approximated by an affine transformation of the other. In particular, we find evidence that the information extracted from images by ResNet is also extracted by Inception, and in some cases may be more robustly extracted by Inception. In the other direction, most but not all of the information extracted by Inception is also extracted by ResNet.

The similarity between Inception and ResNet features is surprising. Convolutional neural networks learn complex non-linear features of images, and the architectural differences between systems suggest that these non-linear functions should take different forms. Nonetheless, Inception and ResNet were trained on the same data set and seem to have learned to extract similar properties from images. In essence, their training algorithms hill-climb in totally different spaces, but find similar solutions. This suggests that for CNNs, the selection of the training set may be more important than the selection of the convolutional architecture.

Keywords: ResNet, Inception, CNN, Feature Evaluation, Feature Mapping.
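The central measurement described above, checking whether one network's feature set can be well approximated by an affine transformation of the other's, can be sketched with a least-squares fit. This is an illustrative reconstruction, not the paper's code: the feature matrices below are synthetic stand-ins with made-up dimensions, where real ones would be, for example, penultimate-layer activations from ResNet and Inception on the same images.

```python
import numpy as np

# Hypothetical setup: n images, 64-d "ResNet" features, 48-d "Inception"
# features. The synthetic Y is built as an affine function of X plus
# noise, so a good affine map should exist by construction.
rng = np.random.default_rng(0)
n, d_res, d_inc = 1000, 64, 48

X = rng.normal(size=(n, d_res))                 # stand-in ResNet features
A_true = rng.normal(size=(d_res, d_inc))
b_true = rng.normal(size=d_inc)
Y = X @ A_true + b_true + 0.01 * rng.normal(size=(n, d_inc))  # stand-in Inception features

# Fit the affine map: solve min_W ||[X 1] W - Y||_F, where the
# appended column of ones absorbs the bias term.
X1 = np.hstack([X, np.ones((n, 1))])
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
Y_hat = X1 @ W

# R^2 of the fit: a value near 1 means Y is well approximated by an
# affine transformation of X, the paper's notion of feature similarity.
ss_res = np.sum((Y - Y_hat) ** 2)
ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))
```

Running the fit in both directions (ResNet features to Inception features, and the reverse) and comparing the two fit qualities corresponds to the asymmetric finding in the abstract: most, but not all, of each network's information is recoverable from the other's.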
