Image classification accuracy has improved greatly with the growing complexity of convolutional neural network (CNN) architectures. However, categories are not equally separable visually, which makes some of them considerably harder to classify than others. The hierarchical structure of categories can be leveraged to address this, but few CNNs take this characteristic of the data into account. Moreover, a hierarchically structured network is likely to extract more specific features from the data than current CNNs, in which every category passes through the same fixed number of layers during feed-forward computation. In this paper, we propose using category hierarchies to integrate ResNet-style modules into a hierarchical network model in a top-down manner. To extract abundant discriminative features and improve computational efficiency, we adopt residual block selection based on coarse categories to allocate different computation paths. Each residual block works as a switch that determines the JUMP or JOIN mode for an individual coarse category. Since some categories need less feed-forward computation than others because they jump over layers, the average inference time is reduced. Extensive experiments show that our hierarchical network achieves higher prediction accuracy with similar FLOPs on the CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet datasets compared to the original residual networks and other existing selection inference methods.
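The JUMP/JOIN switching described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy residual branch, the `switch_table`, and the coarse-category names `vehicle` and `animal` are all hypothetical, and the abstract does not say how the switches are chosen or learned. The sketch only shows that a block in JOIN mode computes `x + f(x)`, a block in JUMP mode is skipped, and categories with more JUMP blocks therefore cost less computation.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Branch = Callable[[Vector], Vector]

JUMP, JOIN = "JUMP", "JOIN"


def make_block(scale: float) -> Branch:
    """Toy residual branch f(x) = relu(scale * x); stands in for conv layers."""
    def branch(x: Vector) -> Vector:
        return [max(0.0, scale * v) for v in x]
    return branch


def forward(x: Vector, blocks: List[Branch], modes: List[str]) -> Tuple[Vector, int]:
    """Run x through the blocks, skipping every block whose mode is JUMP.

    Returns the output and the number of blocks actually executed.
    """
    executed = 0
    for branch, mode in zip(blocks, modes):
        if mode == JOIN:
            # JOIN: standard residual update y = x + f(x)
            x = [xi + fi for xi, fi in zip(x, branch(x))]
            executed += 1
        # JUMP: only the identity shortcut is used, so nothing is computed
    return x, executed


blocks = [make_block(0.5), make_block(1.0), make_block(2.0)]

# Hypothetical switch table: one JUMP/JOIN mode per block, per coarse category.
switch_table: Dict[str, List[str]] = {
    "vehicle": [JOIN, JUMP, JUMP],  # assumed "easy" category, shallow path
    "animal": [JOIN, JOIN, JOIN],   # assumed "hard" category, full depth
}

x = [1.0, -2.0]
out_vehicle, n_vehicle = forward(x, blocks, switch_table["vehicle"])
out_animal, n_animal = forward(x, blocks, switch_table["animal"])
print(n_vehicle, n_animal)
```

In this toy setup the `vehicle` path executes one block and the `animal` path executes three, which mirrors the claim that jumping layers lowers the average inference cost across categories.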