Abstract

Convolutional Neural Networks (CNNs) have achieved strong performance on many visual tasks. However, CNN models are sensitive to samples with large spatial variations, a problem that is especially severe in fine-grained classification. In this paper, we propose a novel CNN model called ST-BCNN to address this problem. ST-BCNN contains two functional CNN modules: a Spatial Transform Network (STN) and a Bilinear CNN (BCNN). First, the STN module selects the key region of each input sample and applies a spatial transformation to it. Because adopting an STN causes a form of information loss called boundary loss, we design a new IOU loss method to counter it, and we provide a theoretical analysis of this method. Second, the BCNN module is applied to discover discriminative features for fine-grained classification. BCNN combines CNN features from different channels through pairwise interactions, producing bilinear features that are more discriminative than the fully connected features of a standard CNN. ST-BCNN thus works by reducing irrelevant spatial variation and producing fine-grained features. We evaluate our model on three public fine-grained classification datasets with large spatial variations: CUB200-2011, Fish100 and UAV43. Experiments show that the IOU loss method reduces boundary loss and enables the STN module to output appropriately transformed images. Our proposed ST-BCNN model outperforms other advanced CNN models on all three datasets.
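
To make the channel interaction in the BCNN module concrete, the following is a minimal sketch of bilinear pooling in its commonly used form: an outer product of two feature maps averaged over spatial locations, followed by a signed square root and L2 normalization. The abstract does not give the exact formulation used in ST-BCNN, so the function name `bilinear_pool`, the list-based feature layout, and the normalization steps are assumptions for illustration only. Plain Python is used so the sketch runs without dependencies.

```python
import math


def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two feature maps (illustrative sketch).

    feat_a: list of C_a channels, each a list of N spatial activations.
    feat_b: list of C_b channels, each a list of N spatial activations.
    Returns a flat descriptor of length C_a * C_b.
    Assumption: the normalization follows the common bilinear-CNN recipe,
    not necessarily the one used in ST-BCNN.
    """
    n = len(feat_a[0])
    if any(len(ch) != n for ch in feat_a + feat_b):
        raise ValueError("all channels must share the same spatial size")

    # Pairwise channel interaction: average over locations of a_i * b_j.
    pooled = [
        sum(a * b for a, b in zip(ch_a, ch_b)) / n
        for ch_a in feat_a
        for ch_b in feat_b
    ]

    # Signed square root dampens large entries while keeping the sign.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]

    # L2 normalization; an all-zero descriptor is returned unchanged.
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0 else signed


# Toy input: 2 channels, 2 spatial locations, pooled with itself.
features = [[1.0, 0.0], [0.0, 1.0]]
descriptor = bilinear_pool(features, features)
```

Passing the same feature map twice, as in the toy call, gives a symmetric descriptor. Using two different feature extractors gives the asymmetric variant. In either case the descriptor length grows as the product of the two channel counts, which is the cost of capturing pairwise channel interactions.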

