Fine-grained visual categorization is challenging because the subordinate categories within an entry-level category can only be distinguished by subtle differences. This requires localizing the key (most discriminative) regions and extracting domain-specific features alternately, since fine-grained specialization implicitly assumes an entry-level visual appearance shared among all classes. Existing methods predominantly handle these two steps independently, neglecting that patch proposal and discriminative feature extraction are mutually correlated and can reinforce each other in an increasingly specialized manner. In this work, we concretize the above pipeline as an Increasing Specialized Generative Adversarial Network (IS-GAN), which recursively shapes a coarse-to-fine representation. It is a three-scale framework with two main components: a three-player expert GAN at each scale for feature extraction, and a Patch Proposal Network (PPN) between every two adjacent scales for target positioning. To better analyze pixel-to-pixel correlations at various octaves, Gaussian pyramid and Laplacian pyramid descriptions are also integrated into each GAN. The PPN zooms in to shift the focus onto the most representative regions, taking the classifier's previous prediction as a reference, while the finer-scale network receives an amplified attended region from the previous scale. Overall, IS-GAN is driven by three focal losses from the GANs and a converged object-level loss. Experiments demonstrate that IS-GAN can simultaneously (1) deliver categorization performance competitive with state-of-the-art methods, reaching 92.23% validation accuracy and 90.27% testing accuracy, and (2) recover fine-grained textures with a high Peak Signal-to-Noise Ratio (PSNR, 32.937) and Structural Similarity (SSIM, 0.8607) on hand-crafted and public benchmarks.
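For readers unfamiliar with the pyramid descriptions and the PSNR metric mentioned above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 5-tap binomial kernel as the Gaussian approximation, nearest-neighbor upsampling, and grayscale float images; all function names (`blur`, `gaussian_pyramid`, `laplacian_pyramid`, `reconstruct`, `psnr`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumption: a 5-tap binomial kernel as a cheap approximation of a Gaussian.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable 5-tap blur with edge padding; img is a 2-D float array."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    # Filter horizontally, then vertically.
    rows = sum(KERNEL[k] * padded[:, k:k + w] for k in range(5))
    return sum(KERNEL[k] * rows[k:k + h, :] for k in range(5))


def upsample(img, shape):
    """Nearest-neighbor 2x upsampling, cropped to the target shape."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]


def gaussian_pyramid(img, levels):
    """Return [G0, G1, ...]: each level is the blurred, 2x-decimated previous one."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid


def laplacian_pyramid(gauss):
    """Band-pass residuals L_i = G_i - upsample(G_{i+1}), plus the coarsest level."""
    bands = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss, gauss[1:])]
    return bands + [gauss[-1]]


def reconstruct(lap):
    """Invert the Laplacian pyramid by adding the bands back, coarse to fine."""
    current = lap[-1]
    for band in reversed(lap[:-1]):
        current = band + upsample(current, band.shape)
    return current


def psnr(reference, test, peak=255.0):
    """Peak Signal-to-Noise Ratio: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Under these assumptions, `reconstruct(laplacian_pyramid(gaussian_pyramid(img, 3)))` returns `img` up to floating-point rounding, which is the property that makes the Laplacian pyramid a lossless multi-scale description. The `peak` argument of `psnr` should match the intensity range of the images being compared.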