Bird identification is the first step in collecting data on bird diversity and abundance, which also helps research on bird distribution and population measurements. Most research has built end-to-end training models for bird detection task via CNNs or attentive models, but many perform unsatisfactorily in fine-grained bird recognition. Bird recognition tasks are highly influenced by factors, including the similar appearance of different subcategories, diverse bird postures, and other interference factors such as tree branches and leaves from the background. To tackle this challenge, we propose the Progressive Cross-Union Network (PC-Net) to capture more subtle parts with low-level attention maps. Based on cross-layer information exchange and pairwise learning, the proposed method uses two modules to improve feature representation and localization. First, it utilizes low- and high-level information for cross-layer feature fusion, which enables the network to extract more comprehensive and discriminative features. Second, the network incorporates deep semantic localization to identify and enhance the most relevant regions in the images. In addition, the network is designed with a semantic guidance loss to improve its generalization for variable bird poses. The PC-Net was evaluated on an extensively used birds dataset (CUB-200-2011), which contains 200 birds subcategories. The results demonstrate that the PC-Net achieved an impressive recognition accuracy of 89.2%, thereby outperforming maintained methods in bird subcategory identification. We also achieved competitive results on two other datasets with data on cars and airplanes. The results indicated that the PC-Net improves the accuracy of diverse bird recognition, as well as other fine-grained recognition scenarios.
Read full abstract