Abstract

In fine-grained recognition, classical high-order coding suffers from an inherent conflict between visual burstiness and feature redundancy, which stems from the instability of high-order features. Existing methods mainly rely on eigendecomposition (EIG) or singular value decomposition (SVD) to stabilize the features, but this step increases feature redundancy. To address this problem, this paper proposes a Graph Bilinear Pooling (GBP) model that obtains stable fine-grained features through the aggregation ability of graph networks. GBP avoids explicit feature decomposition and reconciles the conflict between visual burstiness and feature redundancy. First, GBP transforms an image into a graph spectrum by measuring correlations between its features. Then, an improved multi-head graph convolution structure based on Graph Isomorphism Networks (GIN) aggregates the features. Finally, bilinear pooling is performed between the graph convolution feature maps and the original feature maps to obtain a more compact and stable fine-grained representation. Experiments on the CUB, Cars, and Aircrafts datasets show that the proposed method reaches accuracies of 87.8%, 93.5%, and 89.6%, respectively, with a 2048-dimensional feature representation. Compared to the baseline model, GBP uses only 25% as many features while increasing accuracy by 2.6%, 1.7%, and 1.3%, respectively. These results demonstrate the effectiveness of graph neural network embedding in improving feature stability.
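
The three steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cosine-similarity graph, the row normalization, the single-layer ReLU transform standing in for the GIN MLP, the signed square-root and L2 normalization, and all function names and toy dimensions are assumptions made for illustration only.

```python
import numpy as np

def correlation_adjacency(X):
    """Build a graph over N local descriptors from their pairwise correlation.
    Assumption: cosine similarity, negatives clipped, no self-loops, rows normalized."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    A = np.maximum(Xn @ Xn.T, 0.0)
    np.fill_diagonal(A, 0.0)
    return A / (A.sum(axis=1, keepdims=True) + 1e-8)

def gin_layer(X, A, W, eps=0.0):
    """GIN-style update: combine (1 + eps) * own feature with aggregated neighbors,
    then apply a transform. Assumption: one linear layer plus ReLU instead of an MLP."""
    return np.maximum(((1.0 + eps) * X + A @ X) @ W, 0.0)

def graph_bilinear_pool(X, heads, eps=0.0):
    """Bilinear pooling between multi-head graph features and the original features.
    X: (N, C) local descriptors; heads: list of (C, d) weight matrices (hypothetical)."""
    A = correlation_adjacency(X)
    H = np.concatenate([gin_layer(X, A, W, eps) for W in heads], axis=1)  # (N, D)
    B = H.T @ X / X.shape[0]                 # (D, C) averaged outer products
    z = B.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))      # signed square root (assumed post-processing)
    return z / (np.linalg.norm(z) + 1e-8)    # L2 normalization (assumed post-processing)

# Toy usage: 49 spatial locations, 32 channels, 2 heads of width 4.
rng = np.random.default_rng(0)
X = rng.standard_normal((49, 32))
heads = [rng.standard_normal((32, 4)) * 0.1 for _ in range(2)]
z = graph_bilinear_pool(X, heads)
print(z.shape)
```

Because the graph branch projects each location to a small width D before the outer product, the pooled descriptor has D × C entries rather than the C × C of plain bilinear pooling, which is one plausible reading of how a compact representation could arise; the abstract itself does not specify the mechanism.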

