Modeling contextual relationships in images as graph inference is a promising research topic. However, existing approaches perform graph modeling only on entities and ignore the intrinsic geometric features of images. To overcome this problem, this article proposes a novel multiresolution interpretable contourlet graph network (MICGNet). MICGNet balances graph representation learning with the multiscale and multidirectional features of images: the contourlet transform captures the hyperplanar directional singularities of images, and the multilevel sparse contourlet coefficients are encoded into a graph for further graph representation learning. This process provides interpretable theoretical support for optimizing the model structure. Specifically, a superpixel-based region graph is first constructed. The region graph is then used to encode the nonsubsampled contourlet transform (NSCT) coefficients of the image, which serve as node features. Considering the statistical properties of the NSCT coefficients, we calculate the node similarity, i.e., the adjacency matrix, using the Mahalanobis distance. Next, graph convolutional networks (GCNs) are employed to learn more abstract multilevel NSCT-enhanced graph representations. Finally, a learnable graph assignment matrix is designed to obtain geometric association representations, which assign the graph representations to grid feature maps. We conduct comparative experiments on six publicly available datasets, and the experimental analysis shows that MICGNet is significantly more effective and efficient than other recent algorithms.
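The graph-construction steps described above (region-wise node features, Mahalanobis-based adjacency, one graph convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the NSCT subband coefficients and the superpixel label map have already been computed elsewhere; the function names `region_node_features`, `mahalanobis_adjacency`, and `gcn_layer`, the mean-of-absolute-values pooling, the Gaussian kernel that turns distances into similarities, and the symmetric normalization are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def region_node_features(coeffs, labels):
    """Pool per-pixel subband coefficients into one feature vector per region.

    coeffs: (H, W, C) array, assumed to hold precomputed NSCT subbands.
    labels: (H, W) integer array of superpixel ids in 0..n_regions-1.
    Pooling by the mean absolute coefficient is an assumption of this sketch.
    """
    c = coeffs.shape[-1]
    flat_labels = labels.ravel()
    n_regions = int(flat_labels.max()) + 1
    counts = np.bincount(flat_labels, minlength=n_regions)
    sums = np.zeros((n_regions, c))
    # Accumulate absolute coefficients of every pixel into its region's row.
    np.add.at(sums, flat_labels, np.abs(coeffs).reshape(-1, c))
    return sums / np.maximum(counts, 1)[:, None]


def mahalanobis_adjacency(features, eps=1e-6):
    """Build a dense similarity matrix from pairwise Mahalanobis distances.

    features: (n_nodes, n_dims) node feature matrix.
    The Gaussian kernel exp(-d^2 / 2) is an assumed choice of similarity.
    """
    x = np.asarray(features, dtype=float)
    # Covariance across nodes, regularized so that it stays invertible.
    cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = x[:, None, :] - x[None, :, :]  # (n, n, d) pairwise differences
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-0.5 * d2)


def gcn_layer(adj, h, w):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 H W).

    The adjacency above already has ones on its diagonal (self-loops).
    """
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    a_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coeffs = rng.normal(size=(8, 8, 3))          # stand-in for NSCT subbands
    labels = np.repeat(np.arange(8), 8).reshape(8, 8)  # 8 toy "superpixels"
    nodes = region_node_features(coeffs, labels)
    adj = mahalanobis_adjacency(nodes)
    out = gcn_layer(adj, nodes, rng.normal(size=(3, 4)))
    print(nodes.shape, adj.shape, out.shape)
```

In this sketch the covariance is estimated over all nodes of a single image, so the distance accounts for the differing scales and correlations of the subband features; how MICGNet estimates these statistics is not stated in the abstract.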