ABSTRACT

Remote sensing scene classification underpins advanced smart urban planning tasks such as urban functional zone division and land use type identification. In recent years, a wide range of emerging data sources, including satellites, unmanned aerial vehicles (UAVs), and ground sensors, has received growing attention for urban feature extraction. How to jointly exploit such multi-view data to improve scene classification performance has become a pressing challenge in remote sensing. Existing feature fusion methods tend to map data from different views into a common feature space, which is often difficult to find when the views differ greatly. Furthermore, because these methods require data from all views as input, they are not flexible enough to handle inference when only a single view is available. To address these issues, this paper proposes a novel Coupled Parallel Architecture (CPA) using Weighted Collaboration Fusion Constrained by Consistency Between Views (CBV-WCF). In the training phase, the CBV module reduces the impact of the heterogeneity gap across views by capturing the consistency information between views, while the WCF module fully mines and effectively fuses the complementary information between views to improve the performance of downstream tasks. In the inference phase, the proposed architecture improves classification performance under both multi-view and single-view input. The method is evaluated on air-ground dual-view scene classification, a representative multi-view task with large image differences between views. Experimental results on two publicly available air-ground dual-view datasets demonstrate that the proposed framework significantly improves classification performance while offering new insights and solutions for multi-view tasks. The code will be released at: https://github.com/Forest-repo/CBV-WCF.
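The abstract describes a two-branch coupled parallel design: a consistency constraint (CBV) couples the view-specific encoders during training, a weighted collaboration fusion (WCF) combines their features, and either branch can run alone at inference. The following is a minimal PyTorch sketch of that general pattern; the backbones, the normalized-feature consistency term, the softmax fusion gate, and all names and shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: module names (CBV, WCF) follow the abstract's
# terminology, but every layer, shape, and loss here is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledParallelSketch(nn.Module):
    """Two parallel view-specific encoders, coupled by (1) a consistency
    term between views (CBV-style) at training time and (2) a weighted
    collaboration fusion (WCF-style) of the two feature streams."""

    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        # One encoder per view; any CNN backbone could stand in here.
        self.aerial_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.ground_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        # Learned gate deciding how much each view contributes to the fusion.
        self.fusion_gate = nn.Linear(2 * feat_dim, 2)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, aerial=None, ground=None):
        feats = []
        if aerial is not None:
            feats.append(self.aerial_encoder(aerial))
        if ground is not None:
            feats.append(self.ground_encoder(ground))
        if len(feats) == 2:
            fa, fg = feats
            # Weighted collaboration fusion: softmax weights over the two views.
            w = F.softmax(self.fusion_gate(torch.cat([fa, fg], dim=1)), dim=1)
            fused = w[:, :1] * fa + w[:, 1:] * fg
            # Consistency between views: pull paired, normalized features together.
            consistency = F.mse_loss(F.normalize(fa, dim=1), F.normalize(fg, dim=1))
        else:
            # Single-view inference: the surviving branch works on its own.
            fused, consistency = feats[0], torch.tensor(0.0)
        return self.classifier(fused), consistency

# Training uses both views plus the consistency term; inference can use one view.
model = CoupledParallelSketch()
logits, cons = model(aerial=torch.randn(4, 3 * 64 * 64), ground=torch.randn(4, 3 * 64 * 64))
loss = F.cross_entropy(logits, torch.randint(0, 10, (4,))) + 0.1 * cons
logits_single, _ = model(aerial=torch.randn(4, 3 * 64 * 64))  # single-view input
```

Because each branch produces a classifiable feature on its own and the fusion is applied only when both views are present, this kind of design degrades gracefully to single-view inference, which is the flexibility the abstract highlights.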