Abstract

With recent advances in sensing, multimodal data is becoming easily available for various applications, especially in remote sensing (RS), where many data types, such as multispectral imagery (MSI), hyperspectral imagery (HSI), and LiDAR, are available. Effective fusion of these multisource datasets is becoming increasingly important, as their complementary features have been shown to generate highly accurate land-cover maps. However, fusion in the context of RS is non-trivial considering the redundancy involved in the data and the large domain differences among multiple modalities. In addition, the feature extraction modules for different modalities hardly interact among themselves, which further limits their semantic relatedness. As a remedy, in this paper we propose a feature fusion and extraction framework, namely FusAtNet, for collective land-cover classification of HSI and LiDAR data. The proposed framework effectively utilizes the HSI modality to generate an attention map using a "self-attention" mechanism that highlights its own spectral features. Simultaneously, a "cross-attention" approach is used to harness the LiDAR-derived attention map that accentuates the spatial features of the HSI. These attentive spectral and spatial representations are then explored further along with the original data to obtain modality-specific feature embeddings. The modality-oriented joint spectro-spatial information thus obtained is subsequently utilized to carry out the land-cover classification task. Experimental evaluations on three HSI-LiDAR datasets show that the proposed method achieves state-of-the-art classification performance, including on the largest HSI-LiDAR dataset available, University of Houston (Data Fusion Contest - 2013), opening new avenues in multimodal feature fusion for classification.
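To make the attention-driven fusion concrete, the following is a minimal PyTorch sketch of the idea described above: a self-attention branch derived from the HSI highlights its spectral features, a cross-attention branch derived from the LiDAR highlights the spatial features of the HSI, and the two attentive representations are concatenated into a joint spectro-spatial embedding for classification. The module names, layer widths, and the band/class counts (a Houston-like setup with 144 bands and 15 classes) are illustrative assumptions, not the exact FusAtNet architecture.

```python
# Minimal sketch of attention-guided HSI-LiDAR fusion in the spirit of FusAtNet.
# Block depths, widths, and names are illustrative assumptions, not the authors'
# exact architecture.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Produces a per-pixel attention map from one input modality."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # attention weights in [0, 1]
        )

    def forward(self, x):
        return self.net(x)


class FusAtNetSketch(nn.Module):
    def __init__(self, hsi_channels=144, lidar_channels=1, feat_dim=128, n_classes=15):
        super().__init__()
        self.hsi_encoder = nn.Sequential(
            nn.Conv2d(hsi_channels, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # "Self-attention": HSI highlights its own spectral features.
        self.self_attention = AttentionBlock(hsi_channels, feat_dim)
        # "Cross-attention": LiDAR highlights spatial features of the HSI.
        self.cross_attention = AttentionBlock(lidar_channels, feat_dim)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(2 * feat_dim, n_classes),
        )

    def forward(self, hsi, lidar):
        feats = self.hsi_encoder(hsi)                  # shared HSI features
        spectral = feats * self.self_attention(hsi)    # spectrally attentive features
        spatial = feats * self.cross_attention(lidar)  # spatially attentive features
        fused = torch.cat([spectral, spatial], dim=1)  # joint spectro-spatial embedding
        return self.classifier(fused)


# Example with Houston-like patches: 144 HSI bands, a 1-band LiDAR DSM, 15 classes.
hsi = torch.randn(4, 144, 11, 11)
lidar = torch.randn(4, 1, 11, 11)
logits = FusAtNetSketch()(hsi, lidar)  # shape: (4, 15)
```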

Highlights

  • With the advent of advanced sensing technologies, simultaneous acquisition of multimodal data for the same underlying phenomenon is possible nowadays

  • To the best of our knowledge, ours is one of the first approaches to introduce the notion of attention learning for hyperspectral imagery (HSI)-light detection and ranging (LiDAR) fusion in the context of land-cover classification. In this regard, we introduce the concept of "cross-attention" based feature learning among the modalities, a novel and intuitive fusion method which utilizes attention from one modality to highlight features in the other modality (HSI)

  • In all cases, our method outperforms the state-of-the-art methods by a significant margin across all metrics, be it overall accuracy (OA), average accuracy (AA), or the kappa coefficient (κ)



Introduction

With the advent of advanced sensing technologies, simultaneous acquisition of multimodal data for the same underlying phenomenon is possible nowadays. This is especially important in remote sensing (RS), owing to the presence of satellite image data from several sources, such as multispectral (MSI), hyperspectral (HSI), synthetic aperture radar (SAR), and panchromatic (PCI) sensors. The detailed spectral information from HSI is commonly used to discriminate various materials based on their reflectance values, finding applications in agricultural monitoring, environment-pollution monitoring, urban-growth analysis, and land-use pattern analysis [1, 2]. LiDAR data is used to obtain elevation information, which is useful for distinguishing objects made of the same material [3]. Since the attributes of these modalities complement each other, they are extensively used in a combined fashion for multimodal learning in the remote sensing domain [4, 5].
