In this paper, we leverage multimodal data to classify minerals using a multi-stream neural network. In a previous study on the Tinto dataset, which consisted of a 3D hyperspectral point cloud from the open-pit mine Corta Atalaya in Spain, we successfully identified mineral classes by employing various deep learning models. However, this prior work solely relied on hyperspectral data as input for the deep learning models. In this study, we aim to enhance accuracy by incorporating multimodal data, which includes hyperspectral images, RGB images, and a 3D point cloud. To achieve this, we have adopted a graph-based neural network, known for its efficiency in aggregating local information, based on our past observations where it consistently performed well across different hyperspectral sensors. Subsequently, we constructed a multi-stream neural network tailored to handle multimodality. Additionally, we employed a channel attention module on the hyperspectral stream to fully exploit the spectral information within the hyperspectral data. Through the integration of multimodal data and a multi-stream neural network, we achieved a notable improvement in mineral classification accuracy: 19.2%, 4.4%, and 5.6% on the LWIR, SWIR, and VNIR datasets, respectively.
Read full abstract