Abstract

Mineral recognition plays a pivotal role in advancing geological survey and exploration techniques and is a cornerstone of contemporary geoscience research. Recently, Transformer-based neural networks have outperformed ConvNets and have become increasingly prominent in vision models. However, adapting Transformer models to mineral photograph recognition presents two significant challenges. First, mineral photograph recognition relies heavily on low-level features such as color, texture, and edges, which Transformers are not intrinsically optimized to capture. Second, small-scale objects within mineral images are often difficult to recognize accurately. To tackle these challenges, we introduce SwinMin, a model designed specifically for mineral photograph recognition. The model incorporates convolutional information into Transformer sequences, enriching the global representation with finer details. Furthermore, we propose a dynamic feature fusion module that exploits multi-scale contexts to produce a more comprehensive representation. Extensive experiments on mineral photograph datasets demonstrate that SwinMin achieves state-of-the-art performance compared with existing mineral image recognition methods, underlining its potential for reliable and precise mineral image identification.
