Abstract
In the face of global concerns about endangered ecosystems, it is vital to identify individual animals. To this end, a Vision Transformer (ViT) based model for recognizing individual sika deer from facial data was designed in this work. To obtain satisfactory results, low-level features such as texture and color must be considered in addition to high-level semantic information. Consequently, it is difficult to obtain good results by relying on high-level features alone. The standard ViT, or a ViT with ResNet (residual neural network) as the backbone network, may not be the best solution, because the direct patch-flattening feature embedding used in the conventional ViT is not well suited to deer face recognition. Therefore, a DenseNet (densely connected convolutional network) block was used as Module 1 to extract low-level features. DenseNet layers enable feature reuse through dense connections, so any two layers can communicate directly, which maximizes the information flow between layers in the network. In Module 2, a mask approach was used to eliminate extraneous information from the images and reduce the interference of complicated backgrounds with identification accuracy. In addition, pixel-wise multiplication of the feature maps output by the two modules fused the local features with the global features, thereby enriching the expressiveness of the feature map. Finally, a pre-trained ViT structure was applied. The experimental results showed that the proposed model reached an accuracy of 97.68% in identifying individual sika deer and exhibited excellent generalization capabilities. Our work also provides a valid database for the individual identification of sika deer, contributing significantly to the conservation and promotion of the ecosystem.
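The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, tensor shapes, growth rate, the use of 1x1 convolutions with ReLU inside the dense block, and the representation of Module 2's output as a binary mask broadcast over channels are all assumptions made for illustration. The sketch shows only dense connectivity (feature reuse), pixel-wise multiplication of two feature maps, and the splitting of the fused map into patch tokens of the kind a ViT consumes.

```python
import numpy as np

def dense_block(x, weights):
    """Dense connectivity: each layer sees the channel-wise concatenation
    of the input and every earlier layer's output. x has shape (H, W, C0)."""
    features = [x]
    for w in weights:
        stacked = np.concatenate(features, axis=-1)
        # Assumed layer type: 1x1 convolution (a per-pixel linear map) + ReLU.
        features.append(np.maximum(stacked @ w, 0.0))
    return np.concatenate(features, axis=-1)

def fuse(a, b):
    """Pixel-wise (element-wise) multiplication of two same-shaped maps."""
    if a.shape != b.shape:
        raise ValueError("feature maps must have identical shapes")
    return a * b

def to_patch_tokens(fmap, patch):
    """Split an (H, W, C) map into non-overlapping patches, one token each."""
    h, w, c = fmap.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    grid = fmap.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))              # toy stand-in for a face crop
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                       # hypothetical face region

growth = 4                                 # assumed growth rate
weights = [rng.standard_normal((3 + i * growth, growth)) for i in range(2)]

module1 = dense_block(image, weights)      # shape (8, 8, 11)
# Assumption: Module 2's output is modeled as the mask repeated per channel.
module2 = np.repeat(mask[..., None], module1.shape[-1], axis=-1)
fused = fuse(module1, module2)             # background positions become zero
tokens = to_patch_tokens(fused, patch=4)   # 4 tokens of length 4 * 4 * 11
```

In this toy setup the first three channels of `module1` are the input image itself, which is the feature reuse that dense connections provide, and multiplying by the mask zeroes every position outside the assumed face region before the map is tokenized.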