Fine-Grained Image Classification on Agricultural Pest Larvae
Pest management is an essential part of crop cultivation. Accurately identifying pest types at an early stage helps in formulating targeted prevention and control measures that reduce the impact of pests on grain production. To identify pests in the larval stage as early as possible, this paper compares conventional and fine-grained classification models and constructs a fine-grained image classification model for the larvae of crop pests, which improves the ability to identify pests in the larval stage. Experiments show that our optimized fine-grained classification model surpasses the general convolutional image classification model on the fine-grained agricultural pest dataset AgrFIP20.
- Research Article
81
- 10.3389/fpls.2020.600854
- Dec 22, 2020
- Frontiers in Plant Science
Fine-grained image classification is a challenging task because discriminant features are difficult to identify: it is not easy to find the subtle features that fully represent the object. In the fine-grained classification of crop disease, visual disturbances such as light, fog, overlap, and jitter are frequently encountered. To explore the influence of the features of crop leaf images on the classification results, a classification model should focus on the more discriminative regions of the image while improving the classification accuracy of the model in complex scenes. This paper proposes a novel attention mechanism that effectively utilizes the informative regions of an image, and describes the use of transfer learning to quickly construct several fine-grained image classification models of crop disease based on this attention mechanism. This study uses 58,200 crop leaf images as a dataset, including 14 different crops and 37 different categories of healthy/diseased crops. Among them, different diseases of the same crop have strong similarities. The NASNetLarge fine-grained classification model based on the proposed attention mechanism achieves the best classification effect, with an F1 score of up to 93.05%. The results show that the proposed attention mechanism effectively improves the fine-grained classification of crop disease images.
- Research Article
1
- 10.3390/electronics13091691
- Apr 27, 2024
- Electronics
Insect diversity monitoring is crucial for biological pest control in agriculture and forestry. Modern monitoring of insect species relies heavily on fine-grained image classification models. Fine-grained image classification faces challenges such as small inter-class differences and large intra-class variances, which are even more pronounced in insect scenes where insect species often exhibit significant morphological differences across multiple life stages. To address these challenges, we introduce segmentation and clustering operations into the image classification task and design a novel network model training framework for fine-grained classification of insect images using multi-modality clustering and approximate mask methods, named PCAM-Frame. In the first stage of the framework, we adopt the Polymorphic Clustering Module, and segmentation and clustering operations are employed to distinguish various morphologies of insects at different life stages, allowing the model to differentiate between samples at different life stages during training. The second stage consists of a feature extraction network, called Basenet, which can be any mainstream network that performs well in fine-grained image classification tasks, aiming to provide pre-classification confidence for the next stage. In the third stage, we apply the Approximate Masking Module to mask the common attention regions of the most likely classes and continuously adjust the convergence direction of the model during training using a Deviation Loss function. We apply PCAM-Frame with multiple classification networks as the Basenet in the second stage and conduct extensive experiments on the Insecta dataset of iNaturalist 2017 and IP102 dataset, achieving improvements of 2.2% and 1.4%, respectively. Generalization experiments on other fine-grained image classification datasets such as CUB200-2011 and Stanford Dogs also demonstrate positive effects. 
These experiments validate the pertinence and effectiveness of our framework PCAM-Frame in fine-grained image classification tasks under complex conditions, particularly in insect scenes.
- Research Article
- 10.1016/j.compbiolchem.2025.108496
- Oct 1, 2025
- Computational biology and chemistry
UB-Former: A fine-grained classification method for images of insects using biomorphic features.
- Research Article
9
- 10.3389/fnbot.2024.1391791
- May 3, 2024
- Frontiers in Neurorobotics
To efficiently capture feature information in fine-grained image classification tasks, this study introduces a new network model that uses a hybrid attention approach. The model is built upon a hybrid attention module (MA), and with the assistance of the attention erasure module (EA), it can adaptively enhance the prominent areas in the image and capture more detailed image information. Specifically, this study designs an attention module capable of applying the attention mechanism to both the channel and spatial dimensions. This highlights the important regions and key feature channels in the image, allowing for the extraction of distinct local features. Furthermore, this study presents an attention erasure module (EA) that can remove significant areas in the image based on the features identified, thus shifting focus to additional feature details within the image and improving the diversity and completeness of the features. Moreover, this study enhances the pooling layer of ResNet50 to enlarge the receptive field and improve the capability to extract features from the network's shallower layers. The model extracts a variety of features and merges them effectively to create the final feature representation. To assess the effectiveness of the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200-2011. The method achieved classification accuracies of 92.8%, 94.0%, and 88.2% on these datasets, respectively. In comparison with existing approaches, the efficiency of this method has significantly improved, demonstrating higher accuracy and robustness.
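As a rough illustration of applying attention along both the channel and the spatial dimensions, the sketch below gates a small feature tensor first per channel and then per location. It is not the MA module from the abstract: the function name `hybrid_attention`, the use of plain averages, and the sigmoid gates without learned weights are all assumptions made to keep the example self-contained.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_attention(features):
    """Reweight features[c][r][k] (C channels of H x W) by channel, then by location.

    Assumption: gates come from simple averages passed through a sigmoid;
    a real module would use learned layers instead.
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    # Channel attention: one gate per channel, from that channel's global average.
    channel_gate = [sigmoid(sum(sum(row) for row in ch) / (H * W)) for ch in features]
    gated = [[[v * channel_gate[c] for v in row] for row in features[c]]
             for c in range(C)]
    # Spatial attention: one gate per location, from the mean across channels.
    spatial_gate = [[sigmoid(sum(gated[c][r][k] for c in range(C)) / C)
                     for k in range(W)] for r in range(H)]
    return [[[gated[c][r][k] * spatial_gate[r][k] for k in range(W)]
             for r in range(H)] for c in range(C)]


# Toy input: one active channel and one all-zero channel, each 2 x 2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
out = hybrid_attention(features)
```

Because both gates lie strictly between 0 and 1, the output keeps the input's shape while scaling every value down, with the least reduction where channel and spatial responses are strongest.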
- Research Article
4
- 10.1145/3605892
- Aug 24, 2023
- ACM Transactions on Multimedia Computing, Communications, and Applications
Data augmentation is a common technique to improve the generalization performance of models for image classification. Although methods such as Mixup and CutMix that mix images randomly are indeed instrumental in general image classification, randomly swapping or masking regions is not friendly to fine-grained images, since the key to fine-grained image classification precisely lies in discriminative and informative regions, and it is unreasonable to generate labels solely consistent with the proportion of synthesis. Some erasing methods like Cutout even endanger fine-grained image classification because of erasing the discriminative regions by chance. In this article, we propose the Same Category Same Semantics Mixing method (S3Mix) corresponding to the characteristics of fine-grained images. Specifically, we limit the mixture to regions of the same category and semantics. The core of the method is two constraints. The exchange with the semantic region ensures the discrimination and semantics integrity of the generated image, and the exchange in the same class avoids the problem of unreasonable label generation. At the same time, we propose a homology loss to promote the semantic relationship between the generated positive image pairs. Experiments have been conducted on four fine-grained datasets, and the results show the proposed method is superior to the traditional image augmentation methods as well as some fine-grained data augmentation methods.
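The label problem described above can be made concrete with a little arithmetic. The sketch below computes the soft label that a CutMix-style augmentation assigns from the pasted-area ratio, and shows that the label stays a clean one-hot vector when both images share a class. It illustrates only the label bookkeeping: the function name `cutmix_label` and the half-open box convention are assumptions, and the semantic-region matching of S3Mix is not shown.

```python
def cutmix_label(label_a, label_b, box, image_size, num_classes):
    """Soft label for a CutMix-style image: weights follow the pasted-area ratio.

    box = (top, left, bottom, right), half-open, is the region pasted from image B
    into image A (an assumed convention for this sketch).
    """
    top, left, bottom, right = box
    height, width = image_size
    # Fraction of the mixed image that still comes from image A.
    lam = 1.0 - ((bottom - top) * (right - left)) / (height * width)
    soft = [0.0] * num_classes
    soft[label_a] += lam
    soft[label_b] += 1.0 - lam
    return soft


# Mixing classes 0 and 2: the label is split purely by area (0.75 / 0.25),
# regardless of whether the pasted patch contains a discriminative part.
mixed = cutmix_label(0, 2, box=(0, 0, 4, 4), image_size=(8, 8), num_classes=3)

# Mixing two images of class 1: the label remains one-hot.
same = cutmix_label(1, 1, box=(0, 0, 4, 4), image_size=(8, 8), num_classes=3)
```

Restricting the exchange to the same category removes the dependence of the label on the area ratio, which is the second of the two constraints the abstract names.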
- Research Article
41
- 10.3390/app12189016
- Sep 8, 2022
- Applied Sciences
Thus far, few studies have been conducted on fine-grained classification tasks for the latest convolutional neural network ConvNeXt, and no effective optimization method has been made available. To achieve more accurate fine-grained classification, this paper proposes two attention embedding methods based on ConvNeXt network and designs a new bilinear CBAM; simultaneously, a multiscale, multi-perspective and all-around attention framework is proposed, which is then applied in ConvNeXt. Experimental verification shows that the accuracy rate of the improved ConvNeXt for fine-grained image classification reaches 87.8%, 91.2%, and 93.2% on fine-grained classification datasets CUB-200-2011, Stanford Cars, and FGVC Aircraft, respectively, showing increases of 2.7%, 0.3% and 0.4%, respectively, compared to those of the original network without optimization, and increases of 3.7%, 8.0% and 2.0%, respectively, compared to those of the traditional BCNN. In addition, ablation experiments are set up to verify the effectiveness of the proposed attention framework.
- Conference Article
- 10.1117/12.2643842
- Oct 9, 2022
With the growth of image classification models and the development of deep learning, computer image classification has largely surpassed human performance, but fine-grained image classification by computers still lags behind humans, so it currently receives a lot of attention. This paper uses CiteSpace software to carry out a visual analysis of fine-grained image classification in terms of cooperation, scientific research, research trends, and other aspects, and points out that fine-grained image classification relies on CNNs while looking toward Transformers. However, fine-grained image classification suffers from model reliance on prior knowledge (CNN), lack of sufficient data (Transformer), and lack of local relevance (CNN/Transformer), which can be addressed by reducing the complexity of the model, transfer learning, data augmentation, combining CNNs with Transformers, etc. On this basis, this paper indicates that Swin-Transformer has good generalization, hierarchy, and translation invariance, and will become the main research direction for fine-grained image classification in the future.
- Research Article
5
- 10.3390/s21124176
- Jun 18, 2021
- Sensors (Basel, Switzerland)
Fine-grained image classification is a hot topic that has been widely studied recently. Many fine-grained image classification methods ignore misclassification information, which is important to improve classification accuracy. To make use of misclassification information, in this paper, we propose a novel fine-grained image classification method by exploring the misclassification information (FGMI) of prelearned models. For each class, we harvest the confusion information from several prelearned fine-grained image classification models. For one particular class, we select a number of classes which are likely to be misclassified with this class. The images of selected classes are then used to train classifiers. In this way, we can reduce the influence of irrelevant images to some extent. We use the misclassification information for all the classes by training a number of confusion classifiers. The outputs of these trained classifiers are combined to represent images and produce classifications. To evaluate the effectiveness of the proposed FGMI method, we conduct fine-grained classification experiments on several public image datasets. Experimental results prove the usefulness of the proposed method.
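The class-selection step can be sketched as follows. This is a minimal illustration, assuming the confusion information from one prelearned model is available as a confusion matrix of counts; the function name `most_confused_classes`, the parameter `k`, and the symmetric scoring rule are hypothetical and not taken from the paper.

```python
def most_confused_classes(confusion, k):
    """For each class, return the k other classes it is most often confused with.

    confusion[i][j] counts images of true class i predicted as class j
    (an assumed representation of the confusion information).
    """
    n = len(confusion)
    selected = {}
    for i in range(n):
        # Count confusion in both directions between class i and class j.
        scores = [(confusion[i][j] + confusion[j][i], j) for j in range(n) if j != i]
        # Highest confusion first; ties are broken by the smaller class index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        selected[i] = [j for _, j in scores[:k]]
    return selected


# Toy 4-class confusion matrix (rows: true class, columns: predicted class).
confusion = [
    [50, 8, 1, 0],
    [6, 45, 2, 1],
    [0, 3, 40, 9],
    [1, 0, 7, 48],
]
groups = most_confused_classes(confusion, k=1)
```

Each entry of `groups` names the classes whose images would be used, together with the class itself, to train one confusion classifier; images from unrelated classes are left out.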
- Research Article
1
- 10.1142/s0218001423570124
- Nov 1, 2023
- International Journal of Pattern Recognition and Artificial Intelligence
Footprint images differ mainly in the proportions of the parts of the foot and the distribution of pressure, so classifying them can be considered a fine-grained image classification problem. Moreover, the deviation of human body weight and muscle strength increases the difficulty of identifying the left and right feet. While using a fine-grained image classification network to solve the footprint image classification problem is certainly a feasible approach, the number of parameters in a fine-grained image classification network is generally large, and therefore we would like to build a lightweight classification network that is suitable for several small footprint datasets. In this paper, a multimodal footprint recognition algorithm based on progressive multi-granularity feature fusion is proposed. First, the shallow dense connection network is used to extract features. The feature extraction ability of the model is improved with the help of channel splicing and feature multiplexing. Second, to learn footprint images of different granularities, the progressive training strategy and puzzle scrambler are applied to the model. Finally, factorized bilinear coding aggregates local features to obtain more discriminative global representation features. Experiments show that our network achieves classification accuracy comparable to some fine-grained image classification models (PMG, MSEC) on the complete pressure footprint dataset, but with a greatly reduced number of parameters. Our network also achieves good classification results on several other footprint datasets, which demonstrates its effectiveness. In addition, an ablation experiment was carried out to verify the effectiveness of the progressive strategy and the factorized bilinear coding.
- Book Chapter
- 10.1007/978-3-030-92632-8_31
- Dec 16, 2021
In recent years, fine-grained image classification has become a new research field in computer vision because its tasks are characterized by significant intra-class differences and minor inter-class differences. Traditional image classification algorithms still struggle to obtain good classification results despite relying on manual annotation. Since the subcategories can be distinguished only by minor local differences, accurate detection of local details is the key to improving fine-grained classification accuracy. Therefore, this paper proposes a joint detection network model of local feature points and components for fine-grained image classification to effectively predict and extract local feature positions. Experiments verify the effectiveness of the proposed method on the public dataset Caltech-UCSD Birds (CUB-200-2011) for fine-grained classification tasks.
Keywords: Landmark and parts detection; Spatial transformer networks; Joint detection model; Base model modification; Image fine-grained classification
- Conference Article
2
- 10.1109/iscid.2018.00015
- Dec 1, 2018
Fine-grained image classification is a challenging problem, due to the small inter-class variance caused by highly similar subordinate categories and the large intra-class variance in poses, viewpoints, and rotations. In this paper, we propose a novel end-to-end model for fine-grained image classification (FGIC). The proposed model consists of two sub-networks: a detection sub-network and a classification sub-network. The detection sub-network is built on R-FCN, and the classification sub-network contains a two-stream CNN for feature extraction and three fully connected layers for object classification. In addition, network compression is adopted in both sub-networks to improve efficiency and reduce storage space. Experimental results on CUB-200-2011 show that the accuracy of our method is close to the state of the art, with higher efficiency and lower storage requirements than the other compared methods (10 frames/sec during inference on TitanX). The proposed high-efficiency framework is believed to be effective in some practical applications, especially on mobile terminals.
- Research Article
6
- 10.25236/ajcis.2023.060215
- Jan 1, 2023
- Academic Journal of Computing & Information Science
Fine-grained image classification is the problem of classifying sub-categories that share a common superordinate category. Targeting the large intra-class differences and small inter-class differences of fine-grained images, this paper proposes a fine-grained image classification method based on multi-scale feature fusion. The method constructs a three-branch network model. The attention module and local extraction module are used to obtain the image of the target object and the images of the parts with strongly distinguishing detail features. First, deep metric learning uses misclassification information to shorten the distance between samples of the same class and improve classification accuracy; second, without using image bounding box or part annotation information, image information at different scales is fused through a parallel network structure; finally, the entire network is optimized by combining the loss functions of the three branches. This method performs end-to-end training collaboratively in a multi-branch network to enhance the ability to express information, thereby improving the accuracy of image classification. To evaluate the effectiveness of our method, fine-grained classification experiments were conducted on three datasets. The experimental results show that the algorithm has higher classification accuracy than other fine-grained classification algorithms.
- Research Article
5
- 10.1088/1742-6596/1754/1/012189
- Feb 1, 2021
- Journal of Physics: Conference Series
Extracting discriminative fine-grained features is essential for fine-grained image recognition tasks. Many researchers use expensive manual annotations to learn to distinguish part models, which may not be possible in practical applications. Unlike previous strongly supervised fine-grained classification networks that require additional image annotations, weakly supervised fine-grained image classification only requires label annotations. Recently, image enhancement has been increasingly used in network structures, but random enhancement will lead to background noise and filter out irrelevant areas. In this article, we propose a weakly supervised fine-grained image classification network based on attention-guided image enhancement to study the effect of image enhancement on the classification network. In detail, we use the backbone network to generate the feature map of the image, then generate the corresponding attention map through a custom mask, and use the attention map to guide the image enhancement process (including image cropping and image dropping). We conducted experiments on three commonly used fine-grained image classification datasets and achieved state-of-the-art results on CUB, FGVC-Aircraft, and Stanford Cars.
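The cropping and dropping steps guided by an attention map can be sketched as below. This is an illustration under stated assumptions rather than the paper's implementation: the function name `attention_crop_and_drop`, the thresholds expressed as fractions of the map's peak value, and the bounding-box rule are all hypothetical choices.

```python
def attention_crop_and_drop(attention, crop_thresh=0.5, drop_thresh=0.5):
    """Derive a crop box and a drop mask from a 2-D attention map.

    attention: list of rows of non-negative floats.
    Thresholds are fractions of the map's maximum (hypothetical defaults).
    """
    h, w = len(attention), len(attention[0])
    peak = max(max(row) for row in attention)
    # Cropping: bounding box around every cell whose attention reaches the threshold.
    hot = [(r, c) for r in range(h) for c in range(w)
           if attention[r][c] >= crop_thresh * peak]
    rows = [r for r, _ in hot]
    cols = [c for _, c in hot]
    box = (min(rows), min(cols), max(rows), max(cols))  # top, left, bottom, right (inclusive)
    # Dropping: zero out highly attended cells so other regions must be used.
    drop_mask = [[0 if attention[r][c] >= drop_thresh * peak else 1
                  for c in range(w)] for r in range(h)]
    return box, drop_mask


# Toy 4 x 4 attention map with a strong response in the centre.
attention = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.7, 0.1],
    [0.1, 0.6, 1.0, 0.2],
    [0.0, 0.1, 0.2, 0.1],
]
box, drop_mask = attention_crop_and_drop(attention)
```

In a training pipeline, the box would be scaled to image coordinates to cut out and enlarge the attended region, while the mask would be resized and multiplied with the image to hide it.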
- Conference Article
1
- 10.1109/icme51207.2021.9428135
- Jul 5, 2021
Fine-grained image classification has drawn increasing attention as it is much closer to practical applications than generic image classification. The majority of current fine-grained approaches locate the discriminative regions and rely on the features of these regions for classification. However, these approaches ignore the correlation between internal semantic regions. This correlation reveals the salient information of images, which can further boost the performance of fine-grained image classification. To this end, we propose an Object Decoupling with Graph Correlation network (ODGC) to explore the informative potential of region correlation. A Responsive Object Location Module (ROLM) is first introduced to obtain the fine-grained object within a bounding box automatically. A Semantic Decoupling Module (SDM) then segments the object into different parts. ODGC learns the representations of these parts by passing the part features into a Graph Correlation Module (GCM). Consisting of these three main modules, ODGC is trained for fine-grained image classification in an end-to-end way. Extensive experiments conducted on CUB-200-2011 demonstrate that the aforementioned modules significantly improve ODGC, which achieves a new state-of-the-art performance of 88.2% top-1 accuracy. In addition, we collect a practical e-commerce dataset, named Ecom-15K. The evaluation on it further validates the applicability of our method in practical scenarios.
- Conference Article
- 10.1109/ihmsc49165.2020.10133
- Aug 1, 2020
Fine-grained image classification differs from general image classification in that the differences between fine-grained images are small, so the details of the images are extremely important. In this paper, we propose a network that retains both the overall image information and the local image information. Our network is structured as follows. First, convolutional layers extract the feature map of the image. The trilinear attention method then processes the feature map to obtain an average attention map and a single-channel attention map. Selective sampling guided by these two attention maps produces two sampled images, and finally the original image and the two sampled images are input to the convolutional neural network for discrimination. Our entire network can be trained end-to-end. We used this network structure to conduct a large number of experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets, and the experimental results all demonstrated the effectiveness of the method.