Online Anchor-Based Training For Image Classification Tasks
In this paper, we aim to improve the performance of deep learning models on image classification tasks by proposing a novel anchor-based training methodology, named Online Anchor-based Training (OAT). Guided by insights from anchor-based object detection methodologies, the OAT method trains a model to learn percentage changes of the class labels with respect to defined anchors, rather than learning the class labels directly. We define as anchors the batch centers at the output of the model. During the test phase, the predictions are converted back to the original class-label space, and the performance is evaluated. The effectiveness of the OAT method is validated on four datasets.
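The anchor encoding described in the abstract can be sketched as follows. The abstract does not give the exact formula, so the percentage-change form, the `encode_targets`/`decode_targets` names, and the use of the batch mean of the outputs as the anchor are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def encode_targets(labels, outputs):
    """Hypothetical OAT-style encoding: express each one-hot label as a
    percentage change relative to an anchor, taken here as the batch mean
    ("batch center") of the model's outputs, as the abstract describes."""
    anchor = outputs.mean(axis=0)              # batch center at the model output
    return (labels - anchor) / (np.abs(anchor) + 1e-8)

def decode_targets(preds, outputs):
    """Invert the encoding at test time to recover label-space scores."""
    anchor = outputs.mean(axis=0)
    return preds * (np.abs(anchor) + 1e-8) + anchor

# Round trip: decoding the encoded targets recovers the original labels.
labels = np.eye(3)[[0, 2, 1, 0]]               # one-hot labels for a batch of 4
outputs = np.random.rand(4, 3)                 # stand-in for model outputs
recovered = decode_targets(encode_targets(labels, outputs), outputs)
print(np.allclose(recovered, labels))          # True
```

The model is then fit to the encoded targets; at test time the same decoding maps its predictions back to the label space for evaluation.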
- Book Chapter
2
- 10.1007/978-3-030-89363-7_34
- Jan 1, 2021
With the development of artificial intelligence technology, optimizing the performance of deep neural network models has become a hot issue in scientific research. The learning rate is one of the most important hyper-parameters for model optimization. In recent years, several learning rate algorithms with a cycle mechanism have been proposed. Most of them adopt warm restarts and a cycle mechanism to make the learning rate cycle between two boundary values, and demonstrate their effectiveness on image classification tasks. To further improve the performance of neural network models and prove effectiveness on different training tasks, this paper proposes a novel learning rate schedule called hyperbolic tangent polynomial parity cyclic learning rate (HTPPC), which adopts a cycle mechanism and combines the advantages of warm restarts and polynomial decay. The performance of HTPPC is demonstrated on image classification and object detection tasks.
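The abstract does not give the HTPPC formula itself, but the two ingredients it combines, a cycle mechanism with warm restarts and polynomial decay, can be sketched generically; `cyclic_poly_lr` and all of its parameters are illustrative, not the paper's definition.

```python
def cyclic_poly_lr(step, cycle_len=100, lr_max=0.1, lr_min=0.001, power=2.0):
    """Generic cyclic schedule with warm restarts and polynomial decay
    (illustrative only; the exact HTPPC formula is defined in the paper).
    Within each cycle the rate decays polynomially from lr_max to lr_min,
    then a warm restart resets it to lr_max."""
    t = (step % cycle_len) / cycle_len     # progress within the cycle, in [0, 1)
    return lr_min + (lr_max - lr_min) * (1.0 - t) ** power

print(cyclic_poly_lr(0))      # lr_max at the start of a cycle
print(cyclic_poly_lr(100))    # warm restart: back to lr_max
print(cyclic_poly_lr(99))     # near lr_min at the end of a cycle
```

Schedules of this family are queried once per training step (or per epoch) and the returned value is assigned to the optimizer's learning rate.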
- Research Article
- 10.11834/jig.211182
- Jan 1, 2023
- Journal of Image and Graphics
Objective The emerging deep learning technique has advanced artificial intelligence (AI)-related domains such as image classification, natural language processing, speech recognition, and reinforcement learning. However, it is challenged by the over-fitting problem, since effective models require massive amounts of data, especially in image classification tasks. To tackle this problem, the concept of few-shot learning was developed, which aims to use well-generalized knowledge learned from a large-scale dataset to handle downstream classification tasks with few training samples. Currently, most popular methods for few-shot image classification are based on meta-learning, which learns to deal with few-shot tasks via similar classification tasks. The process of meta-learning is divided into two steps: 1) meta-training and 2) meta-testing. During meta-training, an embedding network is trained on the meta-training set; it is then used to tackle downstream classification tasks constructed from the few training samples of the meta-testing set. There is no intersection between the meta-training and meta-testing sets, which means that no reliable prior knowledge about the test classes is obtained by the meta-learner during meta-training.
Due to category differences between meta-training and meta-testing, meta-learning models face additional challenges. If a model focuses only on the training tasks, its effectiveness suffers when the meta-learner meets few-shot tasks with brand-new categories. To tackle this challenge with metric-based methods, we develop a multi-layer adaptive aggregation self-supervised few-shot classification model. Method First, to reduce the parameters of the backbone and lower the training difficulty, grouped convolution blocks are used to replace the original convolutions. Next, to improve the backbone, a multi-layer adaptive aggregation module is introduced, which refines the semantic information of each network layer and adaptively balances the layer weights; the aggregated feature maps are the basis for subsequent downstream few-shot classification. Finally, to enhance the transferability of the learned model, self-supervised contrastive learning is introduced to assist supervised learning in mining the potential information of the data themselves. Because contrastive learning requires no labels, it does not aggravate over-fitting and can act as an additional source of regularization, which benefits the construction of the feature space. With the proposed self-supervised contrastive learning method, the embedding network pays more attention to learning well-generalized knowledge, which makes the distribution of embedding feature maps smoother and the classification model more suitable for the domain of downstream tasks. Result To validate the effectiveness of the proposed model, a comparative analysis is carried out against several popular models, including 1) the prototype network, 2) the relation network, and 3) the cosine classifier, on the mini-ImageNet dataset and the Caltech-UCSD birds-200-2011 (CUB) dataset.
On the mini-ImageNet dataset, the proposed model reaches 63.13% accuracy on 5-way 1-shot and 78.14% on 5-way 5-shot, improvements of 13.71% and 9.94%, respectively, over the original prototype network. On the fine-grained CUB dataset, it reaches 75.93% on 5-way 1-shot and 87.56% on 5-way 5-shot, which are 24.48% and 13.05% higher than the original prototype network, respectively. Compared to the baseline, accuracy increases by 6.31% and 6.04% on mini-ImageNet, and by 8.95% and 8.77% on CUB, for 5-way 1-shot and 5-way 5-shot, respectively. The comparative experiments also show that our backbone uses fewer parameters than the five backbones of the prototype network. A set of ablation experiments further verifies the proposed model. Additionally, a heat-map comparison between the baseline and the proposed model verifies that our model keeps the embedding network from attending to background information in images and alleviates its interference with downstream classification tasks. Furthermore, t-SNE visualization is used to examine the distribution of samples in the feature space. The feature distributions obtained on the CUB dataset demonstrate that our model differentiates samples from different categories well, making the meta-testing set close to linearly separable. Conclusion To resolve these problems in few-shot learning, we develop a multi-layer adaptive aggregation self-supervised few-shot classification model. To alleviate training difficulty, the improved grouped convolutions reduce the parameters of the backbone. To mitigate over-fitting and the domain gap, the multi-layer adaptive aggregation method and the self-supervised contrastive learning method adjust the distribution of the embedding feature maps.
In particular, with our self-supervised contrastive learning method, the embedding networks are not distracted by redundant background information in images.
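The supervised-plus-contrastive recipe above can be illustrated with a standard normalized-temperature contrastive loss over two augmented views (the common SimCLR-style formulation; the paper's exact loss may differ):

```python
import numpy as np

def nt_xent(z1, z2, temp=0.5):
    """SimCLR-style NT-Xent contrastive loss for two augmented views
    (a standard formulation; the paper's exact loss may differ).
    z1[i] and z2[i] are embeddings of two views of the same image."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    sim = z @ z.T / temp                               # scaled cosine similarities
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # positives: view i pairs with view i + n and vice versa
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
loss_aligned = nt_xent(z1, z1 + 0.01 * rng.normal(size=(8, 16)))
loss_random = nt_xent(z1, rng.normal(size=(8, 16)))
print(loss_aligned < loss_random)   # aligned views score a lower loss: True
```

In a combined objective this term is typically added, with a weighting factor, to the ordinary supervised classification loss, pulling views of the same image together without using any labels.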
- Book Chapter
2
- 10.1137/1.9781611977653.ch88
- Jan 1, 2023
Meta-learning has shown great promise in few-shot image classification where only a small amount of labeled data is available in each classification task. Many training tasks are provided to train a meta-model that can quickly learn new and similar concepts with few labeled samples. Data augmentation is often used to augment training tasks to avoid overfitting. However, existing data augmentation methods are often manually designed and fixed during training, ignoring training dynamics and the difference between various meta-learning settings specified by meta-model architectures and meta-learning algorithms. To address this problem, we add a task transformation layer between a training task and a meta-model such that the right amount of perturbation is added to training tasks for a certain meta-learning setting at a certain training stage. By jointly optimizing the task transformation layer and the meta-model, we avoid the risk of providing tasks that are either too easy or too difficult during training. We design the task transformation layer as a stochastic transformation function, adding the flexibility in how a training task can be transformed. We leverage differentiable data augmentations as the building blocks of the task transformation function for efficient optimization. Extensive experiments show that our method can consistently improve the few-shot generalization performance of various meta-models trained with different meta-learning algorithms, meta-model architectures, and datasets.
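The key requirement above, that the task transformation be differentiable so it can be optimized jointly with the meta-model, can be illustrated with a toy augmentation whose magnitude is itself learned by gradient descent; the brightness-shift form and all names here are illustrative, not the paper's transformation layer.

```python
import numpy as np

# Minimal sketch of a differentiable augmentation: a brightness shift with a
# learnable magnitude m, optimized by gradient descent like any other
# parameter. All names and the choice of augmentation are illustrative.
def augment(x, m):
    return x + m                      # differentiable in both x and m

def loss(x, m, target):
    return ((augment(x, m) - target) ** 2).mean()

def grad_m(x, m, target):             # analytic d loss / d m
    return 2.0 * (augment(x, m) - target).mean()

x = np.array([0.2, 0.4, 0.6])
target = np.array([0.5, 0.7, 0.9])    # best explained by a shift of m = 0.3
m = 0.0
for _ in range(200):                  # gradient descent on the magnitude
    m -= 0.1 * grad_m(x, m, target)
print(round(m, 3))                    # converges to 0.3
```

Because every augmentation in the transformation layer admits such gradients, the layer's parameters can be updated in the same backward pass as the meta-model, which is what lets the method tune task difficulty during training.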
- Research Article
18
- 10.1109/lgrs.2021.3107321
- Jan 1, 2022
- IEEE Geoscience and Remote Sensing Letters
Deep learning methods have made considerable progress in many fields, but most of them rely on large numbers of samples. In the hyperspectral image (HSI) classification task, there are many unlabeled data and few labeled data, so it is necessary to achieve good results with a small number of training samples. In this letter, in order to fuse spectral and spatial information, a dual-branch residual neural network (ResNet) is proposed, with one branch extracting spectral features and the other extracting patch features. Further, according to the properties of HSI, self-supervised training methods are designed for these two branches. When spectral information is used for training, each spectrum is artificially divided into several parts, with each part treated as a category for a classification pretext task. When patch features are used for training, the pretext task is to recover the spectral information of the central pixels. After the pretext-task training is completed, the resulting weights are used to initialize the classification training. Experiments with small numbers of samples on two public datasets show that this method achieves better classification performance than existing methods.
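The spectral pretext task described above, dividing each spectrum into parts and treating each part as a class, can be sketched as follows; the segment count and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def spectral_pretext(pixels, parts=4):
    """Sketch of the spectral pretext task: each spectrum is split into
    equal parts, and each part becomes a training sample whose label is
    its part index (details are illustrative, not from the paper)."""
    n, bands = pixels.shape
    seg = bands // parts
    xs, ys = [], []
    for p in range(parts):
        xs.append(pixels[:, p * seg:(p + 1) * seg])   # one spectral segment
        ys.append(np.full(n, p))                      # its pseudo-label
    return np.concatenate(xs), np.concatenate(ys)

pixels = np.random.rand(10, 200)          # 10 pixels, 200 spectral bands
x, y = spectral_pretext(pixels)
print(x.shape, np.unique(y))              # (40, 50) [0 1 2 3]
```

Training the spectral branch to classify which segment of the spectrum it sees requires no labels, so it can exploit the many unlabeled pixels before the weights are handed to the downstream classifier.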
- Research Article
6
- 10.48084/etasr.6127
- Aug 9, 2023
- Engineering, Technology & Applied Science Research
Classification of medical images plays an indispensable role in medical treatment and training tasks. Much effort and time are required in the extraction and selection of classification features of medical images. Deep Neural Networks (DNNs) are an evolving Machine Learning (ML) method that has proved its ability in various classification tasks. Convolutional Neural Networks (CNNs) present the optimal results for changing image classification tasks. In this regard, this study focused on developing a Multi-versus Optimizer with Deep Learning Enabled Robust Medical X-ray Image Classification (MVODL-RMXIC) method, aiming to identify abnormalities in medical X-ray images. The MVODL-RMXIC model used the Cross Bilateral Filtering (CBF) technique for noise removal, a MixNet feature extractor with an MVO algorithm based on hyperparameter optimization, and Bidirectional Long-Short-Term Memory (BiLSTM) for image classification. The proposed MVODL-RMXIC model was simulated and evaluated, showing its efficiency over other current methods.
- Conference Article
1
- 10.1109/eiecs53707.2021.9588058
- Sep 23, 2021
Traditional neural networks need large datasets for training in feature extraction and image classification tasks, and often cannot complete classification tasks with small samples. This paper uses transfer learning to adapt the traditional VGGNet-16 network, avoiding the complexity of training the network from scratch. According to the classification target, VGGNet-16 is fine-tuned to obtain the network used for this classification task. The effect of different dropout values on the model is verified in the fully connected layers. The results show that transfer learning achieves excellent training performance on small data samples and reduces training costs. After adding the Dropout layer, the performance of the network model not only improves by 1.1%, but the model is also more stable than without the Dropout layer. The data show that the network model is most stable when the dropout rate is 0.5 and the batch size is 32, with the accuracy reaching 97.10%.
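The Dropout layer whose rate is tuned above is standardly implemented as inverted dropout; a minimal sketch of the forward pass:

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    """Inverted dropout, the standard formulation behind a Dropout layer:
    during training each activation is zeroed with probability p and the
    survivors are scaled by 1/(1-p), so the expected activation is
    preserved and no rescaling is needed at test time."""
    if not train or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.ones((1000, 100))
out = dropout(x, p=0.5, rng=np.random.default_rng(0))
print(abs(out.mean() - 1.0) < 0.05)   # True: expectation is preserved
print((out == 0).mean())              # roughly half the units are dropped
```

At p = 0.5, the setting the abstract reports as most stable, each hidden unit is kept with probability one half, which acts as a strong regularizer on the small-sample fine-tuning task.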
- Book Chapter
1
- 10.1007/978-3-030-31756-0_7
- Oct 30, 2019
The success of deep convolutional networks on image and text classification and recognition tasks depends on the availability of large, correctly labeled training datasets, but obtaining correct labels for these gigantic datasets is a very difficult task. To deal with this problem, we describe an approach for learning deep networks from datasets corrupted by unknown label noise. We append a non-linear noise model to a standard deep network, which is learned in tandem with the parameters of the network. Further, we train the network using a loss function that encourages the clustering of training images. We argue that the non-linear noise model, while not rigorous as a probabilistic model, results in a more effective denoising operator during backpropagation. We evaluate the performance of the proposed approach on image classification with label noise artificially injected into the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets, and on the large-scale Clothing 1M dataset with inherent label noise. Further, we show that with different initialization and regularization of the noise model, this learning procedure can be applied to text classification tasks as well; we evaluate the modified approach on the TREC text classification dataset. On all these datasets, the proposed approach provides significantly improved classification performance over the state of the art and is robust to the amount of label noise and the number of training samples. The approach is computationally fast, completely parallelizable, and easily implemented with existing machine learning libraries.
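The appended noise model can be illustrated with the simplest, linear version of such a layer: a learned row-stochastic transition matrix applied to the network's softmax output. Note that the paper's model is non-linear, so this shows only the underlying idea.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Sketch of a label-noise adaptation layer: the network's clean class
# posterior is pushed through a row-stochastic transition matrix T, where
# T[i, j] = P(observed label j | true label i). During training T is learned
# jointly with the network; at test time it is simply dropped.
def noisy_posterior(logits, T):
    return softmax(logits) @ T

T = np.array([[0.8, 0.1, 0.1],     # e.g. class 0 is mislabeled 20% of the time
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
logits = np.array([[5.0, 0.0, 0.0]])        # confident prediction of class 0
p = noisy_posterior(logits, T)
print(p.round(2))                           # roughly [[0.79, 0.10, 0.10]]
```

Fitting the composite model to the noisy labels lets the transition layer absorb the corruption, so gradients reaching the base network are effectively denoised.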
- Book Chapter
2
- 10.1007/978-3-540-74999-8_79
- Jan 1, 2007
Content-based automatic image classification systems are increasingly finding usage, e.g. in large medical image databases. This paper concentrates on a grayscale radiograph annotation task which was a part of ImageCLEF 2006. We use local features calculated around interest points, which have recently achieved excellent results for various image recognition and classification tasks. We propose the use of relational features, which are highly robust to illumination changes and thus quite suitable for X-ray images. Results with various feature and classifier settings are reported. A significant improvement in results is seen when the relative positions of the interest points are also taken into account during matching. For the given test set, our best run had a classification error rate of 16.7%, just 0.5% higher than the best overall submission, and was therewith ranked second in the medical automatic annotation task at ImageCLEF 2006. The proposed method is general, can be applied to other image classification tasks, and can also be extended to colour images.
Keywords: Local Features, Radiograph, Image Annotation, Invariants
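The illumination robustness claimed for relational features can be sketched with the simplest such feature, a brighter-than-center comparison around an interest point (the paper's exact descriptor may differ):

```python
import numpy as np

def relational_feature(patch):
    """Sketch of a relational feature: encode only whether each pixel is
    brighter than the patch center. Because it records relations rather
    than absolute intensities, it is invariant to any monotonic
    illumination change (the paper's exact descriptor may differ)."""
    center = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    return (patch > center).astype(np.uint8).ravel()

patch = np.array([[10, 50, 30],
                  [70, 40, 20],
                  [90, 60, 15]], dtype=float)
f1 = relational_feature(patch)
f2 = relational_feature(patch * 1.7 + 5.0)   # monotonic illumination change
print(np.array_equal(f1, f2))                # True: the feature is unchanged
```

Since any positive rescaling and offset of the intensities preserves every brighter-than relation, the descriptor is unchanged under exactly the kind of exposure variation common in X-ray imaging.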
- Research Article
12
- 10.3390/biomimetics7030084
- Jun 23, 2022
- Biomimetics
Deep Convolutional Neural Networks (CNNs) represent the state-of-the-art artificially intelligent computing models for image classification. The advanced cognition and pattern-recognition abilities possessed by humans are ascribed to the intricate and complex neurological connections in human brains. CNNs are inspired by the neurological structure of the human brain and show performance on par with humans in image recognition and classification tasks. On the lower extreme of the neurological complexity spectrum lie small organisms such as insects and worms, with simple brain structures and limited cognition, pattern-recognition, and decision-making abilities. However, billions of years of evolution guided by natural selection have imparted basic survival instincts, which appear as an “intelligent behavior”. In this paper, we put forward evidence that a simple algorithm inspired by the behavior of a beetle (an insect) can fool CNNs in image classification tasks by perturbing just a single pixel. The proposed algorithm accomplishes this in a computationally efficient manner compared to the other adversarial attack algorithms proposed in the literature. The novel feature of the proposed algorithm, compared to other metaheuristic approaches for fooling a neural network, is that it mimics the behavior of a single beetle and requires fewer search particles. In contrast, other metaheuristic algorithms rely on the social or swarming behavior of organisms, requiring a large population of search particles. We evaluated the performance of the proposed algorithm on the LeNet-5 and ResNet architectures using the CIFAR-10 dataset. The results show a high success rate for the proposed algorithm. The proposed strategy raises a concern about the robustness and security of artificially intelligent learning systems.
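The beetle-inspired search the abstract refers to is generally known as Beetle Antennae Search (BAS). A minimal generic sketch follows (not the paper's attack-specific variant; the step and antenna parameters are illustrative):

```python
import numpy as np

def beetle_antennae_search(f, x0, steps=200, d=0.5, step=0.5, seed=0):
    """Minimal Beetle Antennae Search (BAS) sketch: a single search particle
    "smells" the objective at its left and right antennae and steps toward
    the better side. This is generic BAS for minimization, not the paper's
    attack-specific variant."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        b = rng.normal(size=x.shape)
        b /= np.linalg.norm(b)                 # random antennae direction
        diff = f(x + d * b) - f(x - d * b)     # which antenna smells better?
        x = x - step * np.sign(diff) * b       # move toward the lower value
        d *= 0.95                              # shrink the antennae
        step *= 0.95                           # and the step size
    return x

sphere = lambda v: float(np.sum(v ** 2))
x = beetle_antennae_search(sphere, [3.0, -2.0])
print(sphere(x) < 1.0)    # converged near the minimum at the origin: True
```

Because a single particle with two probes replaces a whole swarm, each iteration needs only two objective evaluations, which is the efficiency argument the abstract makes against population-based metaheuristics.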
- Research Article
6
- 10.1109/tccn.2021.3074908
- Apr 24, 2021
- IEEE Transactions on Cognitive Communications and Networking
This paper proposes a communication strategy for decentralized learning in wireless systems that employs adaptive modulation and coding capability. The main objective of this work is to address a critical issue in decentralized learning based on cooperative stochastic gradient descent (C-SGD) over wireless systems: the relationship between the transmission rate and the network density influences the runtime performance of learning. We first show that a dense network topology does not necessarily benefit the iteration performance of learning more than a sparse one; however, it tends to degrade the runtime performance because a dense network topology requires low-rate transmission. Based on these findings, a communication strategy is proposed in which each node optimizes its transmission rate to minimize communication time during C-SGD under network-density constraints. We perform numerical simulations of an image classification task under both independent and identically distributed (i.i.d.) and non-i.i.d. settings. The simulation results reveal that the preferred setting for the network density depends on the channel conditions and the biases in the training samples. Furthermore, numerical simulations of an automatic modulation classification task indicate that the preferred setting is almost the same even when the training task differs.
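A C-SGD round in its usual gossip form, neighbor averaging followed by a local gradient step, can be sketched as follows (a standard formulation, not necessarily this paper's exact update):

```python
import numpy as np

def c_sgd_step(params, grads, W, lr=0.1):
    """One cooperative SGD (C-SGD) round in the usual gossip form: every
    node averages neighbors' parameters via the mixing matrix W, then takes
    a local gradient step. params: (nodes, dim); W: doubly stochastic,
    W[i, j] > 0 only where nodes i and j are linked."""
    return W @ params - lr * grads

# Fully connected 3-node example: each node mixes with both neighbors.
W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
params = np.array([[0.0], [3.0], [6.0]])   # nodes start disagreeing
for _ in range(200):
    grads = params - 2.0                   # gradient of 0.5*(x-2)^2 per node
    params = c_sgd_step(params, grads, W)
print(params.round(2).ravel())             # [2. 2. 2.]: consensus at the optimum
```

The network density determines how many non-zero entries W has per row: a denser topology mixes faster per iteration but, over wireless links, forces the low-rate transmission that the paper identifies as the runtime bottleneck.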
- Conference Article
- 10.1109/isesd.2017.8253301
- Oct 1, 2017
This paper presents an efficient and effective way of computing the Local Binary Pattern (LBP) feature from the halftone image for image retrieval and classification tasks. The Ordered Dither Block Truncation Coding (ODBTC) compresses an image into two new representations, i.e., a color quantizer and a halftone image. Two image features can be generated from these representations for computing the similarity between images in the retrieval and classification processes. The Color Histogram Feature (CHF) can be easily computed from the color quantizer, whereas the Block-based Local Binary Pattern (BLBP) can be applied directly to the halftone image. The feature extraction process avoids the ODBTC decoding step, making it very useful in real-time applications requiring fast feature computation. As documented in the experimental results, the proposed method offers promising results on image classification and retrieval tasks compared with the former schemes.
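The LBP computation that the block-based feature applies can be sketched for a single pixel; the clockwise neighbor ordering below is one common convention.

```python
import numpy as np

def lbp_code(patch3x3):
    """Standard 8-neighbor Local Binary Pattern code for one pixel: compare
    each neighbor (clockwise from top-left) with the center and read the
    resulting bits as a byte. On an ODBTC halftone the input is already
    binary, so the comparison is especially cheap."""
    center = patch3x3[1, 1]
    # clockwise neighbor order: TL, T, TR, R, BR, B, BL, L
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [int(patch3x3[r, c] >= center) for r, c in order]
    return sum(b << i for i, b in enumerate(bits))

halftone = np.array([[1, 0, 1],
                     [0, 1, 1],
                     [1, 0, 0]])
print(lbp_code(halftone))   # 77
```

Sliding this over each block of the halftone and histogramming the codes yields the block-based descriptor, with no ODBTC decoding step required.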
- Research Article
4
- 10.1007/s10589-024-00580-w
- May 31, 2024
- Computational Optimization and Applications
In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies that yield noisy approximations of the finite sum objective function and its gradient. We introduce an adaptive sample size strategy based on inexpensive additional sampling to control the resulting approximation error. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring a significantly smaller number of gradient evaluations.
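The overall loop, subsampled gradients inside a trust-region accept/reject test, can be sketched in a first-order form. The paper's method is second-order, non-monotone, and adapts the sample size, none of which this illustrative sketch does.

```python
import numpy as np

def full_loss(A, b, v):
    return 0.5 * np.sum((A @ v - b) ** 2) / len(b)

def tr_subsampled(A, b, x, radius=1.0, iters=100, batch=16, seed=0):
    """Illustrative trust-region loop with subsampled gradients on a
    least-squares problem (first-order, Cauchy-like steps only; the
    acceptance test uses the full loss here for clarity)."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        idx = rng.choice(len(b), size=batch, replace=False)   # mini-batch
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch          # noisy gradient
        step = -radius * g / (np.linalg.norm(g) + 1e-12)      # step to TR boundary
        if full_loss(A, b, x + step) < full_loss(A, b, x):    # actual reduction?
            x, radius = x + step, min(radius * 2.0, 10.0)     # accept, expand
        else:
            radius *= 0.5                                     # reject, shrink
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
b = A @ rng.normal(size=5)
x0 = np.zeros(5)
x = tr_subsampled(A, b, x0)
print(full_loss(A, b, x) < full_loss(A, b, x0))   # True: the loss was reduced
```

The accept/reject logic is what makes the radius self-tuning: noisy steps that fail to reduce the objective shrink the region, so the sample-size noise the paper controls is kept from derailing the iteration.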
- Research Article
- 10.47065/bits.v6i1.5418
- Jun 30, 2024
- Building of Informatics, Technology and Science (BITS)
Deep learning-based shrimp image processing has become a rapidly growing research field in recent years. This technology aims to increase efficiency and accuracy in various applications related to the fishing and aquaculture industry, such as monitoring shrimp health, disease detection, species classification, and assessing the quality and quantity of harvested crops. Based on observations to date, fish sellers and buyers in the market have difficulty distinguishing vannamei shrimp cultivated in tarpaulin ponds from those cultivated in earthen ponds. This research aims to apply deep learning techniques to classify Litopenaeus vannamei shrimp cultivated in earthen ponds and tarpaulin ponds. To this end, the authors apply two Convolutional Neural Network (CNN) architectures, namely Visual Geometry Group-16 (VGG-16) and Residual Network-50 (ResNet-50). The dataset used in this research comprises 2,080 images per class of vannamei shrimp from the two types of shrimp ponds. Learning rates of 0.001 and 0.0001 were evaluated with the Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (ADAM) optimizers to assess their effectiveness in model training. The VGG-16 and ResNet-50 models were trained with a learning rate of 0.0001, taking advantage of the flexibility and fine control provided by the SGD optimizer; the lower learning rate was chosen to prevent overfitting and increase training stability. Model evaluation showed promising results, with both architectures achieving 100% accuracy in classifying vannamei shrimp from earthen and tarpaulin ponds. In conclusion, this research highlights the superiority of SGD with a learning rate of 0.0001 over 0.001 on both architectures, as well as the significant impact of optimizer and learning rate selection on the effectiveness of model training in image classification tasks.
- Research Article
13
- 10.1016/j.neucom.2015.10.053
- Nov 10, 2015
- Neurocomputing
Improving the BoVW via discriminative visual n-grams and MKL strategies
- Book Chapter
3
- 10.1007/978-3-030-75793-9_83
- Jan 1, 2021
Convolutional Neural Network (CNN) is a deep learning model which has been an active research topic and applied extensively to vibration data for condition monitoring (CM). In CNNs, hyper-parameters such as the activation function have a significant effect on the training task and, consequently, on the overall performance of the network. The existing activation functions have limitations such as the vanishing gradient problem, dead neurons, and a fixed gradient value. To address these issues, this paper proposes an improved activation function for deep CNNs, namely IReLU-Tanh. It adopts the advantage of the ReLU function in covering the positive region while taking the properties of the negative region from the Tanh function, thereby addressing the existing shortcomings: vanishing gradients, dead neurons, and a fixed gradient value. To prove its effectiveness, the proposed IReLU-Tanh function is evaluated on both simulated and experimental vibration data. Results show that the proposed IReLU-Tanh function remarkably enhances the overall performance of the network in two aspects. First, in the training task, the model parameters reach optimal values with lower learning errors than with other functions, so the network learns the hidden features effectively. Second, it improves the overall accuracy of the classification task and yields robust detection and diagnosis performance compared with the other activation functions, including Tanh, ReLU, LReLU, and ELU.
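One plausible reading of the IReLU-Tanh construction, identity on the positive region as in ReLU and tanh on the negative region, can be sketched as follows; the paper's exact definition may differ.

```python
import numpy as np

def irelu_tanh(x):
    """One plausible reading of IReLU-Tanh (illustrative; the paper's exact
    definition may differ): identity on the positive region, as in ReLU,
    and tanh on the negative region. Negative inputs then get a non-zero,
    bounded gradient (1 - tanh(x)^2) instead of the zero gradient that
    causes dead neurons in plain ReLU."""
    return np.where(x > 0, x, np.tanh(x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(irelu_tanh(x).round(3))   # values: -0.964, -0.462, 0.0, 0.5, 2.0
```

The piecewise design is what the abstract argues for: the unbounded positive branch avoids saturation, while the tanh negative branch keeps gradients flowing through negative activations.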