Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Recent advances in learning algorithms have demonstrated that the sharpness of the loss surface is an effective measure for improving the generalization gap. Building on this concept, Sharpness-Aware Minimization (SAM) was proposed to enhance model generalization and achieved state-of-the-art performance. SAM consists of two main steps: the weight perturbation step and the weight updating step. However, the perturbation in SAM is determined solely by the gradient of the training loss, i.e., the cross-entropy loss. As the model approaches a stationary point, this gradient becomes small and oscillates, leading to inconsistent perturbation directions and a risk of vanishing gradients. Our research introduces an approach to further enhance model generalization. We propose the Adaptive Adversarial Cross-Entropy (AACE) loss function to replace the standard cross-entropy loss for SAM's perturbation. The AACE loss and its gradient uniquely increase as the model nears convergence, ensuring consistent perturbation directions and addressing the gradient diminishing issue. Additionally, a novel perturbation-generating function utilizing the AACE loss without normalization is proposed, enhancing the model's exploratory capabilities in near-optimum stages. Empirical testing confirms the effectiveness of AACE, with experiments demonstrating improved performance in image classification tasks using Wide ResNet and PyramidNet across various datasets. The reproduction code is available online.
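The two SAM steps described above can be sketched on a toy scalar problem; this is a minimal illustration, not the paper's code, and the quadratic loss, rho, and learning rate are assumed values:

```python
def grad_loss(w):
    # Toy quadratic training loss L(w) = (w - 3)^2, so the gradient is 2(w - 3).
    return 2.0 * (w - 3.0)

def sam_step(w, rho=0.05, lr=0.1):
    """One SAM iteration on a single scalar weight (illustrative sketch;
    rho and lr are assumed values, not the paper's settings)."""
    g = grad_loss(w)
    eps = rho * g / (abs(g) + 1e-12)   # step 1: normalized weight perturbation
    g_adv = grad_loss(w + eps)         # gradient taken at the perturbed weights
    return w - lr * g_adv              # step 2: weight update

w = 0.0
for _ in range(100):
    w = sam_step(w)
# w ends up hovering near the minimum at 3; the fixed-size perturbation
# flips direction as the gradient shrinks and changes sign, illustrating
# the oscillation the abstract attributes to near-stationary points.
```

Because the perturbation is normalized, its magnitude stays at rho even as the gradient vanishes, which is exactly where the direction becomes unreliable.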
- Book Chapter
1
- 10.1007/978-3-030-61401-0_11
- Jan 1, 2020
Object detection is an important and fundamental task in computer vision. Recently, deep neural networks have made considerable progress in object detection. Deep neural network object detectors can be grouped into two broad categories: two-stage detectors and one-stage detectors. One-stage detectors are faster than two-stage detectors. However, they suffer from a severe foreground-background class imbalance during training, which causes low accuracy. RetinaNet is a one-stage detector with a novel loss function named Focal Loss, which reduces the effect of class imbalance; RetinaNet thereby outperforms both two-stage and one-stage detectors in terms of accuracy. The main idea of focal loss is to add a modulating factor to the cross-entropy loss that down-weights the loss of easy examples during training and thus focuses on the hard examples. However, cross-entropy loss only considers the loss of the ground-truth classes and therefore cannot obtain loss feedback from the false classes, so it does not achieve the best convergence. In this paper, we propose a new loss function named Dual Cross-Entropy Focal Loss, which improves on the focal loss. Dual cross-entropy focal loss adds a modulating factor to the dual cross-entropy loss so that it focuses on hard samples. Dual cross-entropy loss is an improved variant of cross-entropy loss that gains loss feedback from both the ground-truth classes and the false classes. We changed the loss function of RetinaNet from focal loss to our dual cross-entropy focal loss and performed experiments on a small vehicle dataset. The experimental results show that our new loss function improves vehicle detection performance.
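The modulating factor described above can be sketched for the binary case of standard focal loss (gamma and alpha follow common defaults; the dual cross-entropy variant itself is not reproduced here):

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for predicted foreground probability p, label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: the modulating factor (1 - p_t)**gamma
    down-weights easy examples (gamma, alpha use common defaults)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy, well-classified example is down-weighted far more than a hard one.
easy = focal_loss(0.95, 1) / cross_entropy(0.95, 1)   # factor 0.25 * 0.05**2
hard = focal_loss(0.30, 1) / cross_entropy(0.30, 1)   # factor 0.25 * 0.70**2
```

With gamma = 2, a well-classified example at p = 0.95 keeps less than 0.1% of its cross-entropy loss, while a hard one at p = 0.30 keeps roughly 12%, which is how training focus shifts to hard examples.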
- Conference Article
134
- 10.24963/ijcai.2020/305
- Jul 1, 2020
Trained with the standard cross entropy loss, deep neural networks can achieve great performance on correctly labeled data. However, if the training data is corrupted with label noise, deep models tend to overfit the noisy labels, thereby achieving poor generalization performance. To remedy this issue, several loss functions have been proposed and demonstrated to be robust to label noise. Although most of the robust loss functions stem from Categorical Cross Entropy (CCE) loss, they fail to embody the intrinsic relationships between CCE and other loss functions. In this paper, we propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise. Specifically, our framework enables weighting the extent of fitting the training labels by controlling the order of the Taylor series for CCE, and hence can be robust to label noise. In addition, our framework clearly reveals the intrinsic relationships between CCE and other loss functions, such as Mean Absolute Error (MAE) and Mean Squared Error (MSE). Moreover, we present a detailed theoretical analysis to certify the robustness of this framework. Extensive experimental results on benchmark datasets demonstrate that our proposed approach significantly outperforms state-of-the-art counterparts.
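The core idea, truncating the Taylor series of -log(p) around p = 1, can be sketched as follows (a sketch of the principle, not the paper's implementation):

```python
import math

def taylor_ce(p, order):
    """Truncated Taylor series of -log(p) around p = 1:
    -log(p) = sum_{k>=1} (1 - p)**k / k, kept to `order` terms.
    Low orders fit the (possibly noisy) labels less aggressively."""
    return sum((1.0 - p) ** k / k for k in range(1, order + 1))

p = 0.7                       # predicted probability of the given label
mae_like = taylor_ce(p, 1)    # order 1 gives 1 - p, an MAE-style loss
ce_like = taylor_ce(p, 50)    # a high order approaches -log(p), i.e. CCE
```

This makes the CCE-MAE relationship the abstract mentions concrete: order 1 recovers an MAE-style loss on the true-class probability, and letting the order grow recovers CCE, so the truncation order interpolates between noise-robust and label-fitting regimes.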
- Book Chapter
92
- 10.1007/978-3-030-32239-7_42
- Jan 1, 2019
Image segmentation plays an important role in pathology image analysis as the accurate separation of nuclei or glands is crucial for cancer diagnosis and other clinical analyses. The networks and cross entropy loss in current deep learning-based segmentation methods originate from image classification tasks and have drawbacks for segmentation. In this paper, we propose a full resolution convolutional neural network (FullNet) that maintains full resolution feature maps to improve the localization accuracy. We also propose a variance constrained cross entropy (varCE) loss that encourages the network to learn the spatial relationship between pixels in the same instance. Experiments on a nuclei segmentation dataset and the 2015 MICCAI Gland Segmentation Challenge dataset show that the proposed FullNet with the varCE loss achieves state-of-the-art performance. The code is publicly available (https://github.com/huiqu18/FullNet-varCE).
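The variance constraint can be sketched as a penalty on the spread of predicted foreground probabilities within one instance; the weighting `lam` and the exact form of the paper's varCE term are assumptions here:

```python
import math

def varce_loss(probs, labels, instance_ids, lam=1.0):
    """Cross entropy plus a per-instance variance penalty (sketch).
    probs: predicted foreground probability per pixel; instance_ids
    groups pixels of the same nucleus/gland (0 = background)."""
    ce = sum(-math.log(max(p if y == 1 else 1.0 - p, 1e-12))
             for p, y in zip(probs, labels)) / len(probs)
    instances = set(instance_ids) - {0}
    penalty = 0.0
    for inst in instances:
        vals = [p for p, i in zip(probs, instance_ids) if i == inst]
        mean = sum(vals) / len(vals)
        penalty += sum((v - mean) ** 2 for v in vals) / len(vals)
    return ce + lam * penalty / max(len(instances), 1)

probs        = [0.9, 0.4, 0.1]   # two pixels of one nucleus + background
labels       = [1, 1, 0]
instance_ids = [1, 1, 0]
uneven = varce_loss(probs, labels, instance_ids)
even   = varce_loss([0.9, 0.9, 0.1], labels, instance_ids)
# Consistent predictions inside the same nucleus incur a smaller penalty,
# encouraging the network to treat an instance as a coherent region.
```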
- Research Article
5
- 10.1109/bibm55620.2022.9995469
- Dec 6, 2022
- Proceedings. IEEE International Conference on Bioinformatics and Biomedicine
Although cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structures when the resolution of cryo-EM density maps is in the medium range, such as 5-10 Å. Although multiple neural networks have been proposed for the problem of secondary structure detection from cryo-EM 3D images, the loss functions used in existing networks are primarily based on cross entropy loss (CE). To study the behavior of various loss functions in the secondary structure detection problem, we investigated five loss functions and compared their performance. Using a U-net architecture in DeepSSETracer and a test set of 65 protein chains of atomic structures and their corresponding cryo-EM density component maps, we found that the combined function of focal cross entropy loss (FCE) and Dice loss (DL) provides the best overall detection of secondary structures. In particular, the combined loss function yields a significant improvement of 6.7% in overall F1 score over CE in the detection of β-sheet voxels, which are generally much harder to detect accurately than helix voxels. Our work shows the potential of designing effective loss functions to enhance the detection of hard cases in the secondary structure segmentation problem.
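The winning combination of focal cross entropy and Dice loss can be sketched on a flat list of voxel probabilities (the equal weighting `w` is an assumption, not the paper's setting):

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over a flat list of voxel probabilities (sketch)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def focal_ce(pred, target, gamma=2.0):
    """Focal cross entropy averaged over voxels (sketch)."""
    total = 0.0
    for p, t in zip(pred, target):
        p_t = p if t == 1 else 1.0 - p
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, 1e-12))
    return total / len(pred)

def combined_loss(pred, target, w=0.5):
    # Weighted sum of focal CE and Dice; the equal weighting is an
    # assumption, not the paper's setting.
    return w * focal_ce(pred, target) + (1.0 - w) * dice_loss(pred, target)

pred   = [0.9, 0.8, 0.2, 0.1]   # predicted beta-sheet probability per voxel
target = [1, 1, 0, 0]
good = combined_loss(pred, target)
flat = combined_loss([0.5] * 4, target)   # uninformative prediction
```

The focal term emphasizes hard voxels while the Dice term directly rewards region overlap, which is why the pairing helps on sparse classes like β-sheet voxels.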
- Research Article
8
- 10.1609/aaai.v37i6.25819
- Jun 26, 2023
- Proceedings of the AAAI Conference on Artificial Intelligence
Cross entropy loss has served as the main objective function for classification-based tasks. Widely deployed for learning neural network classifiers, it shows both effectiveness and a probabilistic interpretation. Recently, after the success of self-supervised contrastive representation learning methods, supervised contrastive methods have been proposed to learn representations and have shown superior and more robust performance compared to training solely with cross entropy loss. However, cross entropy loss is still needed to train the final classification layer. In this work, we investigate the possibility of learning both the representation and the classifier using one objective function that combines the robustness of contrastive learning and the probabilistic interpretation of cross entropy loss. First, we revisit a previously proposed contrastive-based objective function that approximates cross entropy loss and present a simple extension to learn the classifier jointly. Second, we propose a new version of supervised contrastive training that jointly learns the parameters of the classifier and the backbone of the network. We empirically show that these proposed objective functions achieve state-of-the-art performance and a significant improvement over the standard cross entropy loss, with more training stability and robustness in various challenging settings.
- Front Matter
8
- 10.1016/j.oooo.2022.07.004
- Jul 15, 2022
- Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology
Can Artificial Intelligence (AI) assist in the diagnosis of oral mucosal lesions and/or oral cancer?
- Conference Article
1
- 10.15439/2022f185
- Sep 26, 2022
We introduce a new loss function, TripleEntropy, to improve classification performance when fine-tuning general-knowledge pre-trained language models; it is based on cross-entropy and SoftTriple loss. This loss function improves the robust RoBERTa baseline model fine-tuned with cross-entropy loss by about 0.02%-2.29%. Thorough tests on popular datasets indicate a steady gain. The fewer the samples in the training dataset, the higher the gain: for small datasets it is 0.78%, for medium-sized 0.86%, for large 0.20%, and for extra-large 0.04%.
- Conference Article
29
- 10.1109/iccv48922.2021.00163
- Oct 1, 2021
Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions. Existing approaches address this by augmenting the dataset with question paraphrases from visual question generation models or adversarial perturbations. These approaches use the combined data to learn an answer classifier by minimizing the standard cross-entropy loss. To more effectively leverage augmented data, we build on the recent success of contrastive learning. We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses. The contrastive loss encourages representations to be robust to linguistic variations in questions, while the cross-entropy loss preserves the discriminative power of representations for answer prediction. We find that optimizing both losses, either alternately or jointly, is key to effective training. On the VQA-Rephrasings [44] benchmark, which measures a VQA model's answer consistency across human paraphrases of a question, ConClaT improves Consensus Score by 1.63% over an improved baseline. In addition, on the standard VQA 2.0 benchmark, we improve VQA accuracy by 0.78% overall. We also show that ConClaT is agnostic to the type of data-augmentation strategy used.
- Conference Article
5
- 10.1145/3431943.3432285
- Oct 16, 2020
Automated lymph node (LN) detection and segmentation are essential for cancer staging. Positron emission tomography (PET) and computed tomography (CT) imaging are routinely used to detect pathological LNs in clinical practice. Yet LN segmentation remains difficult owing to the low contrast with surrounding soft tissues and the variation in nodal size and shape. Deep convolutional neural networks have been widely employed to segment objects in medical images, typically with cross-entropy as the loss function. However, this does not consider the severe class imbalance between pathological LNs and the background. Keeping this in mind, we first present a novel boundary-aware cross-entropy (BCE) loss function, which up-weights the boundary voxels of LNs. Moreover, we investigate the behavior of multiple loss functions for LN segmentation, such as cross-entropy loss (CE), focal loss (FL), and generalized Dice loss (GDL). Lastly, we propose a novel strategy that combines the BCE, CE, and FL loss functions with GDL respectively, exploiting the class re-balancing properties of GDL for the imbalanced category labels of LNs and background. We find that combining the BCE loss function with GDL alleviates the problem of imbalanced category labels. Four-fold cross-validation on 63 volumes containing 214 malignant lymph nodes shows that the combination of the BCE loss function with GDL achieved sensitivities of 90% and 85%, and Dice scores of 75% and 77%, on the SegNet and DeepLabv3+ architectures, respectively.
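The boundary up-weighting idea behind BCE can be sketched as follows (the weight value is an assumption, and the GDL combination is not reproduced here):

```python
import math

def boundary_aware_ce(pred, target, boundary, w_boundary=3.0):
    """Cross entropy with boundary voxels up-weighted (sketch; the
    weight value is an assumption, not the paper's)."""
    total, norm = 0.0, 0.0
    for p, t, b in zip(pred, target, boundary):
        p_t = p if t == 1 else 1.0 - p
        w = w_boundary if b else 1.0
        total += -w * math.log(max(p_t, 1e-12))
        norm += w
    return total / norm

pred     = [0.9, 0.6, 0.1]   # the middle voxel is the least certain
target   = [1, 1, 0]
boundary = [0, 1, 0]         # and it lies on the lymph-node boundary
plain  = boundary_aware_ce(pred, target, boundary, w_boundary=1.0)
biased = boundary_aware_ce(pred, target, boundary)
# Up-weighting makes the uncertain boundary voxel contribute more to the
# loss, pushing the network to sharpen predictions at node edges.
```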
- Conference Article
9
- 10.1109/bigmm.2018.8499463
- Sep 1, 2018
Person re-identification is the process of recognizing a person across a network of cameras. Recently, many deep learning-based models for person re-identification have been proposed. In these models, the choice of loss function is vital, since different loss functions have different characteristics. Cross-entropy and triplet losses are two commonly used loss functions. Unfortunately, triplet loss cannot measure the overall spatial distribution of features, while cross-entropy loss does not provide enough discrimination between features. In this paper, we propose a new hybrid loss function to learn a better spatial distribution of features and distances between features. Furthermore, we design a strategy to mine hard triplets to accelerate learning. Experimental results demonstrate that the proposed method is effective and improves the accuracy of person re-identification compared with the state of the art.
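A hybrid objective of this kind can be sketched with 1-D embeddings (the margin and weighting `lam` are assumed values, not the paper's):

```python
import math

def cross_entropy(probs, label):
    """CE on a softmax output (list of class probabilities)."""
    return -math.log(max(probs[label], 1e-12))

def triplet_loss(anchor, pos, neg, margin=0.3):
    """Triplet loss on 1-D embeddings (sketch; margin is assumed)."""
    return max(abs(anchor - pos) - abs(anchor - neg) + margin, 0.0)

def hybrid_loss(probs, label, anchor, pos, neg, lam=1.0):
    # CE supplies class discrimination; the triplet term shapes the
    # spatial distribution of features. lam is an assumed weighting.
    return cross_entropy(probs, label) + lam * triplet_loss(anchor, pos, neg)

easy = hybrid_loss([0.7, 0.3], 0, anchor=0.1, pos=0.2, neg=0.9)
hard = hybrid_loss([0.7, 0.3], 0, anchor=0.1, pos=0.8, neg=0.2)
# The hard triplet (positive far from the anchor, negative close) adds a
# penalty on top of the identical classification loss.
```

Hard-triplet mining, as the abstract describes, amounts to preferring triplets like the second call, where the hinge is active and gradients are informative.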
- Book Chapter
1
- 10.1007/978-3-030-36808-1_12
- Jan 1, 2019
Deep convolutional neural networks (CNNs) have recently achieved great improvements in salient object detection. Most existing CNN-based models adopt cross entropy loss to optimize the networks for its capability in probability prediction. Cross entropy loss in salient object detection can be seen as pixel-wise label classification, which predicts whether each pixel is salient or non-salient. However, cross entropy loss attends to each pixel in isolation when classifying its label, without considering relationships with other pixels. In this paper, we propose an additional loss function, called group loss, to address this limitation of cross entropy loss. In our model, group loss and cross entropy loss work together to optimize the network for better saliency detection performance. The purpose of group loss is to make the differences among salient pixels smaller while making the distance between salient and non-salient pixels as large as possible. Meanwhile, because of the large computational cost of pixel-wise comparisons, we design a superpixel pooling layer that computes group loss at the superpixel level with no additional parameters. The experimental results show that the introduction of group loss improves the performance of the CNN network in salient object detection, making the boundaries of salient objects more distinct.
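The two goals of group loss, intra-class compactness and inter-class separation, can be sketched over superpixel-level saliency features (the margin and the exact formulation are assumptions, not the paper's definition):

```python
def group_loss(sal, non_sal, margin=1.0):
    """Group-loss sketch over superpixel features: pull salient
    features together, push the two classes at least `margin` apart."""
    # intra-class term: mean pairwise distance among salient superpixels
    pairs = [(a, b) for i, a in enumerate(sal) for b in sal[i + 1:]]
    intra = sum(abs(a - b) for a, b in pairs) / max(len(pairs), 1)
    # inter-class term: hinge pushing salient/non-salient apart by margin
    inter = sum(max(margin - abs(a - b), 0.0)
                for a in sal for b in non_sal) / max(len(sal) * len(non_sal), 1)
    return intra + inter

tight = group_loss([0.9, 0.95], [0.1, 0.05])   # compact and well separated
loose = group_loss([0.9, 0.4], [0.5, 0.45])    # spread out and overlapping
```

Computing this over superpixels rather than pixels, as the abstract's pooling layer does, shrinks the number of pairwise comparisons from quadratic in pixels to quadratic in superpixels.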
- Research Article
6
- 10.3390/electronics10040431
- Feb 10, 2021
- Electronics
Medical image segmentation has gained greater attention over the past decade, especially in the field of image-guided surgery, where robust, accurate and fast segmentation tools are important for planning and navigation. In this work, we explore Convolutional Neural Network (CNN) based approaches for multi-dataset segmentation from CT examinations. We hypothesize that the selection of certain parameters in the network architecture design critically influences the segmentation results. We employed two different CNN architectures, 3D-UNet and VGG-16, given that both networks are well accepted in the medical domain for segmentation tasks. To understand the effect of different parameter choices, we adopted two approaches: the first combines different weight initialization schemes with different activation functions, whereas the second combines different weight initialization methods with a set of loss functions and optimizers. For evaluation, the 3D-UNet was trained with the Medical Segmentation Decathlon dataset and VGG-16 with LiTS data. Quality assessment using eight quantitative metrics supports the use of our proposed strategies for improving segmentation results. Following a systematic approach in the evaluation of the results, we propose a few strategies that can be adopted for obtaining good segmentation results. Both architectures used in this work were selected on the basis of their general acceptance in segmentation tasks for medical images, given their promising results compared to other state-of-the-art networks. The highest Dice scores obtained with 3D-UNet for the liver, pancreas and cardiac data were 0.897, 0.691 and 0.892. VGG-16, which was developed solely to work with liver data, delivered a Dice score of 0.921.
From all the experiments conducted, we observed that two of the combinations worked best for most of the metrics in the 3D-UNet setting: Xavier weight initialization (also known as Glorot) with the Adam optimiser and cross entropy loss (GloCEAdam), and LeCun weight initialization with cross entropy loss and the Adam optimiser (LecCEAdam); Xavier together with cross entropy loss and the tanh activation function (GloCEtanh) worked best for the VGG-16 network. These parameter combinations are proposed on the basis of their contributions to obtaining optimal outcomes in the segmentation evaluations. Moreover, the preliminary evaluation results show that these parameters could later be used to gain more insight into model convergence and optimal solutions. The results from the quality assessment metrics and the statistical analysis validate our conclusions, and we propose that the presented work can be used as a guide in choosing parameters for the best possible segmentation results in future works.
- Conference Article
- 10.1109/niss55057.2022.10085296
- Mar 30, 2022
Speech enhancement is often applied in speech-based systems because speech signals are prone to additive background noise. While traditional speech-processing methods are used for speech enhancement, with advancements in deep learning technologies many efforts have been made to apply deep learning to this task. Using deep learning, networks learn mapping functions from noisy data to clean data and learn to reconstruct clean speech signals. As a consequence, deep learning methods can reduce the so-called musical noise that is often found in traditional speech enhancement methods. Currently, one popular deep learning architecture for speech enhancement is the generative adversarial network (GAN). However, the cross-entropy loss employed in GANs often makes training unstable, so in many GAN implementations the cross-entropy loss is replaced with the least-square loss. In this paper, to improve the training stability of GANs that use cross-entropy loss, we propose to use deep regret analytic generative adversarial networks (Dragan) for speech enhancement, based on applying a gradient penalty to the cross-entropy loss. We also employ relativistic rules to stabilize GAN training and apply them to the least-square and Dragan losses. Our experiments suggest that the proposed method improves speech quality more than the least-square loss on several objective quality metrics.
- Research Article
14
- 10.1016/j.jspi.2024.106188
- Jun 5, 2024
- Journal of Statistical Planning and Inference
Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to improve the theoretical understanding of neural networks. Nevertheless, this understanding remains limited when the networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analysing the rate of the excess risk of a CNN classifier trained with cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence that is independent of the dimension of the image. These rates are in line with practical observations about CNNs.
- Research Article
45
- 10.1016/j.apacoust.2020.107740
- Nov 21, 2020
- Applied Acoustics
Recognition of imbalanced underwater acoustic datasets with exponentially weighted cross-entropy loss