Redefining Visual Quality: The Impact of Loss Functions on INR-Based Image Compression
Implicit Neural Representations (INRs) are a novel data representation technique that is gaining ground in the image compression field due to its simplicity and interesting rate/distortion results. Although a variety of methods based on this paradigm have been proposed, limited attention has been given to the analysis of the loss function and to the impact of compression artifacts on the visual quality of the reconstructed images. These artifacts are mainly due to the adoption of the simple Mean Squared Error (MSE) loss function and to evaluation carried out merely in terms of Peak Signal-to-Noise Ratio (PSNR), which often does not correlate with human perception. In this paper, we evaluate a set of five loss functions for training INRs for image compression, applied to three state-of-the-art architectures, and assess their effect on a broader collection of quantitative metrics and on the visual fidelity of the decoded images to the originals. The presented outcomes show that reconstructions obtained by training with loss functions such as MSE suffer from over-smoothing and aliasing artifacts. Our findings reveal that, by employing a suitable loss function, state-of-the-art architectures quantitatively and qualitatively outperform the results reported in their original papers.
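A minimal sketch of the kind of setup the abstract describes: a coordinate-MLP INR fitted to a single image with a swappable loss. The SIREN-style architecture, the pytorch-msssim dependency, and all hyperparameters below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from pytorch_msssim import ssim  # third-party package: pytorch-msssim

class Siren(nn.Module):
    """Coordinate MLP with sine activations mapping (x, y) -> (r, g, b)."""
    def __init__(self, hidden=256, layers=4, w0=30.0):
        super().__init__()
        dims = [2] + [hidden] * layers + [3]
        self.net = nn.ModuleList([nn.Linear(a, b) for a, b in zip(dims, dims[1:])])
        self.w0 = w0

    def forward(self, coords):
        h = coords
        for lin in self.net[:-1]:
            h = torch.sin(self.w0 * lin(h))
        return torch.sigmoid(self.net[-1](h))

def fit_inr(image, loss_name="mse", steps=2000, lr=2e-4):
    """image: (1, 3, H, W) tensor in [0, 1]."""
    _, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    model = Siren()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        pred = model(coords).reshape(1, H, W, 3).permute(0, 3, 1, 2)
        if loss_name == "mse":
            loss = ((pred - image) ** 2).mean()
        elif loss_name == "l1":
            loss = (pred - image).abs().mean()
        else:  # structural alternative: 1 - SSIM
            loss = 1.0 - ssim(pred, image, data_range=1.0)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```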
- Book Chapter
2
- 10.1007/978-3-030-39770-8_9
- Jan 1, 2020
This work investigates the impact of the loss function on the performance of Neural Networks, in the context of a monocular, RGB-only, image localization task. A common technique used when regressing a camera’s pose from an image is to formulate the loss as a linear combination of positional and rotational mean squared error (using tuned hyperparameters as coefficients). In this work we observe that changes to rotation and position mutually affect the captured image, and in order to improve performance, a pose regression network’s loss function should include a term which combines the error of both of these coupled quantities. Based on task specific observations and experimental tuning, we present said loss term, and create a new model by appending this loss term to the loss function of the pre-existing pose regression network ‘PoseNet’. We achieve improvements in the localization accuracy of the network for indoor scenes; with reductions of up to 26.7% and 24.0% in the median positional and rotational error respectively, when compared to the default PoseNet.
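For context, the baseline loss being extended has the classic PoseNet form of a weighted sum of positional and rotational errors. The sketch below shows that baseline; since the abstract does not give the authors' coupled term, `coupled` is a purely hypothetical placeholder for where such a term would enter, not their formulation.

```python
import torch

def pose_loss(x_pred, x_true, q_pred, q_true, beta=500.0, gamma=0.0):
    """Weighted PoseNet-style pose regression loss (beta is a tuned weight)."""
    q_pred = q_pred / q_pred.norm(dim=-1, keepdim=True)  # normalize quaternion
    pos_err = (x_pred - x_true).pow(2).sum(dim=-1)       # positional squared error
    rot_err = (q_pred - q_true).pow(2).sum(dim=-1)       # rotational squared error
    coupled = (pos_err * rot_err).sqrt()                 # hypothetical coupling term
    return (pos_err + beta * rot_err + gamma * coupled).mean()
```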
- Conference Article
8
- 10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00085
- Dec 1, 2019
According to research on super-resolution (SR), SR image reconstruction using generative adversarial networks can produce images that are more realistic than using convolutional neural networks alone. At present, SR techniques based on convolutional neural networks ignore the impact of the loss function on image reconstruction, so the results lack detail and accuracy. In this paper, we combine an SR method with Generative Adversarial Networks to design a super-resolution model (Lapras-GAN) with an enhanced loss function. The proposed enhanced loss function is a Mix loss function that combines the multiscale SSIM and L1 loss functions to obtain realistic images. We performed qualitative and quantitative analyses of the performance of different loss functions and demonstrated the advantages of the Mix loss function. In addition, the neural network is accelerated across multiple GPUs on multiple nodes, which is 3-4 times faster than a single GPU on a single node. Experimental results show that the proposed Lapras-GAN method can generate images consistent with human perception. Further comparisons show that our Lapras-GAN achieves excellent performance and test time on the PIRM2018 experimental test dataset. Finally, we obtained a perception index of 1.83 and a test time of 0.031 s on the PIRM2018 competition test set.
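A sketch of a Mix loss in the spirit of the one described above: a convex combination of a multiscale-SSIM term and L1. The weight alpha = 0.84 and the pytorch-msssim dependency are borrowed conventions, not necessarily the Lapras-GAN authors' settings.

```python
from pytorch_msssim import ms_ssim  # third-party package: pytorch-msssim

def mix_loss(pred, target, alpha=0.84):
    """pred, target: (N, C, H, W) in [0, 1]; inputs must be large enough
    (roughly > 160 px per side) to support the 5 MS-SSIM scales."""
    ms_term = 1.0 - ms_ssim(pred, target, data_range=1.0)
    l1_term = (pred - target).abs().mean()
    return alpha * ms_term + (1.0 - alpha) * l1_term
```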
- Research Article
- 10.37934/araset.44.1.1221
- Apr 26, 2024
- Journal of Advanced Research in Applied Sciences and Engineering Technology
The work focuses on the analysis of the impact of loss functions on the effectiveness of a model for dehazing images. Dehazing, the process of removing haze or atmospheric scattering from images, plays a crucial role in various computer vision applications. To enhance the performance of dehazing models, it is essential to examine different loss functions and their variations. In this study, we employ a Generative Adversarial Network (GAN) as our model and evaluate the performance of various loss functions. The primary objective is to assess how well each loss function is capable of dehazing an image, while specifically investigating the influence of various structural similarity index (SSIM) loss variations on the dehazing effectiveness. Our experimental results reveal a notable discrepancy between qualitative and quantitative outcomes. Contradicting the traditional interpretation in the literature, our qualitative analysis reveals that the SSIM IQA metric may not be a fully reliable indicator of dehazing effectiveness, despite being viewed as better correlated with human visual perception than the Mean Squared Error and Peak Signal-to-Noise Ratio metrics. Moreover, we demonstrate that relying solely on quantitative results may lead to the selection of an inappropriate loss function. This finding emphasizes the significance of qualitative analysis in evaluating the performance of dehazing models. The disparity between quantitative and qualitative results emphasizes the need for newer image assessment metrics in the domain that can effectively bridge this gap; such metrics should correlate better with human perception. This research contributes to the field of image dehazing by shedding light on the importance of qualitative analysis in addition to quantitative evaluation. By comprehensively analysing the impact of variations of SSIM loss functions and their combination with Mean Absolute Error (MAE) loss, we provide valuable insights into enhancing the effectiveness of dehazing models.
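One way such SSIM loss variations plug into a GAN dehazer is as the reconstruction term of the generator loss. The sketch below illustrates that structure under assumed weights; it is not the paper's configuration.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party package: pytorch-msssim

def generator_loss(disc_fake_logits, dehazed, clear, lam=10.0, mae_weight=0.5):
    """Adversarial term plus an SSIM/MAE reconstruction term (weights assumed)."""
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    ssim_term = 1.0 - ssim(dehazed, clear, data_range=1.0)
    mae_term = (dehazed - clear).abs().mean()
    rec = (1.0 - mae_weight) * ssim_term + mae_weight * mae_term
    return adv + lam * rec
```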
- Research Article
32
- 10.3390/bioengineering10040412
- Mar 26, 2023
- Bioengineering
Segmentation of the prostate gland from magnetic resonance images is rapidly becoming a standard of care in prostate cancer radiotherapy treatment planning. Automating this process has the potential to improve accuracy and efficiency. However, the performance and accuracy of deep learning models varies depending on the design and optimal tuning of the hyper-parameters. In this study, we examine the effect of loss functions on the performance of deep-learning-based prostate segmentation models. A U-Net model for prostate segmentation using T2-weighted images from a local dataset was trained and performance compared when using nine different loss functions: Binary Cross-Entropy (BCE), Intersection over Union (IoU), Dice, BCE and Dice (BCE + Dice), weighted BCE and Dice (W (BCE + Dice)), Focal, Tversky, Focal Tversky, and Surface loss. Model outputs were compared using several metrics on a five-fold cross-validation set. Ranking of model performance was found to be dependent on the metric used to measure performance, but in general, W (BCE + Dice) and Focal Tversky performed well for all metrics (whole gland Dice similarity coefficient (DSC): 0.71 and 0.74; 95HD: 6.66 and 7.42; Ravd: 0.05 and 0.18, respectively) and Surface loss generally ranked lowest (DSC: 0.40; 95HD: 13.64; Ravd: -0.09). When comparing the performance of the models for the mid-gland, apex, and base parts of the prostate gland, the models' performance was lower for the apex and base compared to the mid-gland. In conclusion, we have demonstrated that the performance of a deep learning model for prostate segmentation can be affected by the choice of loss function. For prostate segmentation, it would appear that compound loss functions generally outperform single loss functions such as Surface loss.
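Minimal sketches of two of the compound losses compared above, BCE + Dice and Focal Tversky; the smoothing constant and the (alpha, beta, gamma) values are common defaults, not necessarily those used by the authors.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, smooth=1.0):
    """Compound loss: binary cross-entropy plus soft Dice."""
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    inter = (p * target).sum()
    dice = (2.0 * inter + smooth) / (p.sum() + target.sum() + smooth)
    return bce + (1.0 - dice)

def focal_tversky_loss(logits, target, alpha=0.7, beta=0.3, gamma=0.75, smooth=1.0):
    """Tversky index (penalizing FN more than FP) raised to a focal exponent."""
    p = torch.sigmoid(logits)
    tp = (p * target).sum()
    fp = (p * (1.0 - target)).sum()
    fn = ((1.0 - p) * target).sum()
    tversky = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    return (1.0 - tversky) ** gamma
```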
- Research Article
13
- 10.3390/s24041092
- Feb 7, 2024
- Sensors
The advancement of machine learning in industrial applications has necessitated the development of tailored solutions to address specific challenges, particularly in multi-class classification tasks. This study delves into the customization of loss functions within the eXtreme Gradient Boosting (XGBoost) algorithm, which is a critical step in enhancing the algorithm's performance for specific applications. Our research is motivated by the need for precision and efficiency in the industrial domain, where the implications of misclassification can be substantial. We focus on the drill-wear analysis of melamine-faced chipboard, a common material in furniture production, to demonstrate the impact of custom loss functions. The paper explores several variants of Weighted Softmax Loss Functions, including Edge Penalty and Adaptive Weighted Softmax Loss, to address the challenges of class imbalance and the heightened importance of accurately classifying edge classes. Our findings reveal that these custom loss functions significantly reduce critical errors in classification without compromising the overall accuracy of the model. This research not only contributes to the field of industrial machine learning by providing a nuanced approach to loss function customization but also underscores the importance of context-specific adaptations in machine learning algorithms. The results showcase the potential of tailored loss functions in balancing precision and efficiency, ensuring reliable and effective machine learning solutions in industrial settings.
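The plumbing for such a customization looks roughly like the sketch below: a class-weighted softmax cross-entropy supplied to XGBoost as a custom objective. The Edge Penalty and adaptive weighting schemes are the authors'; this only shows the mechanism, and custom-objective signatures vary across XGBoost versions (here predictions are assumed to arrive as raw scores of shape (n_samples, n_classes), as in recent releases).

```python
import numpy as np
import xgboost as xgb

def make_weighted_softmax_obj(class_weights):
    class_weights = np.asarray(class_weights, dtype=float)

    def obj(preds, dtrain):
        labels = dtrain.get_label().astype(int)
        k = preds.shape[1]
        e = np.exp(preds - preds.max(axis=1, keepdims=True))  # stable softmax
        p = e / e.sum(axis=1, keepdims=True)
        onehot = np.eye(k)[labels]
        w = class_weights[labels][:, None]  # up-weight costly (e.g. edge) classes
        grad = w * (p - onehot)
        hess = np.maximum(w * 2.0 * p * (1.0 - p), 1e-6)
        return grad.reshape(-1), hess.reshape(-1)  # flattened row-major

    return obj

# usage sketch:
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"num_class": 3, "disable_default_eval_metric": True},
#                     dtrain, num_boost_round=100,
#                     obj=make_weighted_softmax_obj([1.0, 2.0, 1.0]))
```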
- Conference Article
111
- 10.1109/cvpr.2019.01031
- Jun 1, 2019
Compression has been an important research topic for many decades, with a significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop into both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty into the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard under the MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learned compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression can produce more visually pleasant results than traditional standards.
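A hedged sketch of what a rate-distortion loss with such a penalty could look like. The compaction term below (per-channel latent energies weighted by channel index, pushing energy into early channels) is one plausible reading of the abstract, not the paper's exact formulation.

```python
import torch

def compression_loss(x, x_hat, rate_bits, latent, lam=0.01, gamma=1e-4):
    """Distortion + lam * rate + gamma * energy-compaction penalty (illustrative)."""
    distortion = ((x - x_hat) ** 2).mean()
    energy = (latent ** 2).mean(dim=(0, 2, 3))  # energy per latent channel
    weights = torch.arange(1, energy.numel() + 1,
                           device=latent.device, dtype=latent.dtype)
    compaction = (weights * energy).sum()       # late channels cost more
    return distortion + lam * rate_bits + gamma * compaction
```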
- Research Article
- 10.1142/s012915642540631x
- Jun 13, 2025
- International Journal of High Speed Electronics and Systems
This study delves into the predictive capabilities of machine learning models for insurance claim severity estimation. By leveraging a dataset from French motor third-party liability insurance, two leading models, gradient boosting machine (GBM) and random forest (RF), are employed. Three distinct loss functions, RMSE, MAE, and gamma deviance, are utilized to assess their influence on model performance. Additionally, four novel calibration methods — additive, multiplicative, linear, and binning calibration — are explored to refine model predictions and better align them with actual claim amounts, thus innovating premium pricing strategies. Findings reveal that the choice of loss function impacts prediction accuracy differently across models. The calibration methods generally enhance prediction accuracy, with the optimal one varying by scenario. The work underscores the criticality of selecting appropriate loss functions and calibration methods for optimizing the accuracy of insurance pricing predictions.
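A compact sketch of the loss-swapping plus multiplicative calibration described above, using LightGBM as a stand-in GBM; objective names and hyperparameters are assumptions, not the paper's setup.

```python
import lightgbm as lgb

def fit_and_calibrate(X_tr, y_tr, X_te, objective="gamma"):
    """objective in {"regression" (L2/RMSE), "regression_l1" (MAE), "gamma"}."""
    model = lgb.LGBMRegressor(objective=objective, n_estimators=500)
    model.fit(X_tr, y_tr)
    # multiplicative calibration: rescale so predicted totals match observed totals
    factor = y_tr.sum() / model.predict(X_tr).sum()
    return factor * model.predict(X_te)
```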
- Conference Article
6
- 10.1109/pcs56426.2022.10018061
- Dec 7, 2022
Nowadays, deep-learning image coding solutions have shown similar or better compression efficiency than conventional solutions based on hand-crafted transforms and spatial prediction techniques. These deep-learning codecs require a large training set of images and a training methodology to obtain a suitable model (set of parameters) for efficient compression. The training is performed with an optimization algorithm that minimizes the loss function. Therefore, the loss function plays a key role in the overall performance, and it includes a differentiable quality metric that attempts to mimic human perception. The main objective of this paper is to study the perceptual impact of several image quality metrics that can be used in the loss function of the training process, through a crowdsourced subjective image quality assessment study. From this study, it is possible to conclude that the choice of the quality metric is critical for the perceptual performance of the deep-learning codec, and that the best choice can vary depending on the image content.
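The loss in question typically has the generic rate-distortion form below, where the distortion term d is the swappable, differentiable quality metric under study (a standard formulation, not notation from the paper):

```latex
\mathcal{L} \;=\; R(\hat{y}) \;+\; \lambda \, d(x, \hat{x}),
\qquad d \in \{\mathrm{MSE},\; 1 - \text{MS-SSIM},\; \dots\}
```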
- Research Article
20
- 10.3390/app11157046
- Jul 30, 2021
- Applied Sciences
Wildfires stand as one of the most relevant natural disasters worldwide, all the more so due to the effects of climate change and their impact at various societal and environmental levels. In this regard, a significant amount of research has been done in order to address this issue, deploying a wide variety of technologies and following a multi-disciplinary approach. Notably, computer vision has played a fundamental role in this regard: it can be used to extract and combine information from several imaging modalities for fire detection, characterization, and wildfire spread forecasting. In recent years, there has been work on Deep Learning (DL)-based fire segmentation, showing very promising results. However, it is currently unclear whether the architecture of a model, its loss function, or the image type employed (visible, infrared, or fused) has the most impact on the fire segmentation results. In the present work, we evaluate different combinations of state-of-the-art (SOTA) DL architectures, loss functions, and types of images to identify the parameters most relevant to improving the segmentation results. We benchmark them to identify the top-performing ones and compare them to traditional fire segmentation techniques. Finally, we evaluate whether the addition of attention modules on the best performing architecture can further improve the segmentation results. To the best of our knowledge, this is the first work that evaluates the impact of the architecture, loss function, and image type on the performance of DL-based wildfire segmentation models.
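Such an evaluation amounts to a grid over the three factors; a schematic sketch only, where all names are placeholders and `train_and_evaluate` is a stub for the actual training pipeline:

```python
from itertools import product

architectures = ["unet", "deeplabv3plus", "fcn"]  # placeholder SOTA models
losses = ["bce", "dice", "focal", "bce_dice"]     # placeholder loss functions
image_types = ["visible", "infrared", "fused"]

def train_and_evaluate(arch, loss, img_type):
    """Stub: train the given combination and return a validation score."""
    return 0.0  # replace with real training + validation IoU

results = {combo: train_and_evaluate(*combo)
           for combo in product(architectures, losses, image_types)}
best_combo = max(results, key=results.get)        # e.g. by mean IoU
```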
- Book Chapter
- 10.1007/978-3-030-80568-5_4
- Jan 1, 2021
Autoencoders have become increasingly popular in anomaly detection tasks over the years. Nevertheless, it remains a challenge to train autoencoders for anomaly detection tasks properly. A key contributing factor to this problem in many applications is the absence of a clean dataset from which the normal case can be learned. Instead, autoencoders must be trained on a contaminated dataset containing an unknown amount of anomalies that potentially harm the training process. In this paper, we address this problem by studying the impact of the loss function on the robustness of an autoencoder. It is common practice to train an autoencoder by minimizing a loss function (e.g. squared error loss) under the assumption that all features are equally important to be reconstructed well. We relax this assumption and introduce a new loss function that adapts its robustness to anomalies based on the characteristics of the data and on a per-feature basis. Experimental results show that an autoencoder can be trained robustly by this loss function even when the training process is subject to many anomalies.
Keywords: Anomaly detection, Outlier detection, Neural networks, Autoencoder, Feature reconstruction, Machine learning
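One way to realize such per-feature adaptation, as a concept sketch only (not the authors' loss): scale each feature's reconstruction error by a robust spread estimate so that anomaly-prone features dominate the loss less, then apply a saturating penalty.

```python
import torch

def per_feature_robust_loss(x, x_hat, eps=1e-6):
    """x, x_hat: (batch, features). Errors are scaled per feature by the
    median absolute deviation, then passed through a saturating log penalty."""
    err = x_hat - x
    mad = (err - err.median(dim=0).values).abs().median(dim=0).values
    scaled = err.abs() / (mad + eps)
    return torch.log1p(scaled).mean()
```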
- Research Article
174
- 10.1016/j.cageo.2011.06.011
- Jul 2, 2011
- Computers & Geosciences
Support vector regression to predict porosity and permeability: Effect of sample size
- Research Article
- 10.56705/ijodas.v5i3.179
- Dec 31, 2024
- Indonesian Journal of Data and Science
This paper focuses on the Bayesian technique to estimate the parameters of the Weibull distribution. Here, we use both informative and non-informative priors. We calculate the estimators and their posterior risks using different asymmetric and symmetric loss functions. The Bayes estimators do not have a closed form under these loss functions; therefore, we use an approximation approach established by Lindley to obtain the Bayes estimates. A comparative analysis is conducted to compare the suggested estimators using Monte Carlo simulation based on the associated posterior risk. We also analyze the impact of distinct loss functions when using various priors.
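For reference, the Bayes estimators under the two most commonly paired loss functions of this kind take the standard forms below (general results, stated independently of the paper's Weibull setting):

```latex
\hat{\theta}_{\mathrm{SEL}} = E[\theta \mid \mathbf{x}]
\quad \text{(squared-error loss)}, \qquad
\hat{\theta}_{\mathrm{LINEX}} = -\frac{1}{a}\,
\ln E\!\left[e^{-a\theta} \mid \mathbf{x}\right]
\quad \text{(LINEX loss, } a \neq 0\text{)}
```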
- Conference Article
9
- 10.2514/6.2004-6205
- Jun 19, 2004
In aircraft design, the decisions made during the conceptual or preliminary design phases play a large role in determining the success of the design. Supporting decision makers in these early design phases requires a decision making technique with the capability of managing multiple conflicting criteria and capturing the associated uncertainties. The Joint Probability Decision making technique, which incorporates a multi-criteria and a probabilistic approach to systems design, is such a technique. This technique uses Probability of Success as the objective function, which is obtained by integrating the Joint Probability Density Function of the criteria over the region of criterion values that are of interest to the customer. However, the calculation of probability of success does not take the deviation of the solutions from the target values into account, which is often important for concept selection. Also, this technique employs weighting coefficients to indicate the importance of each criterion when calculating the probability of success. However, representing the decision maker's preference by using numerical weights is considered ineffective and usually involves a largely undefined trial-and-error weight-tweaking process. The study presented in this paper was done with the intention of enhancing the joint probability decision making technique so as to make it useful for concept selection through the utilization of a loss function. The impact of the loss function on the decision making process is investigated in this paper, and an advanced rotorcraft concept selection problem employing the joint probability decision making technique with and without the loss function is presented in order to demonstrate the improved technique.
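For a quadratic (Taguchi-style) loss about a target T, the expected loss decomposes into a bias part and a variance part, which is what allows deviation from the targets to enter the selection alongside probability of success (a standard identity; the paper's exact loss formulation may differ):

```latex
E[L(Y)] = k\,E\!\left[(Y - T)^2\right] = k\left[(\mu - T)^2 + \sigma^2\right]
```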
- Research Article
9
- 10.2967/jnumed.123.266018
- Feb 15, 2024
- Journal of nuclear medicine : official publication, Society of Nuclear Medicine
Reliable performance of PET segmentation algorithms on clinically relevant tasks is required for their clinical translation. However, these algorithms are typically evaluated using figures of merit (FoMs) that are not explicitly designed to correlate with clinical task performance. Such FoMs include the Dice similarity coefficient (DSC), the Jaccard similarity coefficient (JSC), and the Hausdorff distance (HD). The objective of this study was to investigate whether evaluating PET segmentation algorithms using these task-agnostic FoMs yields interpretations consistent with evaluation on clinically relevant quantitative tasks. Methods: We conducted a retrospective study to assess the concordance in the evaluation of segmentation algorithms using the DSC, JSC, and HD and on the tasks of estimating the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of primary tumors from PET images of patients with non-small cell lung cancer. The PET images were collected from the American College of Radiology Imaging Network 6668/Radiation Therapy Oncology Group 0235 multicenter clinical trial data. The study was conducted in 2 contexts: (1) evaluating conventional segmentation algorithms, namely those based on thresholding (SUVmax40% and SUVmax50%), boundary detection (Snakes), and stochastic modeling (Markov random field-Gaussian mixture model); (2) evaluating the impact of network depth and loss function on the performance of a state-of-the-art U-net-based segmentation algorithm. Results: Evaluation of conventional segmentation algorithms based on the DSC, JSC, and HD showed that SUVmax40% significantly outperformed SUVmax50%. However, SUVmax40% yielded lower accuracy on the tasks of estimating MTV and TLG, with a 51% and 54% increase, respectively, in the ensemble normalized bias. Similarly, the Markov random field-Gaussian mixture model significantly outperformed Snakes on the basis of the task-agnostic FoMs but yielded a 24% increased bias in estimated MTV. For the U-net-based algorithm, our evaluation showed that although the network depth did not significantly alter the DSC, JSC, and HD values, a deeper network yielded substantially higher accuracy in the estimated MTV and TLG, with a decreased bias of 91% and 87%, respectively. Additionally, whereas there was no significant difference in the DSC, JSC, and HD values for different loss functions, up to a 73% and 58% difference in the bias of the estimated MTV and TLG, respectively, existed. Conclusion: Evaluation of PET segmentation algorithms using task-agnostic FoMs could yield findings discordant with evaluation on clinically relevant quantitative tasks. This study emphasizes the need for objective task-based evaluation of image segmentation algorithms for quantitative PET.
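For concreteness, the task-based figures of merit are simple functions of the segmentation mask and the SUV volume; a sketch, with the voxel volume assumed known from the scan geometry:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())

def mtv_and_tlg(mask, suv_volume, voxel_volume_ml):
    """Metabolic tumor volume (ml) and total lesion glycolysis (SUVmean * MTV)."""
    mask = mask.astype(bool)
    mtv = mask.sum() * voxel_volume_ml
    tlg = suv_volume[mask].mean() * mtv if mask.any() else 0.0
    return mtv, tlg
```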
- Research Article
47
- 10.3390/app11115029
- May 29, 2021
- Applied Sciences
In recent years, deep learning algorithms have been successfully applied in the development of decision support systems in various aspects of agriculture, such as yield estimation, crop diseases, weed detection, etc. Agriculture is the largest consumer of freshwater. Due to challenges such as lack of natural resources and climate change, an efficient decision support system for irrigation is crucial. Evapotranspiration and soil water content are the most critical factors in irrigation scheduling. In this paper, the ability of Long Short-Term Memory (LSTM) and Bidirectional LSTM (BLSTM) networks to model daily reference evapotranspiration and soil water content is investigated. The application of these techniques to predict these parameters was tested for three sites in Portugal. A single-layer BLSTM with 512 nodes was selected. Bayesian optimization was used to determine the hyperparameters, such as learning rate, decay, batch size, and dropout size. The model achieved mean square error values within the range of 0.014 to 0.056 and R2 values ranging from 0.96 to 0.98. A Convolutional Neural Network (CNN) model was added to the LSTM to investigate potential performance improvement; performance dropped on all datasets due to the complexity of the model. The performance of the models was also compared with CNN and the traditional machine learning algorithms Support Vector Regression and Random Forest. LSTM achieved the best performance. Finally, the impact of the loss function on the performance of the proposed models was investigated. The model with mean square error as the loss function performed better than the models with other loss functions.
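A minimal sketch of the comparison described above: a single-layer bidirectional LSTM with 512 units whose training loss is swappable at compile time. The input shape, dropout rate, and optimizer are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_blstm(timesteps, n_features, loss="mse"):
    """loss in {"mse", "mae", "huber", ...}; the study found MSE to work best."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.Bidirectional(layers.LSTM(512)),
        layers.Dropout(0.2),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss=loss)
    return model
```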