Articles published on Unified Loss Function
- Research Article
- 10.1145/3749463
- Sep 3, 2025
- Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Lei Wang + 7 more
Speech enhancement can greatly improve the user experience during phone calls in low signal-to-noise ratio (SNR) scenarios. In this paper, we propose a low-cost, energy-efficient, and environment-independent speech enhancement system, namely AccCall, that improves phone call quality using the smartphone's built-in accelerometer. However, a significant gap remains between the underlying insight and its practical applications, as several critical challenges must be addressed, including the efficiency of speech enhancement in cross-user scenarios, adaptive system triggering to reduce energy consumption, and lightweight deployment for real-time processing. To this end, we first design Acc-Aided Network (AccNet), a cross-modal deep learning model inherently capable of cross-user generalization through three key components: a cross-modal fusion module, an accelerometer-aided (acc-aided) mask generator, and a unified loss function. Second, we adopt a machine learning-based approach instead of deep learning to achieve high accuracy in distinguishing call activity states for adaptive system triggering, ensuring lower energy consumption and efficient deployment on mobile platforms. Finally, we propose a knowledge-distillation-driven structured pruning framework that optimizes model efficiency while preserving performance. Extensive experiments with 20 participants were conducted under a user-independent scenario. The results show that AccCall achieves excellent and reliable adaptive triggering performance and enables substantial real-time improvements in SISDR, SISNR, STOI, PESQ, and WER, demonstrating the superiority of our system in enhancing speech quality and intelligibility for phone calls.
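The knowledge-distillation component mentioned in this abstract follows a well-known pattern; a minimal sketch of the standard distillation term (the temperature T and the logits are illustrative assumptions, not AccCall's actual implementation):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Standard knowledge-distillation term (illustrative of the
    distillation-driven pruning step): T^2-scaled KL divergence between
    the teacher's and student's temperature-softened distributions."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return T * T * kl
```

The T^2 factor keeps gradient magnitudes comparable across temperatures, so the distillation term can be weighted against a hard-label loss without retuning.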
- Research Article
- 10.1016/j.neunet.2025.107992
- Aug 18, 2025
- Neural networks : the official journal of the International Neural Network Society
- Jae Hyun Yoon + 2 more
Unified auxiliary restoration network for robust multimodal 3D object detection in adverse conditions.
- Research Article
- 10.1029/2025jh000635
- Jun 1, 2025
- Journal of Geophysical Research: Machine Learning and Computation
- Zeqing Huang + 6 more
Abstract Machine learning methods provide a promising approach for exploiting relationships between raw forecasts and observations for forecast calibration. This paper highlights the role of data transformation in rainfall forecast calibration with neural networks. We develop a distributional regression network that accounts for the positive skewness and zero bound of rainfall by incorporating a normalizing transformation (log‐sinh) in both the input and output stages. A unified loss function is formulated based on the negative log‐likelihood function for parameter optimization. To test the role of data transformation, we conduct five calibration experiments: one that does not use transformation at all (the baseline) while the others use the log‐sinh transformation in different ways. All experiments are based on 10‐day rainfall forecasts from the European Centre for Medium‐range Weather Forecasts (ECMWF) from 2011 to 2022. Overall, the calibration methods effectively correct spatiotemporally varying biases in raw forecasts and improve reliability, yielding mean skill improvements of approximately 2%–11% and in the best case reducing forecast biases to less than 2%. Without transformation, the baseline method suffers from forecast biases ranging from −30% to 50%, due to its limited ability to characterize the uncertainty of rainfall forecasts. Of the four experiments that use the log‐sinh transformation, the optimal performance is achieved by the combined use of transforming raw forecasts for the input layer and utilizing fixed transformation parameters for generating calibrated forecasts in the output layer. We show that this method marginally outperforms an advanced existing Bayesian Ensemble Model Output Statistics method in reducing forecast biases.
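The log-sinh transformation used here has a standard closed form; a minimal sketch of the transform and the Gaussian negative log-likelihood it feeds (the a, b values are illustrative defaults, not the paper's fitted parameters):

```python
import math

def log_sinh(y, a=0.01, b=0.1):
    """Log-sinh normalizing transform: z = (1/b) * ln(sinh(a + b*y)).
    Compresses the right tail of positively skewed rainfall amounts
    while remaining monotonic."""
    return math.log(math.sinh(a + b * y)) / b

def gaussian_nll(z, mu, sigma):
    """Negative log-likelihood of a Gaussian in transformed space,
    the building block of a unified NLL-based loss."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (z - mu)**2 / (2 * sigma**2)
```

Because the transform is strictly increasing, quantiles calibrated in z-space map directly back to rainfall space via the inverse transform.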
- Research Article
- 10.1016/j.neunet.2025.107250
- Jun 1, 2025
- Neural networks : the official journal of the International Neural Network Society
- Youwei Wang + 3 more
Span-aware pre-trained network with deep information bottleneck for scientific entity relation extraction.
- Research Article
- 10.1609/aaai.v39i22.34504
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
- Mingyue Zhang + 7 more
Multi-agent path finding (MAPF) is a safety-critical scenario where the goal is to secure collision-free trajectories from initial to desired locations. However, due to system complexity and uncertainty, integrating learning-based controllers with MAPF is challenging and cannot theoretically guarantee the safety of the learned controllers. In response, our study proposes a verified safe multi-agent neural control (VSMANC) approach for MAPF, focusing on the unified training of Decentralized Control Barrier Functions (DCBF) and controllers to enhance safety. VSMANC enables all agents to concurrently learn controllers and DCBFs using a unified loss function designed to maximize safety, adhere to standard control policies, and incorporate path-finding-related heuristics. We also propose a formal verification-guided retraining process to both verify the properties of the learned DCBFs and generate counterexamples for retraining, thereby providing a verified safety guarantee. We validate our approach through shape formation experiments and UAV simulations, demonstrating significant improvements in safety and effectiveness in complex multi-agent environments.
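The unified loss described above (safety, adherence to a standard policy, and path-finding heuristics) can be sketched for a single agent and time step; the hinge form of the discrete-time CBF decrease condition and all weights are assumptions for illustration, not VSMANC's exact objective:

```python
def unified_loss(h, h_next, u, u_ref, heuristic_cost,
                 alpha=0.5, w_safe=1.0, w_ctrl=0.1, w_heur=0.01):
    """Sketch of a unified training loss combining three terms:
    - safety: hinge penalty when the discrete-time CBF decrease condition
      h_next >= (1 - alpha) * h is violated,
    - control: deviation from a nominal/standard control policy u_ref,
    - heuristic: a path-finding-related cost (e.g., distance to goal)."""
    safety = max(0.0, (1 - alpha) * h - h_next)
    control = (u - u_ref) ** 2
    return w_safe * safety + w_ctrl * control + w_heur * heuristic_cost
```

When the barrier condition holds and the controller matches the nominal policy, only the heuristic term remains, so minimizing the loss trades progress against certified safety margins.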
- Research Article
- 10.3390/e27030283
- Mar 9, 2025
- Entropy (Basel, Switzerland)
- Huangji Wang + 1 more
Backdoor attacks remain a critical area of focus in machine learning research, with one prominent approach being the introduction of backdoor training injection mechanisms. These mechanisms embed backdoor triggers into the training process, enabling the model to recognize specific trigger inputs and produce predefined outputs post-training. In this paper, we identify a unifying pattern across existing backdoor injection methods in generative models and propose a novel backdoor training injection paradigm. This paradigm leverages a unified loss function design to facilitate backdoor injection across diverse generative models. We demonstrate the effectiveness and generalizability of this paradigm through experiments on generative adversarial networks (GANs) and Diffusion Models. Our experimental results on GANs confirm that the proposed method successfully embeds backdoor triggers, enhancing the model's security and robustness. This work provides a new perspective and methodological framework for backdoor injection in generative models, making a significant contribution toward improving the safety and reliability of these models.
- Research Article
- 10.1038/s41598-025-92054-0
- Mar 3, 2025
- Scientific Reports
- Tianqing Hu + 4 more
As an image enhancement technology, multi-modal image fusion primarily aims to retain salient information from multi-source image pairs in a single image, generating imaging information that contains complementary features and can facilitate downstream visual tasks. However, dual-stream methods with convolutional neural networks (CNNs) as backbone networks predominantly have limited receptive fields, whereas methods with Transformers are time-consuming, and both lack the exploration of cross-domain information. This study proposes an innovative image fusion model designed for multi-modal images, encompassing pairs of infrared and visible images and multi-source medical images. Our model leverages the strengths of both Transformers and CNNs to model various feature types effectively, addressing both short- and long-range learning as well as the extraction of low- and high-frequency features. First, our shared encoder is constructed based on Transformers for long-range learning, including an intra-modal feature extraction block, an inter-modal feature extraction block, and a novel feature alignment block that handles slight misalignments. Our private encoder for extracting low- and high-frequency features employs a dual-stream architecture based on CNNs, which includes a dual-domain selection mechanism and an invertible neural network. Second, we develop a cross-attention-based Swin Transformer block to explore cross-domain information. In particular, we introduce a weight transformation that is embedded into the Transformer block to enhance the efficiency. Third, a unified loss function incorporating a dynamic weighting factor is formulated to capture the inherent commonalities of multi-modal images. 
A comprehensive qualitative and quantitative analysis of image fusion and object detection experimental results demonstrates that the proposed method effectively preserves thermal targets and background texture details, surpassing state-of-the-art alternatives in terms of achieving high-quality image fusion and improving the performance in subsequent visual tasks.
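The idea of a dynamic weighting factor in a unified fusion loss can be sketched as follows; the activity measure (mean absolute gradient) and the weighting scheme are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fusion_loss(fused, src_a, src_b):
    """Sketch of a unified fusion loss with a dynamic weighting factor:
    each source image's weight is derived from its local activity (here,
    the mean absolute gradient), so the sharper, more informative modality
    contributes more to the intensity-fidelity penalty."""
    def activity(img):
        gy, gx = np.gradient(img)
        return np.mean(np.abs(gx)) + np.mean(np.abs(gy))
    wa, wb = activity(src_a), activity(src_b)
    total = wa + wb + 1e-12  # avoid division by zero for flat inputs
    wa, wb = wa / total, wb / total
    return wa * np.mean((fused - src_a) ** 2) + wb * np.mean((fused - src_b) ** 2)
```

Because the weights are recomputed per image pair, the same loss adapts across infrared/visible and multi-source medical inputs without hand-tuned per-modality coefficients.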
- Research Article
- 10.1016/j.patrec.2024.11.006
- Nov 16, 2024
- Pattern Recognition Letters
- Shucheng Ji + 3 more
LED-Net: A lightweight edge detection network
- Research Article
- 10.1029/2024jh000169
- Jul 31, 2024
- Journal of Geophysical Research: Machine Learning and Computation
- Gong Cheng + 2 more
Abstract Predicting the future contribution of the ice sheets to sea level rise over the next decades presents several challenges due to a poor understanding of critical boundary conditions, such as basal sliding. Traditional numerical models often rely on data assimilation methods to infer spatially variable friction coefficients by solving an inverse problem, given an empirical friction law. However, these approaches are not versatile, as they sometimes demand extensive code development efforts when integrating new physics into the model. Furthermore, this approach makes it difficult to handle sparse data effectively. To tackle these challenges, we use the Physics‐Informed Neural Networks (PINNs) to seamlessly integrate observational data and governing equations of ice flow into a unified loss function, facilitating the solution of both forward and inverse problems within the same framework. We illustrate the versatility of this approach by applying the framework to two‐dimensional problems on the Helheim Glacier in southeast Greenland. By systematically concealing one variable (e.g., ice speed, ice thickness, etc.), we demonstrate the ability of PINNs to accurately reconstruct hidden information. Furthermore, we extend this application to address a challenging mixed inversion problem. We show how PINNs are capable of inferring the basal friction coefficient while simultaneously filling gaps in the sparsely observed ice thickness. This unified framework offers a promising avenue to enhance the predictive capabilities of ice sheet models, reducing uncertainties, and advancing our understanding of poorly constrained physical processes.
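The unified PINN loss described above is conventionally a weighted sum of a data-misfit term on observations and a residual term for the governing equations at collocation points; a minimal sketch (weights and array shapes are illustrative):

```python
import numpy as np

def pinn_loss(u_pred, u_obs, pde_residual, lam_data=1.0, lam_phys=1.0):
    """Unified PINN loss: mean-squared data misfit on available
    observations plus a physics term penalizing the governing-equation
    residual at collocation points. Missing observations simply drop
    out of the data term, which is how sparse data are handled."""
    data_term = np.mean((u_pred - u_obs) ** 2)
    physics_term = np.mean(pde_residual ** 2)
    return lam_data * data_term + lam_phys * physics_term
```

Forward and inverse problems differ only in which quantities are trainable (e.g., the basal friction coefficient becomes a network output), while this loss stays unchanged.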
- Research Article
- 10.1007/s44267-024-00044-z
- Apr 7, 2024
- Visual Intelligence
- Yizhou Wang + 5 more
Accurate panoptic segmentation of 3D point clouds in outdoor scenes is critical for the success of applications such as autonomous driving and robot navigation. Existing methods in this area typically assume that the differences between instances are greater than the differences between points belonging to the same instance and use heuristic techniques for segmentation. However, this assumption may not hold in real scenes with occlusion and noise. In addition, most of the previous methods formulate point-wise embedding learning and instance clustering as two decoupled steps for separate optimization, making it a challenging task to learn discriminative embeddings. To address these issues, we introduce a framework for modeling points belonging to the same instance using learnable Gaussian distributions and formulate the point cloud as a Gaussian mixture model. Based on this formulation, we introduce a unified loss function that links the embedding learning and instance clustering in an end-to-end manner. Our framework is generic and can be seamlessly incorporated with existing panoptic segmentation networks. By explicitly modeling intra-instance variance and leveraging end-to-end optimization, our framework improves the discrimination capability of point embeddings with higher accuracy and robustness. Extensive experiments on two large-scale benchmarks demonstrate the effectiveness of the proposed method.
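A unified loss that links embedding learning and instance clustering through a Gaussian mixture is typically the mixture negative log-likelihood over point embeddings; a minimal isotropic sketch (component parameters are illustrative, not the paper's learned ones):

```python
import numpy as np

def gmm_nll(X, mus, sigmas, pis):
    """Unified clustering loss sketch: negative log-likelihood of point
    embeddings X (shape N x D) under an isotropic Gaussian mixture, so
    embedding learning and instance assignment are driven end-to-end by
    a single objective."""
    N, D = X.shape
    log_probs = []
    for mu, sigma, pi in zip(mus, sigmas, pis):
        diff = X - mu
        log_p = (np.log(pi)
                 - 0.5 * D * np.log(2 * np.pi * sigma**2)
                 - np.sum(diff**2, axis=1) / (2 * sigma**2))
        log_probs.append(log_p)
    log_probs = np.stack(log_probs)  # (K, N)
    # log-sum-exp over components for numerical stability
    m = log_probs.max(axis=0)
    ll = m + np.log(np.exp(log_probs - m).sum(axis=0))
    return -ll.mean()
```

Learning per-component variances is what lets the model represent intra-instance variability explicitly instead of relying on a fixed distance heuristic.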
- Research Article
- 10.1609/aaai.v38i11.29120
- Mar 24, 2024
- Proceedings of the AAAI Conference on Artificial Intelligence
- Pingting Hao + 2 more
Multi-view multi-label feature selection aims to select informative features where the data are collected from multiple sources with multiple interdependent class labels. To fully exploit multi-view information, most prior works focus mainly on the common part under ideal circumstances. However, the inconsistent part hidden in each view, including noises and specific elements, may affect the quality of the mapping between labels and feature representations. Meanwhile, ignoring the specific part might lead to a suboptimal result, as each label is supposed to possess specific characteristics of its own. To deal with both problems in multi-view multi-label feature selection, we propose a unified loss function built on a complete splitting of the observed labels into hybrid labels, that is, common labels, view-to-all specific labels, and noisy labels, with the view-to-all specific labels further split into several specific labels for each view. The proposed method simultaneously considers the consistency and complementarity of different views. By exploring the feature weights of hybrid labels, the mapping relationships between labels and features can be established sequentially based on their attributes. Additionally, the interrelatedness among hybrid labels is also investigated and injected into the function. For the specific labels of each view, we construct a novel regularization paradigm incorporating logic operations. Finally, the convergence of the result is proved after applying the multiplicative update rules. Experiments on six datasets demonstrate the effectiveness and superiority of our method compared with state-of-the-art methods.
- Research Article
- 10.1609/aaai.v38i17.29814
- Mar 24, 2024
- Proceedings of the AAAI Conference on Artificial Intelligence
- Jiangnan Li + 3 more
Generative methods tackle Multi-Label Classification (MLC) by autoregressively generating label sequences. These methods excel at modeling label correlations and have achieved outstanding performance. However, a key challenge is determining the order of labels, as empirical findings indicate the significant impact of different orders on model learning and inference. Previous works adopt static label-ordering methods, assigning a unified label order for all samples based on label frequencies or co-occurrences. Nonetheless, such static methods neglect the unique semantics of each sample. More critically, these methods can cause the model to rigidly memorize training order, resulting in missing labels during inference. In light of these limitations, this paper proposes a dynamic label-order learning approach that adaptively learns a label order for each sample. Specifically, our approach adopts a difficulty-prioritized principle and iteratively constructs the label sequence based on the sample's semantics. To reduce the additional cost incurred by label-order learning, we use the same SEQ2SEQ model for label-order learning and MLC learning and introduce a unified loss function for joint optimization. Extensive experiments on public datasets reveal that our approach greatly outperforms previous methods. We will release our code at https://github.com/KagamiBaka/DLOL.
- Research Article
- 10.1007/s13246-024-01408-x
- Mar 21, 2024
- Physical and engineering sciences in medicine
- Fereshteh Yousefirizi + 11 more
Manual segmentation poses a time-consuming challenge for disease quantification, therapy evaluation, treatment planning, and outcome prediction. Convolutional neural networks (CNNs) hold promise in accurately identifying tumor locations and boundaries in PET scans. However, a major hurdle is the extensive amount of supervised and annotated data necessary for training. To overcome this limitation, this study explores semi-supervised approaches utilizing unlabeled data, specifically focusing on PET images of diffuse large B-cell lymphoma (DLBCL) and primary mediastinal large B-cell lymphoma (PMBCL) obtained from two centers. We considered 2-[18F]FDG PET images of 292 patients with PMBCL (n = 104) and DLBCL (n = 188) (n = 232 for training and validation, and n = 60 for external testing). We harnessed classical wisdom embedded in traditional segmentation methods, such as the fuzzy clustering loss function (FCM), to tailor the training strategy for a 3D U-Net model, incorporating both supervised and unsupervised learning approaches. Various supervision levels were explored, including fully supervised methods with labeled FCM and unified focal/Dice loss, unsupervised methods with robust FCM (RFCM) and Mumford-Shah (MS) loss, and semi-supervised methods combining FCM with supervised Dice loss (MS + Dice) or labeled FCM (RFCM + FCM). The unified loss function yielded higher Dice scores (0.73 ± 0.11; 95% CI 0.67-0.8) than the Dice loss (p value < 0.01). Among the semi-supervised approaches, RFCM + αFCM (α = 0.3) showed the best performance, with a Dice score of 0.68 ± 0.10 (95% CI 0.45-0.77), outperforming MS + αDice for any supervision level (any α) (p < 0.01). Another semi-supervised approach with MS + αDice (α = 0.2) achieved a Dice score of 0.59 ± 0.09 (95% CI 0.44-0.76), surpassing other supervision levels (p < 0.01).
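The Dice-based terms compared above build on the soft Dice overlap between a predicted probability map and a label mask; a minimal sketch (the smoothing constant is an assumption for numerical stability):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on probability maps: 1 minus twice the overlap
    divided by the total mass. The unified focal/Dice combinations
    discussed above build on this overlap measure."""
    inter = np.sum(pred * target)
    return 1.0 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
```

In the semi-supervised variants, a term like this on labeled scans is simply added, with a weight α, to an unsupervised clustering loss on unlabeled scans.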
Given the time-consuming nature of manual delineations and the inconsistencies they may introduce, semi-supervised approaches hold promise for automating medical imaging segmentation workflows.
- Research Article
- 10.1109/tcyb.2023.3273535
- Mar 1, 2024
- IEEE Transactions on Cybernetics
- Xiaoqiang Yan + 4 more
Multitask image clustering approaches intend to improve the model accuracy on each task by exploring the relationships of multiple related image clustering tasks. However, most existing multitask clustering (MTC) approaches isolate the representation abstraction from the downstream clustering procedure, which makes the MTC models unable to perform unified optimization. In addition, the existing MTC relies on exploring the relevant information of multiple related tasks to discover their latent correlations while ignoring the irrelevant information between partially related tasks, which may also degrade the clustering performance. To tackle these issues, a multitask image clustering method named deep multitask information bottleneck (DMTIB) is devised, which aims at conducting multiple related image clustering by maximizing the relevant information of multiple tasks while minimizing the irrelevant information among them. Specifically, DMTIB consists of a main-net and multiple subnets to characterize the relationships across tasks and the correlations hidden in a single clustering task. Then, an information maximin discriminator is devised to maximize the mutual information (MI) measurement of positive samples and minimize the MI of negative ones, in which the positive and negative sample pairs are constructed by a high-confidence pseudo-graph. Finally, a unified loss function is devised for the optimization of task relatedness discovery and MTC simultaneously. Empirical comparisons on several benchmark datasets, NUS-WIDE, Pascal VOC, Caltech-256, CIFAR-100, and COCO, show that our DMTIB approach outperforms more than 20 single-task clustering and MTC approaches.
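The information maximin discriminator described above is commonly implemented as a JSD-style lower bound on mutual information; a hedged sketch (the scoring function and sampling of pairs are assumptions, not DMTIB's exact objective):

```python
import math

def mi_discriminator_loss(pos_scores, neg_scores):
    """Sketch of an information maximin objective: push discriminator
    scores up for positive pairs (linked in the high-confidence
    pseudo-graph) and down for negative pairs. Minimizing this loss
    maximizes a JSD-style lower bound on mutual information."""
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    pos = sum(math.log(sig(s)) for s in pos_scores) / len(pos_scores)
    neg = sum(math.log(1 - sig(s)) for s in neg_scores) / len(neg_scores)
    return -(pos + neg)
```

The quality of the bound hinges on how the positive and negative pairs are constructed, which is why the pseudo-graph confidence threshold matters.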
- Research Article
- 10.3389/fnbot.2023.1276208
- Sep 26, 2023
- Frontiers in Neurorobotics
- Di Zhang + 6 more
Human behavior recognition plays a crucial role in the field of smart education. It offers a nuanced understanding of teaching and learning dynamics by revealing the behaviors of both teachers and students. In this study, to address the exigencies of teaching behavior analysis in smart education, we first constructed a teaching behavior analysis dataset called EuClass. EuClass contains 13 types of teacher/student behavior categories and provides multi-view, multi-scale video data for the research and practical applications of teacher/student behavior recognition. We also provide a teaching behavior analysis network containing an attention-based network and an intra-class differential representation learning module. The attention mechanism uses a two-level attention module encompassing spatial and channel dimensions. The intra-class differential representation learning module utilizes a unified loss function to reduce the distance between features. Experiments conducted on the EuClass dataset and a widely used action/gesture recognition dataset, IsoGD, demonstrate the effectiveness of our method in comparison to current state-of-the-art methods, with the recognition accuracy increased by 1-2% on average.
- Research Article
- 10.1016/j.inffus.2023.102013
- Sep 9, 2023
- Information Fusion
- Yingjie Tian + 2 more
Adaptive robust loss for landmark detection
- Research Article
- 10.1016/j.compmedimag.2023.102258
- Sep 1, 2023
- Computerized Medical Imaging and Graphics
- Shweta Tyagi + 2 more
An amalgamation of vision transformer with convolutional neural network for automatic lung tumor segmentation.
- Research Article
- 10.1155/2023/2430011
- Jul 22, 2023
- Structural Control and Health Monitoring
- Xiaoyou Wang + 3 more
Structural health monitoring (SHM) systems may suffer from multiple patterns of data anomalies. Anomaly detection is an essential preprocessing step prior to the use of monitoring data for structural condition assessment or other decision making. Deep learning techniques have been extensively used for automatic category classification by training the network with labelled data. However, because the SHM data are usually large in quantity, manually labelling these abnormal data is time consuming and labour intensive. This study develops a semisupervised learning-based data anomaly detection method using a small set of labelled data and massive unlabelled data. The MixMatch technique, which could mix labelled and unlabelled data using MixUp, is adopted to enhance the generalisation and robustness of the model. A unified loss function is defined to combine information from labelled and unlabelled data by incorporating consistency regularisation, entropy minimisation, and regular model regularisation items. In addition, customised data augmentation strategies for time series are investigated to further improve the model performance. The proposed method is applied to the SHM data from a real bridge for anomaly detection. Results demonstrate the superior performance of the developed method with very limited labelled data, greatly reducing the time and cost of labelling efforts compared with the traditional supervised learning methods.
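The MixUp step adopted from MixMatch has a simple standard form; a sketch below (the Beta parameter follows common MixMatch practice, and the lam >= 0.5 convention keeps the mix closer to the first sample; the optional lam argument is added here only for deterministic testing):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.75, lam=None):
    """MixUp as used in MixMatch: convexly combine two samples and their
    (soft) labels. lam is drawn from Beta(alpha, alpha) and clamped to
    >= 0.5 so the mixed sample stays closer to the first input."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    lam = max(lam, 1 - lam)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

Applying this to both labelled and pseudo-labelled time-series windows is what lets a single unified loss combine consistency regularisation with supervised terms.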
- Research Article
- 10.1109/tmm.2021.3129056
- Jan 1, 2023
- IEEE Transactions on Multimedia
- Qinchuan Zhang + 6 more
We study the task of single person dense pose estimation. Specifically, given a human-centric image, we learn to map all human pixels onto a 3D, surface-based human body model. Existing methods approach this problem by fitting deep convolutional networks on sparse annotated points where the regression on both surface coordinate components for each body part is uncorrelated and optimized separately. In this work, we devise a novel, unified loss function that explicitly characterizes the correlation for surface coordinates regression, achieving significant improvements in both accuracy and efficiency. Furthermore, based on an observation that the image-to-surface correspondence is intrinsically invariant to geometric transformations from input images, we propose to enforce a geometric equivariance consistency on the target mapping, thereby allowing us to enable reliable supervision on large amounts of unlabeled pixels. We conduct comprehensive studies on the effectiveness of our approach using a quite simple network. Extensive experiments on the DensePose-COCO dataset show that our model achieves superior performance against previous state-of-the-art methods with much lower computational complexity. We hope that our work would serve as a solid baseline for future study in the field. The code will be available at https://github.com/Johnqczhang/densepose.pytorch.
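The contrast between uncorrelated per-component regression and a unified, correlated penalty on surface coordinates can be illustrated minimally; the Euclidean form below is an illustrative stand-in for a coupled loss, not the paper's exact formulation:

```python
import math

def uv_loss_decoupled(u_pred, v_pred, u_gt, v_gt):
    """Baseline: regress the u and v surface coordinates independently,
    so errors in the two components are penalized without correlation."""
    return (u_pred - u_gt) ** 2 + (v_pred - v_gt) ** 2

def uv_loss_unified(u_pred, v_pred, u_gt, v_gt):
    """Sketch of a unified alternative: penalize the Euclidean distance
    of the predicted surface point as a whole, coupling the two
    coordinate components in a single term."""
    return math.hypot(u_pred - u_gt, v_pred - v_gt)
```

Treating (u, v) as one point on the body surface means a gradient step moves the prediction along the shortest direction in coordinate space rather than along each axis independently.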
- Research Article
- 10.1016/j.inffus.2022.11.010
- Nov 14, 2022
- Information Fusion
- Chunyang Cheng + 2 more
MUFusion: A general unsupervised image fusion network based on memory unit