- Research Article
- 10.1145/3773278
- Oct 28, 2025
- ACM Transactions on Computing for Healthcare
- Hasan Jamil
Large Language Models (LLMs) show promise for answering general health questions, but their utility for personalized health queries is limited by their lack of access to complete individual health records and general non-compliance with HIPAA requirements. Additionally, the opacity of LLMs and their tendency to generate hallucinated responses further reduce their reliability for personalized health question answering (QA). In contrast, Knowledge Graphs (KGs) have proven effective for QA tasks, especially in extracting structured insights from text, but transforming free text into KGs often leads to information or context loss that can compromise answer accuracy. To overcome this challenge, we present a novel iterative and monotonic KG refinement technique that enriches knowledge representation without sacrificing contextual integrity. We formalize this approach within a mathematical framework, demonstrating that each refinement step preserves or improves answer quality. Using electronic health record data, we validate the practical feasibility of this method and introduce PerHL, a personalized health QA system built on this refinement process. Our empirical results show that the approach significantly improves answer quality for personalized health queries, marking an important step toward trustworthy, context-aware health information systems.
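As a minimal illustration of the iterative, monotone refinement idea described above, the sketch below grows a triple store only by addition and keeps a refinement step only when a quality score does not drop. The helpers `extract_triples` and `answer_quality` are hypothetical placeholders, not part of PerHL.

```python
# Minimal sketch of an iterative, monotone knowledge-graph refinement loop,
# loosely inspired by the abstract above. `extract_triples` and `answer_quality`
# are hypothetical stand-ins supplied by the caller, not PerHL components.
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def refine_kg(
    kg: Set[Triple],
    passages: Iterable[str],
    extract_triples: Callable[[str], Set[Triple]],
    answer_quality: Callable[[Set[Triple]], float],
) -> Set[Triple]:
    """Grow the KG monotonically, accepting a step only if quality does not drop."""
    best_quality = answer_quality(kg)
    for passage in passages:
        candidate = kg | extract_triples(passage)   # monotone: triples are only added
        quality = answer_quality(candidate)
        if quality >= best_quality:                 # keep non-degrading refinements
            kg, best_quality = candidate, quality
    return kg
```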
- Research Article
- 10.1145/3761824
- Oct 14, 2025
- ACM Transactions on Computing for Healthcare
- Semyon Lomasov + 2 more
Aggregating person-level data across multiple clinical study sites is often constrained by privacy regulations, necessitating the development of decentralized modeling approaches in biomedical research. To address this requirement, a federated nonlinear regression algorithm based on the Choquet integral has been introduced for outcome prediction. This approach avoids reliance on prior statistical assumptions about data distribution and captures feature interactions, reflecting the non-additive nature of biomedical data. This work represents the first theoretical application of Choquet integral regression to multisite longitudinal trial data within a federated learning framework. The Multiple Imputation Choquet Integral Regression with LASSO (MIChoquet-LASSO) algorithm is specifically designed to reduce overfitting and enable variable selection in federated learning settings. Its performance has been evaluated using synthetic datasets, publicly available biomedical datasets, and proprietary longitudinal randomized controlled trial data. Comparative evaluations were conducted against benchmark methods, including ordinary least squares (OLS) regression and Choquet-OLS regression, under various scenarios such as model misspecification and both linear and nonlinear data structures in non-federated and federated contexts. Mean squared error was used as the primary performance metric. Results indicate that MIChoquet-LASSO outperforms the compared models in handling nonlinear longitudinal data with missing values, particularly in scenarios prone to overfitting. In federated settings, Choquet-OLS underperforms, whereas the federated variant of the model, FEDMIChoquet-LASSO, demonstrates consistently better performance. These findings suggest that FEDMIChoquet-LASSO offers a reliable solution for outcome prediction in multisite longitudinal trials, addressing challenges such as missing values, nonlinear relationships, and privacy constraints while maintaining strong performance within the federated learning framework.
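A minimal sketch of one common way to set up Choquet-integral regression with LASSO, assuming a 2-additive Möbius representation in which the integral reduces to a linear model over the original features plus pairwise minima; this illustrates the general technique only and is not the MIChoquet-LASSO or FEDMIChoquet-LASSO algorithm.

```python
# Sketch of 2-additive Choquet-integral regression fitted with LASSO, assuming the
# Möbius form C(x) = sum_i a_i x_i + sum_{i<j} a_ij min(x_i, x_j). Data are synthetic.
from itertools import combinations
import numpy as np
from sklearn.linear_model import Lasso

def choquet_features(X: np.ndarray) -> np.ndarray:
    """Augment features with pairwise minima to capture non-additive interactions."""
    pair_mins = [np.minimum(X[:, i], X[:, j])
                 for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + pair_mins)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = 0.5 * X[:, 0] + 0.3 * np.minimum(X[:, 1], X[:, 2]) + 0.05 * rng.normal(size=200)

model = Lasso(alpha=0.01).fit(choquet_features(X), y)   # L1 penalty ~ variable selection
print(model.coef_)
```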
- Research Article
- 10.1145/3761822
- Oct 13, 2025
- ACM Transactions on Computing for Healthcare
- Vinod Kumar Chauhan + 7 more
While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited, partly due to biases that can compromise the reliability of predictions. In this article, we focus on sample selection bias (SSB), a specific type of bias where the study population is not fully representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, existing machine learning techniques mostly try to correct the bias by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB’s impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on target population identification rather than bias correction. Specifically, we propose two independent networks (T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation that is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in performance between the target subpopulations that are representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates and selection rates, outperforming the existing bias correction techniques.
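The "identify, then predict" idea behind T-Net can be sketched with simple scikit-learn models standing in for the two networks; the arrays `X_study`, `y_study`, `X_target` and the 0.5 threshold are hypothetical, and this is not the authors' architecture.

```python
# Sketch of the two-stage idea: (1) identify target patients who resemble the study
# population, (2) predict outcomes only for that representative subpopulation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

def identify_then_predict(X_study, y_study, X_target, threshold=0.5):
    # Task 1: learn which target patients look like the study population.
    X_all = np.vstack([X_study, X_target])
    in_study = np.concatenate([np.ones(len(X_study)), np.zeros(len(X_target))])
    selector = LogisticRegression(max_iter=1000).fit(X_all, in_study)
    representative = selector.predict_proba(X_target)[:, 1] >= threshold

    # Task 2: predict outcomes only for the representative subpopulation.
    predictor = GradientBoostingClassifier().fit(X_study, y_study)
    preds = np.full(len(X_target), np.nan)      # NaN for non-representative patients
    if representative.any():
        preds[representative] = predictor.predict(X_target[representative])
    return preds, representative
```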
- Research Article
- 10.1145/3728368
- Oct 13, 2025
- ACM Transactions on Computing for Healthcare
- Seamus Ryan + 3 more
Machine learning-augmented applications have the potential to be powerful tools for decision-making in healthcare. However, healthcare is a complex domain that presents many challenges. These challenges, such as medical errors, clinician–patient relationships and treatment preferences, must be addressed to ensure fairness in ML-augmented healthcare applications. To better understand the influence these challenges have on fairness, 16 experienced engineers and designers with domain knowledge in healthcare technology were interviewed about how they would prioritise fairness in 3 healthcare scenarios (well-being improvement, chronic illness management, acute illness treatment). Using a template analysis, this work identifies the key considerations in the creation of fair ML for healthcare. These considerations clustered into categories related to technology, healthcare context and user perspectives. To explore these categories, we propose the stakeholder fairness conceptual model. This framework aids designers and developers in understanding the complex considerations that stem from the building, management and evaluation of ML-augmented healthcare applications, and how they affect the expectations of fairness. This work then discusses how this model may be applied when the health technology is directly provisioned to users, without a healthcare provider managing its use or adoption. This article contributes to the understanding of fairness requirements in healthcare, including the effect of healthcare errors, clinician-application collaboration and how the evaluation of healthcare technology becomes part of the fairness design process.
- Research Article
- 10.1145/3728369
- Oct 13, 2025
- ACM Transactions on Computing for Healthcare
- Etidal Alruwaili + 1 more
Machine learning is widely used across various fields, including e-health, to enhance efficiency, classify events, and make accurate predictions, such as diagnosing diseases and prescribing medications. However, machine learning models are increasingly vulnerable to data poisoning attacks, which manipulate training data to degrade model accuracy and cause incorrect predictions. This study focuses on detecting data poisoning in e-health applications by simulating label-flipping attacks at different rates (5%, 25%, 50%, 75%) on breast cancer and diabetes datasets. The performance of machine learning models in disease detection was evaluated before and after poisoning, alongside their ability to detect poisoned data. Results show that models perform significantly better on clean data, with a marked deterioration at higher poisoning rates (50%–75%). The Random Forest (RF) and Gradient Boosting (GB) models proved most effective in detecting poisoned data, particularly at higher rates of poisoning. Conversely, the Logistic Regression (LR) and Multi-layer Perceptron (MLP) models tended to overgeneralize, leading to false positives, especially in the breast cancer dataset. This study highlights the importance of safeguarding ML models in e-health from data poisoning threats.
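A minimal sketch of the label-flipping simulation described above, using the scikit-learn breast-cancer dataset and a Random Forest as an example model; the study's exact preprocessing, splits, and model settings may differ.

```python
# Flip a fraction of training labels, then compare a Random Forest's test accuracy
# when trained on clean vs. increasingly poisoned data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def flip_labels(y, rate, rng):
    """Return a copy of y with `rate` of the labels flipped (binary 0 <-> 1)."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
rng = np.random.default_rng(42)

for rate in (0.0, 0.05, 0.25, 0.50, 0.75):
    clf = RandomForestClassifier(random_state=0).fit(X_tr, flip_labels(y_tr, rate, rng))
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"poison rate {rate:.0%}: test accuracy {acc:.3f}")
```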
- Research Article
- 10.1145/3757066
- Aug 4, 2025
- ACM Transactions on Computing for Healthcare
- Seungyeon Lee + 3 more
Sleep staging has become a critical task in diagnosing and treating sleep disorders to prevent sleep-related diseases. With growing large-scale sleep databases, significant progress has been made toward automatic sleep staging. However, previous approaches face critical problems: the heterogeneity of subjects’ physiological signals, the inability to extract meaningful information from unlabeled data to improve predictive performance, the difficulty of modeling correlations between sleep stages, and the lack of an effective mechanism to quantify predictive uncertainty. In this study, we propose a neural network-based sleep staging model, DREAM, to learn domain generalized representations from physiological signals and model sleep dynamics. DREAM learns sleep-related and subject-invariant representations from diverse subjects’ sleep signals and models sleep dynamics by capturing interactions between sequential signal segments and between sleep stages. We conducted a comprehensive empirical study to demonstrate the superiority of DREAM, including sleep stage prediction experiments, a case study, the usage of unlabeled data, and uncertainty quantification. Notably, the case study validates DREAM's ability to learn a generalized decision function for new subjects, especially when there are differences between testing and training subjects. Uncertainty quantification shows that DREAM provides prediction uncertainty, making the model reliable and helping sleep experts in real-world applications.
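One common way to quantify per-epoch prediction uncertainty from a sleep-staging model's softmax outputs is predictive entropy; the sketch below illustrates that general idea only and is not DREAM's specific uncertainty mechanism.

```python
# Predictive entropy over the five sleep stages as a simple per-epoch uncertainty score.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """probs: (n_epochs, n_stages) softmax outputs; returns entropy per epoch."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

probs = np.array([[0.90, 0.05, 0.03, 0.01, 0.01],    # confident epoch
                  [0.30, 0.25, 0.20, 0.15, 0.10]])   # ambiguous epoch
uncertainty = predictive_entropy(probs)
flag_for_review = uncertainty > 1.0                  # illustrative threshold for expert review
print(uncertainty, flag_for_review)
```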
- Research Article
- 10.1145/3757931
- Aug 1, 2025
- ACM Transactions on Computing for Healthcare
- Ghazal Bargshady + 5 more
This study investigates the effectiveness of various machine learning and deep learning models for automated pain detection using functional near-infrared spectroscopy (fNIRS) data from the AI4Pain Grand Challenge dataset. Four different near-infrared spectroscopy metrics – oxygenated haemoglobin (HbO2), deoxygenated haemoglobin (HHb), total haemoglobin (HT), and haemoglobin difference (HbDiff) – were investigated to determine their contributions to pain assessment and identify which metric offers the most reliable performance. Across all models, both traditional and deep learning, HbDiff consistently outperformed the other metrics in terms of classification accuracy. The hybrid multi-kernel fully convolutional network with long short-term memory (MK-FCN-LSTM) model, particularly when utilising the HbDiff metric, achieved superior performance with a binary classification accuracy of 64.73%. These findings suggest that haemoglobin difference may provide more sensitive and reliable features for pain assessment, highlighting its potential as a key biomarker in fNIRS-based pain detection systems.
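A minimal sketch of deriving the four haemoglobin metrics from HbO2/HHb channels and feeding windowed HbDiff features to a simple classifier; the synthetic signals, window length, and Random Forest stand-in are illustrative assumptions, not the MK-FCN-LSTM model used in the study.

```python
# Derive HT and HbDiff from HbO2/HHb, summarise HbDiff in fixed windows, and fit a classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def derive_metrics(hbo2: np.ndarray, hhb: np.ndarray) -> dict:
    return {
        "HbO2": hbo2,
        "HHb": hhb,
        "HT": hbo2 + hhb,       # total haemoglobin
        "HbDiff": hbo2 - hhb,   # haemoglobin difference
    }

def window_features(signal: np.ndarray, win: int = 100) -> np.ndarray:
    """Split a 1-D signal into fixed windows and summarise each with mean and std."""
    n = len(signal) // win
    windows = signal[: n * win].reshape(n, win)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

rng = np.random.default_rng(1)
hbo2, hhb = rng.normal(size=5000), rng.normal(size=5000)   # placeholder fNIRS channels
X = window_features(derive_metrics(hbo2, hhb)["HbDiff"])
y = rng.integers(0, 2, size=len(X))                        # placeholder pain/no-pain labels
clf = RandomForestClassifier().fit(X, y)
```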
- Research Article
- 10.1145/3757067
- Jul 31, 2025
- ACM Transactions on Computing for Healthcare
- Alessandro Cacciatore + 5 more
The current approach to deep-learning research is exemplified by the pursuit of Red AI models—designs that show increasingly higher performance but at increasingly higher costs, in terms of economic requirements and environmental footprint. This approach is particularly detrimental in sectors like healthcare, which typically have limited resources. Meanwhile, Green AI prioritizes efficiency and sustainability by reducing the environmental footprint and making advanced technologies accessible. Following the Green AI principles, this study focuses on the combined use of two techniques, namely Knowledge Distillation (KD) and Deep Supervision (DS), to reduce the costs of HRNet, a convolutional neural network designed for human pose estimation, here applied to support the diagnosis of neurological impairments in preterm infants. All the experiments are carried out on the BabyPose dataset, a collection of videos from a depth camera showing hospitalized preterm infants. By combining KD and DS, we can use a sub-network of HRNet that needs only 27.5% of the parameters and 61.7% of the FLOPs of the full HRNet, without affecting performance (-0.59 percentage points in average precision). This achievement can have deep implications for actual clinical practice, as it fosters the democratization of high-quality technologies. Our code is available at https://github.com/geronimaw/OnlineKD-HRNet-Human-Pose-Estimation.git.
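A minimal PyTorch sketch of how a combined KD and DS loss for heatmap-based pose estimation might look; the loss weights, the list of intermediate outputs, and the use of MSE throughout are assumptions for illustration, not the paper's exact training setup.

```python
# Combined knowledge-distillation (KD) and deep-supervision (DS) loss for heatmap regression.
import torch
import torch.nn.functional as F

def kd_ds_loss(student_out, intermediate_outs, teacher_out, gt_heatmaps,
               kd_weight=0.5, ds_weight=0.25):
    # Supervised loss on the student's final heatmaps.
    loss = F.mse_loss(student_out, gt_heatmaps)
    # KD: push the student's final heatmaps toward the frozen teacher's predictions.
    loss = loss + kd_weight * F.mse_loss(student_out, teacher_out.detach())
    # DS: supervise the intermediate heads with the ground truth as well.
    for inter in intermediate_outs:
        loss = loss + ds_weight * F.mse_loss(inter, gt_heatmaps)
    return loss
```

Detaching the teacher's output ensures gradients only update the smaller student sub-network, which is what allows the cheaper model to inherit the full HRNet's behaviour.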
- Research Article
- 10.1145/3726875
- Jul 8, 2025
- ACM Transactions on Computing for Healthcare
- Andrea Wrona + 2 more
Artificial Intelligence and Machine Learning have brought transformative changes to clinical diagnostics, especially in image classification via deep convolutional neural networks. The latter are crucial in analyzing and identifying diseases from medical visuals, ensuring accurate diagnosis and prevention. However, the limited availability of medical imaging data poses a serious challenge and leads to poor classifier performance. To address this issue, geometric/color augmentation and synthetic data generation methods (through Generative Adversarial Networks and Variational Autoencoders) are commonly employed, but both operate on the entire dataset without intrinsically optimizing the classification process. This study introduces an automated data augmentation strategy in which the editing operation is customized for each individual image in the training set. This is done using the Deep Reinforcement Learning framework provided by the Proximal Policy Optimization algorithm, with the reward being the test accuracy of the image classifier. Applying the proposed procedure to a meager dataset related to gastrointestinal diseases demonstrates an improvement in image classification accuracy of over 3%.
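At the core of this approach is PPO's clipped surrogate objective; the sketch below shows that objective in PyTorch, with advantages imagined as coming from changes in downstream test accuracy. The example tensors and clip range are illustrative assumptions, not the authors' code.

```python
# PPO clipped surrogate loss for the augmentation-policy actions chosen per image.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped policy-gradient loss; advantages reward augmentations that help the classifier."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Example: advantages derived from the change in classifier test accuracy per episode.
new_lp = torch.tensor([-0.9, -1.2])
old_lp = torch.tensor([-1.0, -1.0])
adv = torch.tensor([0.03, -0.01])
print(ppo_clip_loss(new_lp, old_lp, adv))
```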
- Research Article
- 10.1145/3745789
- Jun 25, 2025
- ACM Transactions on Computing for Healthcare
- Joel Abraham + 3 more
Precision medicine, which aims to optimize medical care at the individual level, remains a significant challenge and aspiration in oncology. The pathway to successful implementation requires methods that can handle the vast heterogeneity of cancer and consider the interplay of environmental, societal, biological, and clinical factors. To support decision-making in this context, computational frameworks must integrate large-scale, diverse, and noisy data, discover fine-grained patient subgroups with shared underlying characteristics, and characterize the imperfect preclinical spaces where novel therapies are tested. We propose an integrated digital-twin framework in which machine learning and semantic models collaboratively represent and reason with diverse patient data and medical domain knowledge to generate treatment recommendations. Clinical and molecular characteristics are used to discover subtypes of brain cancers, which are represented as ontologies with associated rules that determine a patient’s membership in a given subtype. Similarly, preclinical models used for therapeutic testing are characterized and assessed for their similarity to patient cancer models. By semantically discovering links between these preclinical models and patient cancer subtypes, the framework can prioritize and hypothesize novel therapeutics tested on preclinical models for individual patients. This approach, which requires empirical testing, demonstrates how cross-domain reasoning can be used to propose individualized treatment plans.
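A toy sketch of rule-based subtype membership plus a naive similarity ranking of preclinical models, illustrating the cross-domain linking idea; the feature names, rules, and cosine-similarity metric are hypothetical and not taken from the proposed ontologies.

```python
# Hypothetical subtype rules over patient features, plus a similarity ranking of
# preclinical models against a patient's molecular profile.
import numpy as np

SUBTYPE_RULES = {
    "subtype_A": lambda p: p["idh1_mutant"] and p["age"] < 50,
    "subtype_B": lambda p: not p["idh1_mutant"] and p["mgmt_methylated"],
}

def assign_subtypes(patient: dict) -> list:
    """Return the subtypes whose rules the patient satisfies."""
    return [name for name, rule in SUBTYPE_RULES.items() if rule(patient)]

def rank_preclinical_models(patient_vec: np.ndarray, model_vecs: dict) -> list:
    """Rank preclinical models by cosine similarity to the patient's profile."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(model_vecs, key=lambda m: cosine(patient_vec, model_vecs[m]), reverse=True)

patient = {"idh1_mutant": True, "age": 42, "mgmt_methylated": False}
print(assign_subtypes(patient))   # -> ['subtype_A']
```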