Explainable artificial intelligence and ensemble learning for hepatocellular carcinoma classification: State of the art, performance, and clinical implications
Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis (covering datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) program) revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, features derived from imaging such as computed tomography and magnetic resonance imaging, and genomic data. For instance, random forests explained with SHapley Additive exPlanations (SHAP) and trained on NCBI GEO microarray data (GSE14520, n = 445) achieved 96.53% accuracy, while stacking ensembles applied to SEER data (n = 1897) reached an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite these promising results, challenges persist, including the computational cost of SHAP and local interpretable model-agnostic explanations (LIME) analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., the predominance of Western populations in SEER, which limits generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review highlights the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, although rigorous validation in independent, multi-center cohorts is critical before real-world deployment.
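The computational-cost challenge above follows from how Shapley values are defined: an exact attribution averages each feature's marginal contribution over every coalition of the other features, so the work grows as 2^M for M features. The sketch below enumerates coalitions directly for a toy model. It assumes a single baseline vector stands in for "absent" features (one of several possible conventions), and the model, inputs, and function names are hypothetical. TreeSHAP avoids this enumeration for tree ensembles, which is why brute force is only practical for a handful of features.

```python
import math
from itertools import combinations
from typing import Callable, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list:
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition take their baseline value (a single-reference
    convention assumed for illustration). The number of model calls grows as
    2**M, so this is only practical for a handful of features.
    """
    m = len(x)
    phi = [0.0] * m

    def value(coalition: frozenset) -> float:
        # Evaluate the model with coalition features from x and the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return predict(z)

    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    # Hypothetical "risk score" with an interaction between features 0 and 2.
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]


print(exact_shapley(toy_risk, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For this toy model the attributions come out as approximately [2.0, 4.0, 1.5]: the interaction term is split evenly between features 0 and 2, and the values sum to f(x) - f(baseline) = 7.5.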
- Research Article
- 10.1158/1055-9965.disp-11-pr1
- Sep 1, 2011
- Cancer Epidemiology, Biomarkers & Prevention
Introduction: Hepatocellular carcinoma (HCC) incidence is increasing in the U.S. for unknown reasons, despite an overall decline in cancer during 1975–2006. Latinos have higher rates of HCC than other groups, and attributable risks for HCC among Latinos have been identified. This study compared HCC incidence and associated behavioral risk factors from 1995 through 2006 among U.S. Latinos, Texas Latinos, and a South Texas Latino subset. We hypothesized that HCC incidence is higher among South Texas Latinos, in conjunction with a higher prevalence of attributable behavioral risk factors during the same period. Methods: Data were obtained from the U.S. SEER (Surveillance, Epidemiology, and End Results) Program, the Texas Cancer Registry, and the Texas Department of State Health Services (TDSHS). Annual age-specific and age-adjusted HCC incidence rates, annual percent changes (APCs), and 95% confidence intervals (CIs) were calculated, as were the prevalences of obesity, diabetes, heavy alcohol use, and smoking. Analyses were performed using SEER*Stat and SPSS Complex Samples software. Groups were compared using chi-squared tests and t-tests; differences were considered significant at p < .05 when confidence intervals did not overlap. Results: Latinos accounted for more than a third of HCC cases in Texas and nearly three-fourths of all HCC cases in South Texas, significantly greater proportions than in SEER. More than 70% of HCC cases in Latinos occurred in men, with similar percentages in the SEER, Texas, and South Texas groups. HCC incidence in Latinos was highest in South Texas (10.6/100,000) and Texas (9.7/100,000), compared with SEER (7.5/100,000). South Texas Latinos were older than their SEER counterparts (median age 67 vs. 62 years). More South Texas and Texas Latinos than SEER Latinos resided in rural areas (14.8% and 14.3% vs. 5.2%).
The prevalence of HCC-related behavioral risk factors among Latinos in the U.S., Texas, and South Texas for two periods, 1995–1997 and 2004–2006, showed that obesity increased in all three Latino groups from the first period to the second. Texas and South Texas Latinos also had a higher obesity prevalence than U.S. Latinos in the more recent period (30.2% and 35.0% vs. 26.7%). Diabetes prevalence increased among U.S. Latinos; Texas and South Texas Latinos showed a similar increasing pattern, although their confidence intervals overlapped. For 2004–2006, diabetes prevalence was higher in South Texas Latinas than in U.S. Latinas (10.3% vs. 7.8%). Heavy alcohol use and cigarette smoking did not change significantly over time in any Latino group. Conclusion: Our findings support observations that HCC is rising at an alarming rate in the United States. We have described an important constellation of risk factors for HCC in this group that may result in higher rates of the disease among Latinos. Most, if not all, of these risks are modifiable, preventable, or treatable. There is a clear need for HCC etiological research and interventions that account not only for the most significant attributable risks for the disease but also for genetic, cultural, and socioeconomic predisposing features. The potential contribution of these features to HCC indicates a need for etiologic research to firmly establish associations and inform HCC-related prevention interventions. Acknowledgements: This research was made possible by grants from the San Antonio Cancer Institute, San Antonio, Texas (P30-CA54174) and the National Cancer Institute, Redes En Acción (U01-CA86117). Citation Information: Cancer Epidemiol Biomarkers Prev 2011;20(10 Suppl):PR1.
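The age-adjusted rates and annual percent changes (APCs) named in the Methods can be sketched as follows. This is a minimal illustration, not the SEER*Stat implementation: it assumes direct standardization with caller-supplied standard-population weights (the weights and counts used here are hypothetical, not the U.S. 2000 standard population) and the common log-linear definition of APC, in which ln(rate) is regressed on calendar year and APC = 100 * (exp(b1) - 1).

```python
import math
from typing import Sequence


def age_adjusted_rate(
    cases: Sequence[int],
    population: Sequence[float],
    std_weights: Sequence[float],
    per: float = 100_000,
) -> float:
    """Direct standardization: weight each age-specific rate by its standard-population share."""
    total_weight = sum(std_weights)
    weighted = sum(w * c / p for c, p, w in zip(cases, population, std_weights))
    return per * weighted / total_weight


def annual_percent_change(years: Sequence[int], rates: Sequence[float]) -> float:
    """APC from an ordinary least-squares fit of ln(rate) = b0 + b1 * year."""
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two (year, rate) pairs of equal length")
    logs = [math.log(r) for r in rates]  # rates must be positive
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Hypothetical series growing exactly 5% per year.
years = list(range(1995, 2000))
rates = [7.5 * 1.05 ** k for k in range(5)]
print(round(annual_percent_change(years, rates), 6))  # prints 5.0
```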
- Research Article
- 10.1002/dad2.70162
- Jul 1, 2025
- Alzheimer's & Dementia : Diagnosis, Assessment & Disease Monitoring
INTRODUCTION: Alzheimer's disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Early diagnosis is vital. We developed an interpretable machine learning (ML) model for early AD prediction using open clinical data. METHODS: Data from 2149 adults (60–90 years) were obtained from Kaggle. After preprocessing and feature engineering, tree‐based models were trained. A stacking ensemble model combining Gradient Boosting and XGBoost was trained, with Logistic Regression as the meta‐learner. SHapley Additive exPlanations (SHAP) provided interpretability. Performance was measured by accuracy, precision, recall, F1 score, ROC and AUC. RESULTS: The stacked ensemble achieved 97% accuracy (AUC 0.97), with 0.97 precision, 0.94 recall, and 0.96 F1 score for AD. SHAP identified memory complaints, Mini‐Mental State Examination (MMSE), functional assessment, behavioral symptoms, cholesterol, and lifestyle factors (activity, diet, sleep) as top predictors. CONCLUSION: The ensemble model, enhanced by SHAP analysis, provides accurate and interpretable AD risk predictions with potential applicability in future clinical decision support systems. Highlights: Developed an ensemble machine learning (ML) model for early Alzheimer's disease (AD) prediction. Achieved 97% accuracy using stacked XGBoost and Gradient Boosting. SHapley Additive exPlanations (SHAP) analysis identified key cognitive and lifestyle‐related risk factors. Model interprets AD risk using explainable artificial intelligence (AI) for clinical applicability. Utilized open‐access dataset to ensure reproducibility and transparency.
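The evaluation metrics listed in the Methods can be computed from a confusion matrix plus a ranking-based AUC. The sketch below is a plain-Python illustration with hypothetical labels and scores; in practice a library such as scikit-learn (`precision_score`, `recall_score`, `f1_score`, `roc_auc_score`) would normally be used. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half, which equals the area under the ROC curve.

```python
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, labels `[1, 1, 1, 0, 0]` with scores `[0.9, 0.8, 0.3, 0.2, 0.7]` give an AUC of 5/6, because five of the six positive-negative pairs are ranked correctly. The pairwise loop is quadratic, which is fine for illustration but not for large cohorts.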
- Research Article
- 10.1159/000491534
- Jan 1, 2018
- Cellular Physiology and Biochemistry
Background/Aims: Hepatocellular carcinoma (HCC) remains a difficult clinical problem that significantly affects patient survival. Accumulating evidence has demonstrated roles for long non-coding RNAs (lncRNAs) in HCC. In the present study, we aimed to explore the potential roles of the lncRNA PVT1 in the tumorigenesis and progression of HCC. Methods: Quantitative reverse transcription-polymerase chain reaction (RT-qPCR) was applied to detect differences in PVT1 expression in HCC tissues and cell lines. The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases were then searched to confirm the relationship between PVT1 expression and HCC, and a meta-analysis combining TCGA, GEO, and RT-qPCR data was performed to estimate PVT1 expression in HCC. Cell proliferation was evaluated in vitro, and a chicken chorioallantoic membrane (CAM) model of HCC was constructed to measure the effect on tumorigenicity in vivo. To identify the microRNAs (miRNAs) sponged by PVT1 in HCC, we used TCGA, GEO, a gene microarray, and target prediction algorithms: TCGA, GEO, and the gene microarray were used to select differentially expressed miRNAs, and several target prediction algorithms were applied to predict the miRNAs targeted by PVT1. Results: PVT1 was markedly overexpressed in HCC tissue compared with normal liver tissue based on both RT-qPCR and TCGA data, and PVT1 overexpression was closely related to patient gender and race as well as to higher HCC tumor grade. A meta-analysis of 840 cases from multiple sources (TCGA, GEO, and our in-house RT-qPCR results) showed that PVT1 had moderate value in discriminating HCC patients from normal controls, confirming the RT-qPCR results. Additionally, PVT1 upregulation promoted HCC cell proliferation in vitro and in vivo.
Based on the competing endogenous RNA (ceRNA) theory, the PVT1/miR-424-5p/INCENP axis was selected for further research. In silico prediction revealed complementary sequences between PVT1 and miR-424-5p and between miR-424-5p and INCENP. Furthermore, RT-qPCR showed a negative correlation trend between miR-424-5p and PVT1, whereas TCGA data showed a positive correlation trend between PVT1 and INCENP. INCENP small interfering RNA (siRNA) also significantly inhibited cell proliferation and viability. Conclusions: We hypothesized that PVT1 could affect the biological function of HCC cells by targeting miR-424-5p and regulating INCENP. By highlighting the PVT1/miR-424-5p/INCENP axis, this study provides a novel perspective for HCC therapeutic strategies.
- Research Article
- 10.1002/jgm.3732
- Aug 26, 2024
- The journal of gene medicine
This study aims to develop and validate machine learning-based diagnostic and prognostic models to predict the risk of distant lymph node metastases (DLNM) in patients with hepatocellular carcinoma (HCC) and to evaluate the prognosis of this cohort. This retrospective study uses data extracted from the January 2024 subset of the Surveillance, Epidemiology, and End Results (SEER) database. The study cohort consists of 15,775 patients diagnosed with HCC between 2016 and 2020, as identified in the SEER database. For the diagnostic model, recursive feature elimination (RFE) is used for variable selection, yielding five key predictors: age, tumor size, radiation therapy, T stage, and serum alpha-fetoprotein (AFP) level. These variables form the basis of a stacking ensemble model, which is further explained with Shapley Additive Explanations (SHAP). The prognostic model, by contrast, uses stepwise backward regression to select relevant variables, including chemotherapy, radiation therapy, tumor size, and age, and is presented as a prognostic nomogram based on the Cox proportional hazards model. The outcome of the diagnostic model is the occurrence of DLNM; the outcome of the prognostic model is defined by survival time and survival status. The stacking-based integrated model shows good predictive performance, interpretability, and discrimination, with an area under the curve (AUC) of 0.767 in the training set and 0.768 in the validation set. The nomogram constructed with the Cox model also shows consistent and strong predictive capability. We also identified factors with a substantial impact on DLNM and prognosis and discussed their significance for the model and for clinical practice in detail.
Our study identified key predictive factors for DLNM and elucidated significant prognostic indicators for HCC patients with DLNM. These findings provide clinicians with valuable tools to accurately identify high-risk individuals for DLNM and conduct more precise risk stratification for this patient subgroup, potentially improving management strategies and patient outcomes.
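The abstract does not list the stacking configuration, so the sketch below only illustrates the generic second stage of a stacking ensemble: a logistic-regression meta-learner (an assumed choice) fitted on base-model probabilities. It is written in plain Python with batch gradient descent for self-containment; the learning rate, epoch count, function names, and toy level-1 data are hypothetical. In a real pipeline the level-1 features should be out-of-fold predictions from the base models, so that the meta-learner never sees predictions made on a base model's own training rows.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(
    level1: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit unregularized logistic regression on base-model probabilities by gradient descent."""
    n, k = len(level1), len(level1[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(level1, labels):
            # Gradient of the mean log-loss: (predicted probability - label) * feature.
            err = _sigmoid(bias + sum(w * v for w, v in zip(weights, row))) - y
            for j in range(k):
                grad_w[j] += err * row[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_probability(weights: Sequence[float], bias: float, base_probs: Sequence[float]) -> float:
    """Combine one patient's base-model probabilities into a final probability."""
    return _sigmoid(bias + sum(w * p for w, p in zip(weights, base_probs)))


# Hypothetical out-of-fold probabilities from two base models, with DLNM labels.
oof = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2], [0.1, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_meta_learner(oof, y)
print(stacked_probability(w, b, [0.85, 0.85]), stacked_probability(w, b, [0.15, 0.15]))
```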
- Research Article
- 10.30574/wjaets.2025.15.2.0635
- May 30, 2025
- World Journal of Advanced Engineering Technology and Sciences
The rapid advancements in artificial intelligence and machine learning have led to the development of highly sophisticated models capable of superhuman performance in a variety of tasks. However, the increasing complexity of these models has also resulted in them becoming "black boxes", where the internal decision-making process is opaque and difficult to interpret. This lack of transparency and explainability has become a significant barrier to the widespread adoption of these models, particularly in sensitive domains such as healthcare and finance. To address this challenge, the field of Explainable AI has emerged, focusing on developing new methods and techniques to improve the interpretability and explainability of machine learning models. This review paper aims to provide a comprehensive overview of the research exploring the combination of Explainable AI and traditional machine learning approaches, known as "hybrid models". This paper discusses the importance of explainability in AI, and the necessity of combining interpretable machine learning models with black-box models to achieve the desired trade-off between accuracy and interpretability. It provides an overview of key methods and applications, integration techniques, implementation frameworks, evaluation metrics, and recent developments in the field of hybrid AI models. The paper also delves into the challenges and limitations in implementing hybrid explainable AI systems, as well as the future trends in the integration of explainable AI and traditional machine learning. Altogether, this paper will serve as a valuable reference for researchers and practitioners working on developing explainable and interpretable AI systems. 
Keywords: Explainable AI (XAI), Traditional Machine Learning (ML), Hybrid Models, Interpretability, Transparency, Predictive Accuracy, Neural Networks, Ensemble Methods, Decision Trees, Linear Regression, SHAP (Shapley Additive Explanations), LIME (Local Interpretable Model-agnostic Explanations), Healthcare Analytics, Financial Risk Management, Autonomous Systems, Predictive Maintenance, Quality Control, Integration Techniques, Evaluation Metrics, Regulatory Compliance, Ethical Considerations, User Trust, Data Quality, Model Complexity, Future Trends, Emerging Technologies, Attention Mechanisms, Transformer Models, Reinforcement Learning, Data Visualization, Interactive Interfaces, Modular Architectures, Ensemble Learning, Post-Hoc Explainability, Intrinsic Explainability, Combined Models
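One simple hybrid pattern of the kind this review surveys is a global surrogate: an interpretable model is fitted to a black-box model's predictions (not to the true labels), and its fidelity is reported as the fraction of cases on which the two agree. The sketch below uses a one-split decision stump as the surrogate. The black-box rule, the integer feature grid, and the function names are hypothetical, and the stump only considers rules of the form `x[f] > t -> 1`.

```python
from typing import Callable, List, Sequence, Tuple


def fit_stump(rows: Sequence[Sequence[float]], targets: Sequence[int]) -> Tuple[int, float, float]:
    """Return (feature, threshold, agreement) of the best rule 'x[feature] > threshold -> 1'."""
    best = (0, 0.0, -1.0)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            preds = [1 if r[f] > t else 0 for r in rows]
            agreement = sum(p == y for p, y in zip(preds, targets)) / len(rows)
            if agreement > best[2]:  # keep the first rule with the highest agreement
                best = (f, t, agreement)
    return best


def global_surrogate(
    black_box: Callable[[Sequence[float]], int], rows: Sequence[Sequence[float]]
) -> Tuple[int, float, float]:
    """Fit the stump to the black box's own outputs; the agreement is the surrogate's fidelity."""
    mimic_targets = [black_box(r) for r in rows]
    return fit_stump(rows, mimic_targets)


def opaque_model(r: Sequence[float]) -> int:
    # Hypothetical stand-in for a black-box classifier.
    return 1 if 4 * r[0] + r[1] > 10 else 0


grid: List[List[int]] = [[a, b] for a in range(5) for b in range(5)]
print(global_surrogate(opaque_model, grid))  # (0, 2, 0.92)
```

Here the rule "feature 0 > 2" reproduces the black box on 23 of 25 grid points, a fidelity of 0.92; the two disagreements are where feature 1 tips the opaque model over its threshold.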
- Research Article
- 10.1158/1538-7445.am2012-3595
- Apr 15, 2012
- Cancer Research
Introduction. Hepatocellular carcinoma (HCC) increased in the U.S. from 1975-2006 while overall cancer incidence declined. Moreover, HCC incidence among South Texas Latinos is higher than among other U.S. Latinos. In recent years a number of risk factors have been associated with HCC, including hepatitis C virus infection, heavy alcohol use, obesity, diabetes, and others. Of these, only diabetes exhibits characteristics that make it a candidate explanation for why HCC is increasing in the U.S. Latino population and is increasingly higher among South Texas Latinos. This study compares HCC incidence rates and diabetes prevalence rates among U.S. Latinos and South Texas Latinos. We hypothesize that these data implicate diabetes in HCC and suggest a clear path to informed HCC prevention. Methods. Data were obtained from the U.S. SEER (Surveillance, Epidemiology, and End Results) Program, the Texas Cancer Registry, and the Texas Department of State Health Services (TDSHS). Age-adjusted HCC incidence rates were calculated using SEER*Stat and SPSS software and aggregated over 3-year periods. Likewise, annual diabetes prevalence was calculated and aggregated over the same periods. For each measure, values for the mutually exclusive U.S. and South Texas populations were compared. Trend slopes for both HCC and diabetes were calculated and displayed graphically. Group differences were considered significant at p < .05 when confidence intervals did not overlap. Results. U.S. (SEER) Latino HCC incidence averaged 6.1/100,000 during 1995-1997 and increased to 8.0/100,000 during 2004-2006 (slope m = 0.455). South Texas Latino rates averaged 9.2 and 11.7/100,000 during the same periods (slope m = 0.625) (incidence rate difference, p < .05). Over the same periods, diabetes prevalence was 5.9% and 7.7% among U.S. SEER Latinos (slope m = 0.450) and 7.6% and 9.6% among South Texas Latinos (slope m = 0.500) (prevalence rate difference, p < .05).
Although the statistical significance of the slope differences could not be assessed, the slopes indicate a greater rate of increase among South Texas Latinos than among U.S. SEER Latinos. Conclusion. Our findings support observations that HCC and diabetes are increasing in the United States. We have described an important relationship between increasing rates of HCC and diabetes in the U.S. SEER and South Texas Latino populations, and we suggest that this relationship may explain higher rates of HCC among Latinos. HCC etiological research should account for this relationship while also addressing other attributable risks for the disease, as well as genetic, cultural, and socioeconomic predisposing features. We note that diabetes is preventable or treatable. Such research could firmly establish associations and inform HCC prevention. Acknowledgements. The San Antonio Cancer Institute, San Antonio, Texas (P30-CA54174) and the National Cancer Institute, Redes En Acción (U01-CA86117). Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 3595. doi:10.1158/1538-7445.AM2012-3595
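Both SEER-based Latino HCC abstracts judge group differences by whether 95% confidence intervals overlap. A minimal sketch of that rule follows, assuming a crude rate with a normal-approximation Poisson interval; registry software such as SEER*Stat generally uses exact or gamma-based intervals for age-adjusted rates instead, and the case and person-year counts are hypothetical. Non-overlap is a conservative criterion: two intervals can overlap even when a formal test of the difference would be significant at p < .05.

```python
import math
from typing import Tuple

Interval = Tuple[float, float, float]  # (rate, lower, upper) per `per` person-years


def rate_with_ci(cases: int, person_years: float, per: float = 100_000, z: float = 1.96) -> Interval:
    """Crude rate with a normal-approximation Poisson CI: rate +/- z * sqrt(cases) / person_years."""
    rate = cases / person_years
    se = math.sqrt(cases) / person_years
    return per * rate, per * max(rate - z * se, 0.0), per * (rate + z * se)


def cis_overlap(a: Interval, b: Interval) -> bool:
    """True when the two confidence intervals share at least one point."""
    _, lo_a, hi_a = a
    _, lo_b, hi_b = b
    return lo_a <= hi_b and lo_b <= hi_a


# Hypothetical counts for two populations.
group_a = rate_with_ci(100, 1_000_000)  # about 10.0 (8.04 to 11.96) per 100,000
group_b = rate_with_ci(400, 2_000_000)  # about 20.0 (18.04 to 21.96) per 100,000
print(cis_overlap(group_a, group_b))    # False: "different" under the non-overlap rule
```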
- Research Article
- 10.3892/mmr.2017.8040
- Nov 14, 2017
- Molecular medicine reports
Increasing evidence has demonstrated that microRNA (miR)-133a-3p is an important regulator in hepatocellular carcinoma (HCC). In the present study, the diagnostic role of miR-133a-3p in HCC and its potential functional pathways were explored using publicly available data. Eligible microarray datasets were collected from the NCBI Gene Expression Omnibus (GEO) and ArrayExpress databases. Data for HCC and matched adjacent normal tissues were also downloaded from The Cancer Genome Atlas (TCGA). Published studies reporting associations between miR-133a-3p expression and HCC were retrieved from multiple databases. By combining data from three sources (GEO, TCGA, and published studies), the authors analyzed the overall relationship between miR-133a-3p expression and the clinicopathological features of HCC. Finally, putative targets of miR-133a-3p in HCC were selected for further bioinformatics prediction. A total of eight published microarray datasets were gathered, and the pooled results showed that miR-133a-3p expression in the tumor group was lower than in the normal groups [standardized mean difference (SMD) = −0.54; 95% confidence interval (CI), −0.74 to −0.35; P<0.001]. Consistently, TCGA data showed that miR-133a-1 levels were markedly reduced in HCC compared with normal tissues (P<0.001), and the AUC of low miR-133a-1 expression for HCC diagnosis was 0.670 (P<0.001). Furthermore, the combined SMD across all sources (GEO, TCGA, and literature) showed a significant difference between the HCC and normal control groups, with lower miR-133a-3p expression in the HCC group (SMD = −0.69; 95% CI, −1.10 to −0.29; P=0.001). In addition, the authors identified five key genes of the calcium signaling pathway (NOS1, ADRA1A, ADRA1B, ADRA1D, and TBXA2R) that may be targeted by miR-133a-3p in HCC. The study reveals that miR-133a-3p may function as a tumor suppressor in HCC.
The prospective novel pathways and key genes of miR-133a-3p could offer potential biomarkers for HCC; however, the predictions require further confirmation.
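The pooled standardized mean differences above come from combining per-dataset effect sizes. The abstract does not state the pooling model, so the sketch below assumes inverse-variance weighting with an optional DerSimonian-Laird random-effects adjustment, a common default in such meta-analyses. Inputs are per-study SMDs and their variances; the example values are hypothetical, not the eight GEO datasets.

```python
import math
from typing import Sequence, Tuple


def pool_smd(
    effects: Sequence[float], variances: Sequence[float], random_effects: bool = True
) -> Tuple[float, float, float, float]:
    """Return (pooled SMD, 95% CI lower, 95% CI upper, tau^2)."""
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    tau2 = 0.0
    if random_effects and len(effects) > 1:
        # DerSimonian-Laird between-study variance from Cochran's Q.
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


# Hypothetical per-dataset SMDs (tumor minus normal) and their variances.
print(pool_smd([-0.8, -0.4, -0.6, -0.3], [0.04, 0.05, 0.03, 0.06]))
```

With `random_effects=False` the function reduces to the fixed-effect estimate; when the studies are homogeneous, tau^2 is truncated at zero and both models agree.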
- Research Article
- 10.1007/s12672-025-03353-x
- Aug 6, 2025
- Discover Oncology
Background: Hepatocellular carcinoma with pulmonary metastasis (HCC-PM) is a common complication of hepatocellular carcinoma (HCC) and has gained increasing attention. However, there is currently no effective model for predicting the risk of HCC-PM in patients with HCC. This study aims to develop a precise predictive model to assess the risk of HCC-PM in patients with HCC. Methods: We retrospectively analyzed HCC cases from the Surveillance, Epidemiology, and End Results (SEER) database between 2010 and 2018. To address class imbalance, we applied the synthetic minority oversampling technique (SMOTE). Feature selection was conducted using the Boruta algorithm and multivariate logistic regression. Eight machine learning models were then developed and evaluated for predictive performance in validation cohorts. Feature importance was further analyzed using Shapley Additive Explanations (SHAP). Results: This study included 20,346 patients diagnosed with HCC. Age, race, T stage, N stage, bone metastasis, brain metastasis, radiation, chemotherapy, and surgery were identified as independent risk factors for HCC-PM. Among the machine learning models, random forest (RF) achieved the best performance, with an AUC of 0.894 in the training cohort and 0.830 in the validation cohort. Based on the RF algorithm, we developed a user-friendly, web-based tool to estimate the risk of pulmonary metastasis in patients with HCC. Conclusion: The machine learning (ML) model developed in this study accurately predicts the risk of pulmonary metastasis in patients with HCC from individual clinical parameters, TNM stage, and treatment information. By enabling precise risk stratification, this model may support clinical risk assessment and facilitate more personalized management of HCC. Supplementary Information: The online version contains supplementary material available at 10.1007/s12672-025-03353-x.
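SMOTE, used in the Methods to address class imbalance, creates synthetic minority-class cases by interpolating between a minority sample and one of its nearest minority-class neighbours. The plain-Python sketch below shows that core step for numeric features only; it is an illustration, not the implementation the study used (the `imbalanced-learn` package's `SMOTE`, and `SMOTENC` for mixed categorical data, are the usual library choices). Many of the SEER predictors here (race, T stage, N stage) are categorical, and the abstract does not say how they were handled during oversampling. Oversampling should be applied only to the training split so that synthetic points do not leak into validation. The function name, seed, and example rows are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by nearest-neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the chosen sample (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class rows (two scaled numeric features).
minority_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority_rows, n_new=3, k=2))
```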
- Research Article
- 10.3892/mmr.2017.8167
- Nov 27, 2017
- Molecular Medicine Reports
The aims of the present study were to examine the potential role of microRNA (miR)-223-3p in the tumorigenesis of hepatocellular carcinoma (HCC) and to investigate its diagnostic accuracy and potential molecular mechanisms. Expression data for miR-223-3p in HCC were obtained from the Gene Expression Omnibus (GEO), and data for the precursor miR-223 were obtained from The Cancer Genome Atlas (TCGA). The diagnostic role of miR-223-3p was assessed with receiver operating characteristic (ROC) curves, and its diagnostic value in HCC was also calculated from qualified reports in the literature. In addition, associated data from GEO, TCGA, and qualified experiments were pooled for a comprehensive meta-analysis. Genes at the intersection of online prediction databases, natural language processing results, and differentially expressed genes from TCGA were regarded as potential targets of miR-223-3p in HCC. Gene Ontology enrichment and Kyoto Encyclopedia of Genes and Genomes pathway analyses of the potential targets were performed using the Database for Annotation, Visualization and Integrated Discovery. Protein-protein interactions were mapped using the Search Tool for the Retrieval of Interacting Genes. Among 15 qualified microarray datasets from GEO, seven showed a significantly lower level of miR-223-3p in HCC tissues than in non-cancerous tissues (P<0.05). In addition, five GEO datasets showed diagnostic value for miR-223-3p, with an area under the curve (AUC) of >0.80 (P<0.05). The diagnostic accuracy of the precursor miR-223 in TCGA was also calculated (AUC=0.78, P<0.05). Similarly, the precursor miR-223 was significantly downregulated in HCC tissues compared with healthy controls in TCGA (P<0.001). The area under the summary ROC curve in the meta-analysis was 0.89 (95% CI, 0.85–0.91).
A total of 72 potential targets were extracted, mainly involved in the terms ‘microRNAs in cancer’, ‘ATP binding’ and ‘prostate cancer’. Five potential target genes were considered the hub genes of miR-223-3p in HCC, including checkpoint kinase 1, DNA methyltransferase 1, baculoviral IAP repeat containing 5, kinesin family member 23, and collagen, type I, α1. Based on TCGA, the hub genes were significantly upregulated in HCC (P<0.05). Collectively, these results showed that miR-223-3p may be crucial in HCC carcinogenesis showing high diagnostic accuracy, and may be mediated by several hub genes.
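The target-selection step described above (keeping genes that appear in every source) and the choice of hub genes can be sketched with sets and a degree count. The abstract does not say how hub genes were ranked, so degree centrality in the interaction network is an assumption here, and the gene symbols, edges, and function names are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def candidate_targets(*sources: Iterable[str]) -> Set[str]:
    """Genes present in every source, e.g. prediction databases, text mining and DEGs."""
    sets = [set(s) for s in sources]
    return set.intersection(*sets) if sets else set()


def hub_genes(edges: Iterable[Tuple[str, str]], candidates: Set[str], top_n: int = 5) -> List[str]:
    """Rank candidate genes by degree within the candidate-only interaction network."""
    degree: Counter = Counter()
    for a, b in edges:
        if a in candidates and b in candidates and a != b:
            degree[a] += 1
            degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [gene for gene, _ in ranked[:top_n]]


# Hypothetical sources and interaction edges.
targets = candidate_targets({"G1", "G2", "G3", "G4", "G5"}, {"G1", "G2", "G3", "G4"}, {"G1", "G2", "G3", "G4", "G6"})
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G1", "G5")]
print(hub_genes(edges, targets, top_n=2))  # ['G1', 'G2']
```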
- Research Article
- 10.1002/cam4.5475
- Dec 7, 2022
- Cancer Medicine
Purpose: This study aimed to compare the prognostic value of multiple lymph node metastasis (LNM) indicators and to develop optimal prognostic nomograms for bladder cancer (BC) patients. Methods: BC patients from 2004 to 2015 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database and randomly partitioned into training and internal validation cohorts. Genomic and clinical data were collected from The Cancer Genome Atlas (TCGA) as an external validation cohort. The predictive efficiency of the LNM indicators was compared by constructing multivariate Cox regression models, and nomograms were built from the optimal models selected for overall survival (OS) and cause‐specific survival (CSS). Nomogram performance was evaluated with calibration plots, time‐dependent area under the curve (AUC), and decision curve analysis (DCA) in all three cohorts. We subsequently estimated differences in biological function and tumor immunity between the two risk groups stratified by the nomograms in the TCGA cohort. Results: In total, 10,093 and 107 BC patients were screened from the SEER and TCGA databases, respectively. N classification, number of positive lymph nodes (PLNs), lymph node ratio (LNR), and log odds of positive lymph nodes (LODDS) were all independent predictors of OS and CSS. The filtered models containing LODDS had the minimal Akaike Information Criterion and the maximal concordance indexes and AUCs. Age, LODDS, and T and M classification were integrated into the nomogram for OS, while the nomogram for CSS included gender, tumor grade, LODDS, and T and M classification. The nomograms were successfully validated for predictive accuracy and clinical utility in all three cohorts. Additionally, the tumor microenvironment differed between the two risk groups. Conclusions: LODDS demonstrated superior prognostic performance over N classification, PLN, and LNR for OS and CSS in BC patients.
The nomograms incorporating LODDS provided appropriate prediction of BC, which could contribute to the tumor assessment and clinical decision‐making.
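The lymph node indicators compared above have simple definitions. The sketch assumes the commonly used forms LNR = positive / examined nodes and LODDS = ln((positive + 0.5) / (negative + 0.5)), where 0.5 is a continuity correction that avoids division by zero; some studies use a base-10 logarithm instead, and the abstract does not specify which form was used. The example illustrates one reason LODDS can carry extra information: node-negative patients all have an LNR of 0, but their LODDS still varies with the number of nodes examined. The patient values are hypothetical.

```python
import math


def lymph_node_ratio(positive: int, examined: int) -> float:
    """LNR: share of examined lymph nodes that are positive."""
    if examined <= 0 or not 0 <= positive <= examined:
        raise ValueError("require examined > 0 and 0 <= positive <= examined")
    return positive / examined


def lodds(positive: int, examined: int) -> float:
    """Log odds of positive lymph nodes (natural log) with a 0.5 continuity correction."""
    if examined < 0 or not 0 <= positive <= examined:
        raise ValueError("require 0 <= positive <= examined")
    negative = examined - positive
    return math.log((positive + 0.5) / (negative + 0.5))


# Two hypothetical node-negative patients: identical LNR, different LODDS.
for examined in (5, 20):
    print(examined, lymph_node_ratio(0, examined), round(lodds(0, examined), 3))
```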
- Research Article
- 10.1186/s12882-025-04128-w
- Apr 22, 2025
- BMC Nephrology
Background: Hospital readmission following renal transplantation significantly affects patient outcomes and healthcare resources. While machine learning approaches offer promising solutions for risk prediction, their clinical application is often limited by a lack of interpretability. We developed an explainable artificial intelligence (XAI)-based supervised learning model to predict 30-day hospital readmission risk following renal transplantation. Methods: We conducted a retrospective analysis of 588 renal transplant recipients at King Abdullah International Medical Research Center, predominantly recipients of living donor transplants (85.2%, n = 500). Our methodology comprised a four-stage machine learning pipeline: data processing, feature preparation, model development using stratified 5-fold cross-validation, and clinical validation. Multiple algorithms were evaluated, with gradient boosting demonstrating superior performance. Model interpretability was achieved through a dual approach using SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations). Results: The gradient boosting model demonstrated strong performance (AUC 0.837, 95% CI: 0.802–0.872), with an accuracy of 0.796 ± 0.050 and a sensitivity of 0.388 ± 0.129. Length of hospital stay (38.0% contribution) and post-transplant systolic blood pressure (30.0% contribution) emerged as the primary predictors, with differences between living and deceased donor subgroups. Pre-transplant BMI was more important in deceased donor recipients (12.6% vs. 2.6%), while HbA1c and eGFR were more influential in living donor outcomes. The readmission rate in our cohort (88.9%, n = 523) was higher than previously reported ranges (18–47%), likely reflecting center-specific practices. Conclusions: Our XAI-based machine learning model combines strong predictive performance with clinical interpretability, offering transplant physicians donor-specific risk stratification capabilities.
The web-based implementation facilitates practical integration into clinical workflows. Given our single-center experience and high proportion of living donors, external validation across diverse transplant centers is essential before widespread implementation. Our approach establishes a framework for developing center-specific risk prediction tools in transplant medicine.
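Stratified 5-fold cross-validation, used in the Methods, keeps each fold's class mix close to the cohort's. With 523 of 588 recipients readmitted, stratification matters mainly for the 65 non-readmitted cases, which a purely random split could spread unevenly across folds. The sketch below is a minimal illustration with round-robin assignment within each class (scikit-learn's `StratifiedKFold` is the usual library equivalent); the seed, function name, and assignment scheme are assumptions, not details from the study.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_k_fold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split row indices into k folds whose class proportions mirror the full data."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal this class's shuffled indices round-robin across the folds.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


# Class sizes from the abstract: 523 readmitted (1) and 65 not readmitted (0).
labels = [1] * 523 + [0] * 65
for i, test_idx in enumerate(stratified_k_fold(labels)):
    test_set = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in test_set]
    print(i, len(train_idx), len(test_idx), sum(labels[j] == 0 for j in test_idx))
```

Each test fold receives exactly 13 of the 65 non-readmitted recipients, and the model would be trained on the remaining four folds in each round.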
- Research Article
- 10.52783/jisem.v10i51s.10442
- May 30, 2025
- Journal of Information Systems Engineering and Management
The main goal of this project is to create a hybrid deep learning model for fake news detection that fuses Explainable AI (XAI) techniques such as SHapley Additive exPlanations (SHAP) with XLNet, FastText, and a convolutional neural network (CNN) to enhance transparency and interpretability. Introduction: The rapid spread of fake news in the digital age has become a significant issue that influences social stability, public opinion, and political outcomes. False information has spread because social media platforms are unable to distinguish between authentic and fraudulent content. Although effective, traditional fact-checking methods are time-consuming and cannot handle the volume of data generated daily. As a result, automated systems for detecting false news have been developed, and those based on advanced artificial intelligence have demonstrated impressive performance in text classification tasks such as identifying false news. However, it is difficult to understand how these models make decisions, because they function as black-box systems. Explainable AI (XAI) techniques have been developed to improve interpretability; the SHapley Additive exPlanations (SHAP) method is one that explains individual model predictions. Objectives: The objective of this project is to develop a sophisticated fake news detection system that combines advanced natural language processing and machine learning techniques. By integrating XLNet for superior language understanding, FastText for efficient word representation, and Convolutional Neural Networks (CNNs) for robust feature extraction, the system aims to enhance detection accuracy. Additionally, incorporating Explainable AI techniques, particularly SHAP, will provide clear and interpretable explanations of the model's predictions. This dual focus on performance and transparency seeks to create a reliable tool for identifying misinformation, ultimately fostering greater public trust in digital information sources.
Methods: The study used a hybrid deep learning methodology that combined convolutional neural networks (CNNs), XLNet, and FastText with the Explainable AI (XAI) technique SHAP. Group 1 (baseline) used RoBERTa and BERT; although these methods are effective, they are not transparent enough for users to understand and trust their predictions. Group 2 used the hybrid model combined with FastText and Explainable AI.
Results: The hybrid model's accuracy of 92.3% represents a 5.6% relative improvement (4.9 percentage points) over the baseline accuracy of 87.4%, showing that the hybrid approach distinguishes real from fake news articles more effectively. The hybrid model is also better at reducing false positives, as shown by its precision of 90.5%, which is 6.2% higher (relative) than the baseline model's 85.2%. Similarly, recall increases by 6.6%, from 86.1% in the baseline model to 91.8% in the hybrid model, indicating that the hybrid model is better at detecting fake news. Finally, the F1-score, which balances precision and recall, increased by 6.4%, from 85.6% to 91.1%.
Conclusions: By combining XLNet, FastText, a CNN, and Explainable AI techniques, the proposed hybrid deep learning model significantly increases the accuracy of fake news detection while maintaining interpretability, providing a robust and transparent framework for combating misinformation.
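The reported gains are relative improvements rather than percentage-point differences. The short check below recomputes both from the baseline and hybrid values in the abstract; the 90.5% vs 85.2% pair is labelled "precision" here because the text ties it to false positives (an interpretation, since that pair is not the 92.3% accuracy figure).

```python
# Baseline vs hybrid metrics (%) taken from the abstract; "precision" is the
# pair the text links to false positives (interpretation, see lead-in).
reported = {
    "accuracy": (87.4, 92.3),
    "precision": (85.2, 90.5),
    "recall": (86.1, 91.8),
    "f1": (85.6, 91.1),
}


def improvements(baseline, hybrid):
    """Return (absolute gain in percentage points, relative gain in %)."""
    absolute = hybrid - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for metric, (base, hyb) in reported.items():
    points, rel = improvements(base, hyb)
    print(f"{metric}: +{points} points, +{rel}% relative")
```

Every recomputed relative gain (5.6, 6.2, 6.6, 6.4) matches the abstract, while the absolute gains are smaller (4.9, 5.3, 5.7, 5.5 points); stating which kind of improvement is meant avoids overstating the effect.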
- Research Article
- 10.2147/cmar.s181396
- Nov 1, 2018
- Cancer Management and Research
Background: Hepatocellular carcinoma (HCC) is a major cause of cancer mortality, and its incidence is increasing worldwide; however, there are very few effective diagnostic approaches and prognostic biomarkers.
Materials and methods: One hundred forty-nine pairs of HCC samples were obtained from the Gene Expression Omnibus (GEO) to screen for differentially expressed genes (DEGs) between HCC and normal samples. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, Gene Ontology enrichment analysis, and a protein–protein interaction network were used. Cox proportional hazards regression analysis was used to identify significant prognostic DEGs, from which a prognostic gene expression signature model was built in The Cancer Genome Atlas (TCGA) discovery cohort. The robustness of this panel was assessed in the GSE14520 cohort. We verified the expression levels of the key molecules through TCGA, GEO, and qPCR, and used immunohistochemistry for substantiation in HCC tissues. The methylation states of these genes were also explored.
Results: Ninety-eight genes, consisting of 13 upregulated and 85 downregulated genes, were screened out in three datasets. KEGG and Gene Ontology analyses of the DEGs revealed important biological features of each subtype. The constructed protein–protein interaction network consisted of 64 nodes and 115 edges. A subset of four genes (SPINK1, TXNRD1, LCAT, and PZP) forming a prognostic gene expression signature was established from TCGA and validated in GSE14520. The expression of the four genes was then validated with TCGA, GEO, and clinical samples, and was closely related to their methylation states.
Conclusion: This study identified a novel four-gene signature biomarker for predicting the prognosis of HCC. The biomarkers may also reveal molecular mechanisms underlying the development of the disease and provide new insights into interventional strategies.
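Cox-derived gene signatures are commonly applied as a linear risk score, the sum of each gene's Cox coefficient multiplied by its expression, with patients split into high- and low-risk groups at the median score. The abstract reports neither the coefficients nor the cut-off for the SPINK1/TXNRD1/LCAT/PZP signature, so the coefficient values, expression values, and median cut-off below are hypothetical placeholders that only illustrate the computation.

```python
from statistics import median

# HYPOTHETICAL Cox coefficients for illustration only; the abstract does not
# report the fitted values for this four-gene signature.
COEFFICIENTS = {"SPINK1": 0.12, "TXNRD1": 0.25, "LCAT": -0.18, "PZP": -0.09}


def risk_score(expression):
    """Linear predictor: sum over signature genes of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


def stratify(patients):
    """Split patients into 'high'/'low' risk at the median risk score (assumed cut-off)."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    # Scores equal to the median are assigned to the low-risk group.
    labels = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return labels, cutoff


# HYPOTHETICAL normalized expression values for three patients.
patients = {
    "P1": {"SPINK1": 8.0, "TXNRD1": 9.0, "LCAT": 4.0, "PZP": 3.0},
    "P2": {"SPINK1": 5.0, "TXNRD1": 6.0, "LCAT": 8.0, "PZP": 7.0},
    "P3": {"SPINK1": 6.0, "TXNRD1": 7.0, "LCAT": 6.0, "PZP": 5.0},
}
groups, cutoff = stratify(patients)
print(groups)  # → {'P1': 'high', 'P2': 'low', 'P3': 'low'}
```

A median split keeps the two groups balanced in size; studies also use optimized or externally fixed cut-offs, and validating a signature in an independent cohort (as with GSE14520 here) requires reusing the discovery cohort's coefficients and cut-off rather than refitting them.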
- Research Article
- 10.12122/j.issn.1673-4254.2025.08.15
- Aug 20, 2025
- Nan fang yi ke da xue xue bao = Journal of Southern Medical University
This study aimed to analyze differences in the prognosis of gastric signet ring cell carcinoma (SRCC) among races using the US Surveillance, Epidemiology, and End Results (SEER) database and The Cancer Genome Atlas (TCGA) database. We analyzed data on patients with gastric SRCC from the SEER database from 2000 to 2020 and divided them by race into white, Black, Asian or Pacific Islander, and American Indian/Alaska Native cohorts. The prognosis and treatment of the cohorts were evaluated using baseline demographic analysis, Kaplan–Meier survival curves, and nomogram analysis. A total of 2058 patients were analyzed: 8.6% Black, 72.4% white, 16.6% Asian or Pacific Islander, 1.0% American Indian/Alaska Native, and 1.4% other races. Tumor grade varied among races, and prevalence and survival rates differed significantly across races. The differences were most prominent in the white cohort, and all differences were statistically significant (P<0.05). Racial differences were also noted in patient management and prognosis. These racial differences in the tumor grade and prognosis of gastric SRCC provide evidence for optimizing clinical diagnosis and treatment strategies for this malignancy.
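The abstract evaluates prognosis with Kaplan–Meier survival curves. The sketch below implements the product-limit estimator on hypothetical follow-up data (times in months, event = 1 for an observed death, 0 for censoring); real analyses would typically use an established package such as lifelines or R's survival.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this time point (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients at time t count as at risk at t, then leave.
        at_risk -= removed
    return curve


# HYPOTHETICAL follow-up data (months) for six patients.
times = [3, 5, 5, 8, 12, 12]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

Each step multiplies the running survival by (1 − deaths / number at risk), so censored patients lower the denominator for later times without being counted as deaths.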
- Research Article
- 10.3389/fsurg.2022.819018
- Mar 17, 2022
- Frontiers in Surgery
Purpose: This study used the Surveillance, Epidemiology, and End Results (SEER) program to explore differences in prognosis between signet-ring cell carcinoma (SRC) and intestinal-type gastric carcinoma (ITGC), and used gene sequencing data from The Cancer Genome Atlas (TCGA) to identify genetic contributions unique to the prognostic differences between these two subtypes of gastric cancer.
Patients and Methods: Clinical data were obtained from the SEER database for 2004 to 2015. Kaplan–Meier (KM) curves were used to compare 5-year overall survival (OS), and Cox regression was used for univariate and multivariate analyses. Gene expression profiles were obtained from the TCGA database, and differentially expressed genes (DEGs) were screened. Functional enrichment, protein interaction, and survival analyses were then carried out. Genes of interest were verified using the Human Protein Atlas, immunohistochemistry, and the Cancer Cell Line Encyclopedia (CCLE). The relationship between genes of interest and immune cell infiltration was also analyzed using the Tumor Immune Estimation Resource (TIMER).
Results: Compared with ITGC patients, SRC patients were more likely to be female, tended to be younger, and more often had tumors in the middle and lower stomach (p < 0.01). SRC had a significantly better prognosis than ITGC in early gastric cancer (EGC) (p < 0.01) but a significantly worse prognosis in advanced gastric cancer (AGC) (p < 0.05). A total of 256 DEGs were screened in SRC compared with ITGC, and enrichment and protein interaction analyses showed that these genes were mainly related to extracellular matrix organization. Thrombospondin 1 (THBS1) and serpin peptidase inhibitor, clade E, member 1 (SERPINE1) were significantly differentially expressed between SRC and ITGC, which was preliminarily verified by immunohistochemistry and open-source databases.
THBS1 and SERPINE1 were also associated with infiltration by multiple immune cell types in gastric cancer.
Conclusions: There were significant differences in clinicopathological features and prognosis between SRC and ITGC, suggesting that SRC and ITGC may be two distinct tumor types with different pathogeneses. We found many co-differentially expressed genes and important pathways between SRC and ITGC. THBS1 and SERPINE1 were significantly differentially expressed in the two types of gastric cancer and may have important functions.
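The SRC vs ITGC comparison rests on Kaplan–Meier curves for 5-year OS. The abstract does not name the test used to compare the curves; the log-rank test is a conventional choice, so the sketch below implements a two-group log-rank test as an assumption, on hypothetical data, using the hypergeometric variance and a chi-square distribution with 1 degree of freedom.

```python
from math import erfc, sqrt


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    for t in event_times:
        # Number still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Deaths observed exactly at time t in each group.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A death count at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times; statistic is undefined")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# HYPOTHETICAL months to death (all events observed) for two small groups.
chi2, p = log_rank([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
print(round(chi2, 2), round(p, 4))
```

The statistic sums observed-minus-expected deaths in one group over all event times, so it is most sensitive to consistent (proportional-hazards) differences; curves that cross, as the EGC/AGC reversal above might produce when stages are pooled, can weaken it, which is one reason stratified or Cox-based analyses are also reported.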