Thyroid pathology image classification via multi-scale feature fusion and multi-instance learning
Background: The global incidence of thyroid cancer has significantly increased, while traditional pathological diagnosis remains time-consuming and expert-dependent. This study develops an auxiliary diagnostic tool designed to reduce the workload of pathologists and improve diagnostic accuracy.
Methods: Our study utilized 543 WSIs from Liuzhou Cancer Hospital for model development, employing a novel multi-feature fusion architecture that combines RetCCL, iBOT, and DINO embeddings. We systematically evaluated stain normalization and multi-scale analysis across four multiple-instance learning (MIL) frameworks: CLAM-SB (single-branch), CLAM-MB (multi-branch), DTFD (double-tier), and LA-MIL (location-aware). The method was rigorously validated on an independent set of 128 WSIs from Taizhou Cancer Hospital.
Results: The results show that stain normalization, multi-scale fusion, and multi-feature fusion significantly improve classification performance. In 10-fold cross-validation on the internal dataset, the system demonstrated significant improvements over the baseline RetCCL model: AUC (0.9900 vs. 0.9629), accuracy (0.9594 vs. 0.8951), with relative improvements of 2.8% in AUC and 7.2% in accuracy. Precision increased by 11.5% (0.9434 vs. 0.8461) and F1-score by 9.8% (0.9511 vs. 0.8665). On the external validation dataset, the model maintained robust performance with an AUC of 0.9584, accuracy of 0.9070, precision of 0.9247, and F1-score of 0.9348, confirming its reliability and applicability.
Conclusions: We propose a weakly supervised MIL framework integrating multi-scale analysis and cross-model feature fusion for thyroid cancer diagnosis. Our method showed promising and consistent results across internal and external datasets. While further clinical validation and workflow integration are needed, the results suggest the potential of this approach to assist pathologists in diagnostic workflows, particularly in resource-constrained settings.
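The relative improvements quoted in the thyroid abstract can be checked with a few lines of arithmetic. The sketch below assumes "relative improvement" means (new - baseline) / baseline, which is the reading that reproduces the reported percentages; the function and variable names are illustrative only.

```python
def relative_improvement(new, baseline):
    """Return the relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# (fused model, RetCCL baseline) pairs as reported in the abstract.
reported = {
    "AUC": (0.9900, 0.9629),
    "accuracy": (0.9594, 0.8951),
    "precision": (0.9434, 0.8461),
    "F1": (0.9511, 0.8665),
}

# Round to one decimal place, matching the precision used in the abstract.
improvements = {
    name: round(relative_improvement(new, base), 1)
    for name, (new, base) in reported.items()
}

for name, pct in improvements.items():
    print(f"{name}: +{pct}%")
```

Under that assumption the four values come out at 2.8%, 7.2%, 11.5%, and 9.8%, in line with the abstract.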
- Research Article
5
- 10.2196/46165
- Jul 20, 2023
- Journal of Medical Internet Research
Mood disorder has emerged as a serious concern for public health; in particular, bipolar disorder has a less favorable prognosis than depression. Although prompt recognition of depression conversion to bipolar disorder is needed, early prediction is challenging due to overlapping symptoms. Recently, there have been attempts to develop a prediction model by using federated learning. Federated learning in medical fields is a method for training multi-institutional machine learning models without patient-level data sharing. This study aims to develop and validate a federated, differentially private multi-institutional bipolar transition prediction model. This retrospective study enrolled patients diagnosed with the first depressive episode at 5 tertiary hospitals in South Korea. We developed models for predicting bipolar transition by using data from 17,631 patients in 4 institutions. Further, we used data from 4541 patients for external validation from 1 institution. We created standardized pipelines to extract large-scale clinical features from the 4 institutions without any code modification. Moreover, we performed feature selection in a federated environment for computational efficiency and applied differential privacy to gradient updates. Finally, we compared the federated and the 4 local models developed with each hospital's data on internal and external validation data sets. In the internal data set, 279 out of 17,631 patients showed bipolar disorder transition. In the external data set, 39 out of 4541 patients showed bipolar disorder transition. The average performance of the federated model in the internal test (area under the curve [AUC] 0.726) and external validation (AUC 0.719) data sets was higher than that of the other locally developed models (AUC 0.642-0.707 and AUC 0.642-0.699, respectively). 
In the federated model, classifications were driven by several predictors such as the Charlson index (low scores were associated with bipolar transition, which may be due to younger age), severe depression, anxiolytics, young age, and visiting months (the bipolar transition was associated with seasonality, especially during the spring and summer months). We developed and validated a differentially private federated model by using distributed multi-institutional psychiatric data with standardized pipelines in a real-world environment. The federated model performed better than models using local data only.
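The bipolar-transition abstract mentions applying differential privacy to gradient updates in a federated setting but gives no implementation detail. The following is a minimal sketch of one common pattern (clip each site's update, average, add Gaussian noise), not the authors' method. The names `clip_update` and `private_federated_average` and the parameters `max_norm` and `noise_std` are hypothetical, and no privacy accounting is attempted.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale an update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]

def private_federated_average(site_updates, max_norm, noise_std, rng):
    """Average clipped per-site updates, then add Gaussian noise per coordinate.

    Assumption: every site sends an update of the same length and only the
    updates (never patient-level data) leave the site.
    """
    clipped = [clip_update(u, max_norm) for u in site_updates]
    n_sites = len(clipped)
    dim = len(clipped[0])
    averaged = [sum(u[i] for u in clipped) / n_sites for i in range(dim)]
    return [a + rng.gauss(0.0, noise_std) for a in averaged]

# Hypothetical updates from 4 institutions for a 2-parameter model.
rng = random.Random(0)
updates = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.5], [-0.3, 0.3]]
print(private_federated_average(updates, max_norm=1.0, noise_std=0.05, rng=rng))
```

Clipping bounds how much any single site can move the shared model, which is what lets the added noise mask an individual contribution; choosing `noise_std` for a target privacy budget is outside the scope of this sketch.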
- Research Article
18
- 10.3389/fnins.2021.721268
- Aug 11, 2021
- Frontiers in Neuroscience
Objective: Radiomics and morphological features have been associated with aneurysm rupture. However, multicenter studies of their predictive power for rupture of aneurysms at a specific location are rare. We aimed to determine robust radiomics features related to middle cerebral artery (MCA) aneurysm rupture and to evaluate the additional value of combining morphological and radiomics features in the classification of ruptured MCA aneurysms.
Methods: A total of 632 patients with 668 MCA aneurysms (423 ruptured aneurysms) from five hospitals were included. Radiomics and morphological features of aneurysms were extracted on computed tomography angiography images. The model was developed using a training dataset (407 patients) and validated with the internal (152 patients) and external validation (73 patients) datasets. The support vector machine method was applied for model construction. Optimal radiomics, morphological, and clinical features were used to develop the radiomics model (R-model), morphological model (M-model), radiomics-morphological model (RM-model), clinical-morphological model (CM-model), and clinical-radiomics-morphological model (CRM-model), respectively. A comprehensive nomogram integrating clinical, morphological, and radiomics predictors was generated.
Results: We found seven radiomics features and four morphological predictors of MCA aneurysm rupture. The R-model obtained an area under the receiver operating characteristic curve (AUC) of 0.822 (95% CI, 0.776, 0.867), 0.817 (95% CI, 0.744, 0.890), and 0.691 (95% CI, 0.567, 0.816) in the training, temporal validation, and external validation datasets, respectively. The RM-model showed an AUC of 0.848 (95% CI, 0.810, 0.885), 0.865 (95% CI, 0.807, 0.924), and 0.721 (95% CI, 0.601, 0.841) in the three datasets. The CRM-model obtained an AUC of 0.856 (95% CI, 0.820, 0.892), 0.882 (95% CI, 0.828, 0.936), and 0.738 (95% CI, 0.618, 0.857) in the three datasets. The CRM-model and RM-model outperformed the CM-model and M-model, respectively, in the internal datasets (p < 0.05), but these differences were not statistically significant in the external dataset. Decision curve analysis indicated that the CRM-model obtained the highest net benefit for most of the threshold probabilities.
Conclusion: Robust radiomics features related to MCA aneurysm rupture were determined. The RM-model exhibited good ability in classifying ruptured MCA aneurysms. Integrating radiomics features into conventional models might provide additional value in the classification of ruptured MCA aneurysms.
- Research Article
57
- 10.1001/jamanetworkopen.2020.27426
- Nov 30, 2020
- JAMA Network Open
Personalized radiotherapy planning depends on high-quality delineation of target tumors and surrounding organs at risk (OARs). This process puts additional time burdens on oncologists and introduces variability among both experts and institutions. The objective was to explore clinically acceptable autocontouring solutions that can be integrated into existing workflows and used in different domains of radiotherapy. This quality improvement study used a multicenter imaging data set comprising 519 pelvic and 242 head and neck computed tomography (CT) scans from 8 distinct clinical sites and patients diagnosed either with prostate or head and neck cancer. The scans were acquired as part of treatment dose planning from patients who received intensity-modulated radiation therapy between October 2013 and February 2020. Fifteen different OARs were manually annotated by expert readers and radiation oncologists. The models were trained on a subset of the data set to automatically delineate OARs and evaluated on both internal and external data sets. Data analysis was conducted October 2019 to September 2020. The autocontouring solution was evaluated on external data sets, and its accuracy was quantified with volumetric agreement and surface distance measures. Models were benchmarked against expert annotations in an interobserver variability (IOV) study. Clinical utility was evaluated by measuring time spent on manual corrections and annotations from scratch. A total of 519 participants' (519 [100%] men; 390 [75%] aged 62-75 years) pelvic CT images and 242 participants' (184 [76%] men; 194 [80%] aged 50-73 years) head and neck CT images were included. The models achieved levels of clinical accuracy within the bounds of expert IOV for 13 of 15 structures (eg, left femur, κ = 0.982; brainstem, κ = 0.806) and performed consistently well across both external and internal data sets (eg, mean [SD] Dice score for left femur, internal vs external data sets: 98.52% [0.50] vs 98.04% [1.02]; P = .04). 
The correction time of autogenerated contours on 10 head and neck and 10 prostate scans was measured as a mean of 4.98 (95% CI, 4.44-5.52) min/scan and 3.40 (95% CI, 1.60-5.20) min/scan, respectively, to ensure clinically accepted accuracy. Manual segmentation of the head and neck took a mean 86.75 (95% CI, 75.21-92.29) min/scan for an expert reader and 73.25 (95% CI, 68.68-77.82) min/scan for a radiation oncologist. The autogenerated contours represented a 93% reduction in time. In this study, the models achieved levels of clinical accuracy within expert IOV while reducing manual contouring time and performing consistently well across previously unseen heterogeneous data sets. With the availability of open-source libraries and reliable performance, this creates significant opportunities for the transformation of radiation treatment planning.
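The autocontouring abstract reports volumetric agreement as a Dice score and summarizes the time saving as a percentage. A minimal sketch of both calculations follows, assuming a contour is represented as a set of voxel coordinates (a simplification; the abstract does not say how masks were stored). The function names are illustrative.

```python
def dice_score(pred_voxels, ref_voxels):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    pred, ref = set(pred_voxels), set(ref_voxels)
    if not pred and not ref:
        return 1.0  # two empty contours agree trivially
    return 2.0 * len(pred & ref) / (len(pred) + len(ref))

def time_reduction_percent(auto_minutes, manual_minutes):
    """Percentage of manual contouring time saved by correcting an auto-contour."""
    return (1.0 - auto_minutes / manual_minutes) * 100.0

# Toy contours: 3 voxels each, 2 of them shared.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
reference = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
print(round(dice_score(predicted, reference), 3))

# Head and neck times from the abstract: 4.98 min/scan correction time
# versus 73.25 (radiation oncologist) or 86.75 (expert reader) min/scan.
print(round(time_reduction_percent(4.98, 73.25)))
print(round(time_reduction_percent(4.98, 86.75)))
```

With the head and neck figures, the correction time against the radiation oncologist's manual time gives about 93%, which matches the stated reduction; pairing the 93% with that particular baseline is an inference from the arithmetic, not something the abstract states.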
- Research Article
109
- 10.1016/s2589-7500(21)00278-8
- Feb 23, 2022
- The Lancet Digital Health
Deep learning-enabled pelvic ultrasound images for accurate diagnosis of ovarian cancer in China: a retrospective, multicentre, diagnostic study
- Research Article
3
- 10.1111/os.13894
- Nov 7, 2023
- Orthopaedic surgery
Modic changes (MCs) are the most prevalent classification system for describing intravertebral MRI signal intensity changes. However, interpreting these intricate MRI images is a complex and time-consuming process. This study investigates the performance of single shot multibox detector (SSD) and ResNet18 network-based automatic detection and classification of MCs. Additionally, it compares the inter-observer agreement and observer-classifier agreement in MCs diagnosis to validate the feasibility of deep learning network-assisted detection of classified MCs. A retrospective analysis of 140 patients with MCs who underwent MRI diagnosis and met the inclusion and exclusion criteria in Tianjin Hospital from June 2020 to June 2021 was used as the internal dataset. This group consisted of 55 males and 85 females, aged 25 to 89 years, with a mean age of (59.0 ± 13.7) years. An external test dataset of 28 patients, who met the same criteria and were assessed using different MRI equipment at Tianjin Hospital, was also gathered, including 11 males and 17 females, aged 31 to 84 years, with a mean age of 62.7 ± 10.9 years. After Physician 1 (with 15 years of experience) annotated all MRI images, the internal dataset was imported into the deep learning model for training. The model comprises an SSD network for lesion localization and a ResNet18 network for lesion classification. Performance metrics, including accuracy, recall, precision, F1 score, confusion matrix, and inter-observer agreement parameter Kappa value, were used to evaluate the model's performance on the internal and external datasets. Physician 2 (with 1 year of experience) re-labeled the internal and external test datasets to compare the inter-observer agreement and observer-classifier agreement. In the internal dataset, when models were utilized for the detection and classification of MCs, the accuracy, recall, precision and F1 score reached 86.25%, 87.77%, 84.92% and 85.60%, respectively. 
The Kappa value of the inter-observer agreement was 0.768 (95% CI: 0.656, 0.847), while observer-classifier agreement was 0.717 (95% CI: 0.589, 0.809). In the external test dataset, the model's accuracy, recall, precision and F1 score for diagnosing MCs reached 75%, 77.08%, 77.80% and 74.97%, respectively. The inter-observer agreement was 0.681 (95% CI: 0.512, 0.677), and observer-classifier agreement was 0.519 (95% CI: 0.290, 0.690). The model demonstrated strong performance in detecting and classifying MCs, achieving high agreement with physicians in MCs diagnosis. These results suggest that deep learning models have the potential to facilitate the application of intelligent assisted diagnosis techniques in the field of spine research.
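The Modic-changes abstract compares raters and the classifier with a Kappa value. As a reminder of what that statistic measures, here is a minimal sketch of Cohen's kappa for two raters, assuming the unweighted form (the abstract does not say whether a weighted variant was used). The labels in the example are invented for illustration.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two labels match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical Modic type labels from a physician and from a classifier.
physician = ["I", "I", "II", "II", "III", "II"]
classifier = ["I", "II", "II", "II", "III", "II"]
print(round(cohen_kappa(physician, classifier), 3))
```

Kappa discounts the agreement two raters would reach by chance, which is why it can be noticeably lower than raw accuracy when one class dominates.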
- Research Article
2
- 10.1007/s10278-024-01225-4
- Aug 13, 2024
- Journal of imaging informatics in medicine
The diagnosis and treatment of pulmonary hypertension have changed dramatically through the re-defined diagnostic criteria and advanced drug development in the past decade. The application of Artificial Intelligence for the detection of elevated pulmonary arterial pressure (ePAP) was reported recently. Artificial Intelligence (AI) has demonstrated the capability to identify ePAP and its association with hospitalization due to heart failure when analyzing chest X-rays (CXR). An AI model based on electrocardiograms (ECG) has shown promise in not only detecting ePAP but also in predicting future risks related to cardiovascular mortality. We aimed to develop an AI model integrating ECG and CXR to detect ePAP and evaluate their performance. We developed a deep-learning model (DLM) using paired ECG and CXR to detect ePAP (systolic pulmonary artery pressure > 50mmHg in transthoracic echocardiography). This model was further validated in a community hospital. Additionally, our DLM was evaluated for its ability to predict future occurrences of left ventricular dysfunction (LVD, ejection fraction < 35%) and cardiovascular mortality. The AUCs for detecting ePAP were as follows: 0.8261 with ECG (sensitivity 76.6%, specificity 74.5%), 0.8525 with CXR (sensitivity 82.8%, specificity 72.7%), and 0.8644 with a combination of both (sensitivity 78.6%, specificity 79.2%) in the internal dataset. In the external validation dataset, the AUCs for ePAP detection were 0.8348 with ECG, 0.8605 with CXR, and 0.8734 with the combination. Furthermore, using the combination of ECGs and CXR, the negative predictive value (NPV) was 98% in the internal dataset and 98.1% in the external dataset. Patients with ePAP detected by the DLM using combination had a higher risk of new-onset LVD with a hazard ratio (HR) of 4.51 (95% CI: 3.54-5.76) in the internal dataset and cardiovascular mortality with a HR of 6.08 (95% CI: 4.66-7.95). Similar results were seen in the external validation dataset. 
The DLM, integrating ECG and CXR, effectively detected ePAP with a strong NPV and forecasted future risks of developing LVD and cardiovascular mortality. This model has the potential to expedite the early identification of pulmonary hypertension in patients, prompting further evaluation through echocardiography and, when necessary, right heart catheterization (RHC), potentially resulting in enhanced cardiovascular outcomes.
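The pulmonary-hypertension abstract reports sensitivity, specificity, and negative predictive value (NPV). The sketch below shows how these follow from a confusion matrix. The counts are invented to illustrate why NPV can be high even at moderate sensitivity when the condition is uncommon; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard screening metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # detected fraction of true cases
        "specificity": tn / (tn + fp),  # correctly cleared fraction of non-cases
        "ppv": tp / (tp + fp),          # probability a positive call is correct
        "npv": tn / (tn + fn),          # probability a negative call is correct
    }

# Hypothetical cohort: 100 patients with ePAP, 1200 without.
metrics = diagnostic_metrics(tp=80, fp=300, tn=900, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up cohort sensitivity is 0.80 and specificity 0.75, yet NPV is about 0.98 because true cases are a small share of everyone who tests negative.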
- Research Article
1
- 10.1155/2022/4567063
- May 21, 2022
- Journal of oncology
The aim of this study was to identify hub genes associated with metastasis and prognosis in melanoma. Weighted gene coexpression network analysis (WGCNA) was performed to screen and identify hub genes. ROC and K-M analyses were used to verify the hub genes in the internal and external data sets. The risk score model and nomogram model were constructed based on the IHC result. Through WGCNA, the three hub genes, SNRPD2, SNRPD3, and EIF4A3, were identified. In the external data set, the hub genes identified were associated with the worse prognosis (TCGA, SNRPD2, P ≤ 0.02; SNRPD3, P = 0.12; EIF4A3, P = 0.11; GSE65904, SNRPD2, P = 0.04; SNRPD3, P = 0.10; EIF4A3, P < 0.01; GSE19234, SNRPD2, P < 0.01; SNRPD3, P < 0.01; EIF4A3, P < 0.01). In the GSE8401, we found that the hub genes were highly expressed in the metastasis compared with the nonmetastasis group (SNRPD2, 988.5 ± 47.83 vs. 738.4 ± 35.35, P < 0.01; SNRPD3, 502.7 ± 25.7 vs. 416.4 ± 23.88, P = 0.02; EIF4A3, 567.6 ± 19.56 vs. 495.2 ± 21.1, P = 0.01). Moreover, the hub genes were identified by the IHC in our data set. The result was similar with the external data set. The hub genes could predict the metastasis and prognosis in the Chinese MM patients. Finally, the GSEA and Pearson analysis demonstrated that the SNRPD2 was associated with the immunotherapy. The three hub genes were identified and validated in MM patients in external and internal data sets. The risk factor model was constructed and verified as a powerful model to predict metastasis and prognosis in MM patients.
- Research Article
1
- 10.1016/j.ijmedinf.2025.105807
- Apr 1, 2025
- International journal of medical informatics
An interpretable hybrid machine learning approach for predicting three-month unfavorable outcomes in patients with acute ischemic stroke.
- Abstract
- 10.1182/blood-2022-168812
- Nov 15, 2022
- Blood
Subjects with Chronic Myeloid Leukemia Identified As Intermediate- or High-Risk Subjects Using the Imatinib Therapy Failure (IMTF) Model Benefit from Initial Therapy with a Second-Generation Tyrosine Kinase Inhibitor
- Research Article
9
- 10.1007/s40618-023-02042-2
- Apr 5, 2023
- Journal of Endocrinological Investigation
Silent corticotroph adenomas (SCAs) are a subtype of nonfunctioning pituitary adenomas that exhibit more aggressive behavior. However, rapid and accurate preoperative diagnostic methods are currently lacking. The purpose of this study was to examine the differences between SCA and non-SCA features and to establish radiomics models and a clinical scale for rapid and accurate prediction. A total of 260 patients (72 SCAs vs. 188 NSCAs) with nonfunctioning adenomas from Peking Union Medical College Hospital were enrolled in the study as the internal dataset. Thirty-five patients (6 SCAs vs. 29 NSCAs) from Fuzhou General Hospital were enrolled as the external dataset. Radiomics models and an SCA scale to preoperatively diagnose SCAs were established based on MR images and clinical features. There were more female patients (internal dataset: p < 0.001; external dataset: p = 0.028) and more multiple microcystic changes (internal dataset: p < 0.001; external dataset: p = 0.012) in the SCA group. MRI showed more invasiveness (higher Knosp grades, p ≤ 0.001). The radiomics model achieved AUCs of 0.931 and 0.937 in the internal and external datasets, respectively. The clinical scale achieved an AUC of 0.877 and a sensitivity of 0.952 in the internal dataset and an AUC of 0.899 and a sensitivity of 1.0 in the external dataset. Based on clinical information and imaging characteristics, the constructed radiomics model achieved high preoperative diagnostic ability. The SCA scale achieved the purpose of rapidity and practicality while ensuring sensitivity, which is conducive to simplifying clinical work.
- Research Article
3
- 10.2196/45202
- Mar 8, 2024
- JMIR Formative Research
Background: Vancomycin pharmacokinetics are highly variable in patients with critical illnesses, and clinicians commonly use population pharmacokinetic (PPK) models based on a Bayesian approach to dose. However, these models are population-dependent, may only sometimes meet the needs of individual patients, and are only used by experienced clinicians as a reference for making treatment decisions. To assist real-world clinicians, we developed a deep learning–based decision-making system that predicts vancomycin therapeutic drug monitoring (TDM) levels in patients in the intensive care unit.
Objective: This study aimed to establish joint multilayer perceptron (JointMLP), a new deep-learning model for predicting vancomycin TDM levels, and compare its performance with the PPK models, extreme gradient boosting (XGBoost), and TabNet.
Methods: We used a 977-case data set split into training and testing groups in a 9:1 ratio. We performed external validation of the model using 1429 cases from Kangwon National University Hospital and 2394 cases from the Medical Information Mart for Intensive Care–IV (MIMIC-IV). In addition, we performed 10-fold cross-validation on the internal training data set and calculated the 95% CIs using the metric. Finally, we evaluated the generalization ability of the JointMLP model using the MIMIC-IV data set.
Results: Our JointMLP model outperformed other models in predicting vancomycin TDM levels in internal and external data sets. Compared to PPK, the JointMLP model improved predictive power by up to 31% (mean absolute error [MAE] 6.68 vs 5.11) on the internal data set and 81% (MAE 11.87 vs 6.56) on the external data set. In addition, the JointMLP model significantly outperformed XGBoost and TabNet, with a 13% (MAE 5.75 vs 5.11) and 14% (MAE 5.85 vs 5.11) improvement in predictive accuracy on the internal data set, respectively. On both the internal and external data sets, our JointMLP model performed well compared to XGBoost and TabNet, achieving prediction accuracy improvements of 34% and 14%, respectively. Additionally, our JointMLP model showed higher robustness to outlier data than the other models, as evidenced by its higher root mean squared error performance across all data sets. The mean errors and variances of the JointMLP model were close to zero and smaller than those of the PPK model in internal and external data sets.
Conclusions: Our JointMLP approach can help optimize treatment outcomes in patients with critical illnesses in an intensive care unit setting, reducing side effects associated with suboptimal vancomycin administration. These include increased risk of bacterial resistance, extended hospital stays, and increased health care costs. In addition, the superior performance of our model compared to existing models highlights its potential to help real-world clinicians.
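The vancomycin abstract expresses gains as percentages alongside pairs of mean absolute errors (MAE). The sketch below computes MAE and checks the percentages. It assumes the gap is divided by JointMLP's own MAE rather than by the comparator's, because that is the convention that reproduces the reported 31%, 81%, 13%, and 14%; the abstract does not state the formula, and the function names are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted levels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def improvement_percent(comparator_mae, model_mae):
    """Error gap as a percentage of the model's own MAE (assumed convention)."""
    return (comparator_mae - model_mae) / model_mae * 100.0

# (comparator MAE, JointMLP MAE) pairs as reported in the abstract.
pairs = {
    "PPK, internal": (6.68, 5.11),
    "PPK, external": (11.87, 6.56),
    "XGBoost, internal": (5.75, 5.11),
    "TabNet, internal": (5.85, 5.11),
}
for label, (comparator, model) in pairs.items():
    print(f"{label}: {round(improvement_percent(comparator, model))}%")

# Toy TDM levels, purely illustrative.
print(mean_absolute_error([10.0, 20.0], [12.0, 17.0]))
```

Dividing by the comparator's MAE instead would give smaller figures (for example, about 24% rather than 31% for PPK on the internal data set), so the choice of denominator matters when comparing such claims across papers.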
- Research Article
39
- 10.3174/ajnr.a7034
- Mar 4, 2021
- American Journal of Neuroradiology
Small intracranial aneurysms are being increasingly detected while the rupture risk is not well-understood. We aimed to develop rupture-risk models of small aneurysms by combining clinical, morphologic, and hemodynamic information based on machine learning techniques and to test the models in external validation datasets. From January 2010 to December 2016, five hundred four consecutive patients with only small aneurysms (<5 mm) detected by CTA and invasive cerebral angiography (or surgery) were retrospectively enrolled and randomly split into training (81%) and internal validation (19%) sets to derive and validate the proposed machine learning models (support vector machine, random forest, logistic regression, and multilayer perceptron). Hemodynamic parameters were obtained using computational fluid dynamics simulation. External validation was performed in other hospitals to test the models. The support vector machine performed the best with areas under the curve of 0.88 (95% CI, 0.85-0.92) and 0.91 (95% CI, 0.74-0.98) in the training and internal validation datasets, respectively. Feature ranks suggested hemodynamic parameters, including stable flow pattern, concentrated inflow streams, and a small (<50%) flow-impingement zone, and the oscillatory shear index coefficient of variation, were the best predictors of aneurysm rupture. The support vector machine showed an area under the curve of 0.82 (95% CI, 0.69-0.94) in the external validation dataset, and no significant difference was found for the areas under the curve between internal and external validation datasets (P = .21). This study revealed that machine learning had a good performance in predicting the rupture status of small aneurysms in both internal and external datasets. Aneurysm hemodynamic parameters were regarded as the most important predictors.
- Research Article
- 10.1016/j.xops.2025.100883
- Jul 1, 2025
- Ophthalmology Science
A Generalized and Interpretable Multi-Label Multi-Disease Screening System for Ocular Anterior Segment Disease Detection
- Research Article
3
- 10.1016/j.compmedimag.2022.102152
- Jan 1, 2023
- Computerized Medical Imaging and Graphics
Automatic development of 3D anatomical models of border zone and core scar regions in the left ventricle.
- Research Article
73
- 10.1016/j.cmpb.2020.105819
- Nov 2, 2020
- Computer Methods and Programs in Biomedicine
Automatic stenosis recognition from coronary angiography using convolutional neural networks
- Research Article
- 10.1186/s13000-025-01714-2
- Nov 6, 2025
- Diagnostic pathology
- Research Article
- 10.1186/s13000-025-01716-0
- Nov 6, 2025
- Diagnostic pathology
- Research Article
- 10.1186/s13000-025-01723-1
- Nov 4, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01722-2
- Oct 23, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01688-1
- Oct 23, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01715-1
- Oct 17, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01718-y
- Oct 17, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01720-4
- Oct 16, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01724-0
- Oct 16, 2025
- Diagnostic Pathology
- Research Article
- 10.1186/s13000-025-01726-y
- Oct 16, 2025
- Diagnostic Pathology