Articles published on Unsupervised Machine Learning Algorithm
813 Search results
- Research Article
- 10.1080/10255842.2025.2606227
- Dec 17, 2025
- Computer Methods in Biomechanics and Biomedical Engineering
- Zitong Cao + 5 more
Long non-coding RNA (lncRNA) screening holds promise for elucidating mechanisms behind graphene-related tumor therapy. This study aimed to investigate the role of graphene therapy-related lncRNA signatures (GTLncRNASig) in lung adenocarcinoma (LUAD) and potential pathways within the tumor microenvironment. LUAD transcriptome and clinical data from The Cancer Genome Atlas (TCGA) were analyzed to develop a prognostic risk model for GTLncRNASig using Cox regression. Further analyses included Kaplan-Meier survival analysis, principal component analysis (PCA), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, a nomogram risk model, and tumor immune dysfunction and exclusion (TIDE) assessment. Drug sensitivity was explored using this model. Mendelian randomization (MR), Double Machine Learning (DML) and Bayesian weighting validated causal relationships between enriched pathways and LUAD. Supervised and unsupervised machine learning algorithms evaluated robustness and uncovered hidden correlations in MR results. A 35-lncRNA risk model (GTLncRNASig) was established, identifying strong associations with immune pathways, including Type II IFN Response and MHC class I. High-risk subgroups exhibited immune microenvironment-linked prognostic traits. Screening revealed 12 potential chemotherapy agents, and the stem cell index mRNAsi correlated with LUAD prognosis. MR and Bayesian weighting implicated the systemic lupus erythematosus (SLE) pathway as a LUAD risk factor. Machine learning confirmed the reliability of these findings. This study identified 35 lncRNAs that constitute a prognostic signature in the context of graphene-related LUAD treatment, highlighting immune-related processes and the SLE pathway’s role in LUAD. These insights link autoimmune diseases with tumorigenesis and provide valuable guidance for immunotherapy predictions.
- Research Article
- 10.1097/aog.0000000000006040
- Dec 1, 2025
- Obstetrics and gynecology
- Rebecca Horgan + 4 more
To apply unsupervised machine learning techniques to first-trimester fetal cardiac data to enhance early risk stratification of small-for-gestational-age (SGA) birth weight. This was a prospective cohort study that enrolled patients up to 13 6/7 weeks of gestation without fetal, umbilical cord, or placental abnormalities. At the first-trimester ultrasonogram, the chest area, heart area, ventricular inlet lengths, and spectral and color Doppler of the atrioventricular valves were assessed. An unsupervised machine learning technique, k-means clustering, was applied to sort fetuses into risk groups for SGA birth weight, defined as a birth weight less than the 10th percentile for gestational age. Candidate variables were selected with regression analyses, and the elbow method was used to determine the optimal number of clusters. Cumulative rates of outcomes were plotted with Kaplan-Meier analysis, and model performance was tested with area under the curve values with repeated cross-validation. Six hundred seventeen pregnancies were included in the analysis, with 45 (7.3%) patients delivering a neonate with SGA birth weight. z-scores of the chest area (P=.031) and tricuspid valve E/A ratio (P<.001) showed an independent association with SGA birth weight and were used in the clustering algorithm. An unsupervised machine learning algorithm blinded to the outcome identified three risk clusters: low (n=202), intermediate (n=217), and high (n=198). The rates of SGA birth weight (1.2%, 5.4%, and 14.4%, respectively, P<.001) and nonreassuring fetal heart rate tracings (3.6%, 5.4%, and 8.6%, respectively, P=.039) differed significantly among the three risk clusters. Area under the curve values of the model in cross-validation samples were 0.71 (95% CI, 0.64-0.77). Using the low-risk cluster as a threshold, the model specificity was 95.5% and sensitivity was 35.0% for ruling out SGA birth weight. The negative predictive value for ruling out SGA birth weight was 99.0%.
Unsupervised machine learning of first-trimester fetal cardiac parameters can effectively stratify risk for SGA birth weight.
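The clustering step described in this abstract, k-means with the elbow method for choosing the number of clusters, can be sketched as follows. This is a minimal illustration in plain Python and not the study's code: the two-feature toy points, the deterministic farthest-first seeding, and the names `sq_dist`, `farthest_first`, `kmeans`, and `elbow_inertias` are all assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not the study's choice):
    start at the first point, then repeatedly add the point that is
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centers, inertia)."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: total squared distance from each point to its nearest center.
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


def elbow_inertias(points, max_k):
    """Inertia for k = 1..max_k; the 'elbow' is where the drop flattens."""
    return {k: kmeans(points, k)[1] for k in range(1, max_k + 1)}


# Hypothetical toy data: three tight groups in a two-feature space.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

if __name__ == "__main__":
    for k, inertia in elbow_inertias(POINTS, max_k=4).items():
        print(k, round(inertia, 2))
```

On this toy data the inertia falls steeply up to k = 3 and only marginally afterwards, which is the pattern the elbow method looks for. Real feature sets would normally be standardised first, as the abstract's use of z-scores suggests.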
- Research Article
- 10.1785/0220250202
- Nov 26, 2025
- Seismological Research Letters
- Chengping Chai + 4 more
Seismic sensors deployed near roadways effectively capture ground vibrations generated by passing vehicles. Although both traditional and machine-learning algorithms have been utilized for analyzing such signals, independent validation of detected vehicle events remains limited. We applied two unsupervised machine-learning algorithms, uniform manifold approximation and projection for dimension reduction, and hierarchical density-based spatial clustering of applications with noise, to continuous seismic data collected along a road on the main campus of Oak Ridge National Laboratory. The algorithms identified seven distinct cluster labels across the entire dataset. By comparing these cluster labels with precipitation records from a nearby weather station and image-derived labels from a local camera system, we identified one cluster associated with rainfall and another with vehicle activity. Our algorithms identified a greater number of vehicle-related labels compared to the camera-derived labels because seismic data are unaffected by poor lighting conditions. The arrival times of the newly detected vehicle signals corresponded well with the road’s speed limit, supporting our findings. Our algorithm outperformed the short-term average/long-term average method and k-means clustering. Our results suggest that seismic data, when analyzed with machine-learning algorithms, can complement existing vehicle monitoring systems, particularly under challenging environmental conditions.
- Research Article
- 10.1287/mnsc.2023.03771
- Nov 17, 2025
- Management Science
- Kai Wendt + 3 more
We study a supply chain distribution system and investigate experimentally operations of markets where retailers can trade digital claims (tokens) on the supplier’s capacity. Subjects play the role of retailers, have heterogeneous valuations of goods, face random demands, and buy tokens on the supplier’s capacity. Following demand realization, retailers trade tokens with each other in markets implemented as double-sided, single-price, blind, batch auctions. We compare six behavioral treatments, featuring two wholesale prices and three market sizes. As expected, markets reduce leftovers and shortages. Interestingly, market-clearing prices are anchored to wholesale prices and do not signal the value of goods in large markets. Players deploy novel ordering and trading strategies that differ from the transshipment literature. We identify strategies by applying unsupervised machine learning algorithms. In one strategy, players buy a few claims and, after demand realization, use the market to satisfy it. Other players buy more claims than the maximum demand and, once demand is known, sell their excess on the market. Both strategies reduce costs from demand uncertainty but expose players to liquidity and mistakes risks. A third strategy, in which players order from the supplier initially as if expecting the market to be cleared cooperatively, is more profitable. This strategy diversifies demand and market risks. The introduction of markets causes the “pull-to-the-mean” effect and increases order variability. Thus, markets can cause the Bullwhip Effect. Retailers’ and the supply chain’s average profits are higher with markets, but suppliers with low wholesale prices suffer from lower revenues because of the pull-to-the-mean effect. This paper was accepted by Elena Katok, operations management. Supplemental Material: The data files are available at https://doi.org/10.1287/mnsc.2023.03771 .
- Research Article
- 10.1093/ajcp/aqaf121.060
- Nov 1, 2025
- American Journal of Clinical Pathology
- Samir Atiya + 2 more
Introduction/Objective: Quality control (QC) monitoring is a cornerstone of quality assurance in clinical laboratories. A mainstay of QC monitoring is the use of Levey-Jennings charts, introduced in 1950 as an adaptation of Shewhart’s statistical control charts used in industrial manufacturing. In these charts, consecutive assay results of QC materials are plotted over time, allowing for the detection of shifts, drifts, or outliers in repeated measures using well-established Westgard Rules or Six Sigma principles. A practical challenge in traditional QC monitoring is determining appropriate values for setting the mean and standard deviation control parameters, particularly in laboratories with extensive test menus and multiple analyzers performing the same test. Methods/Case Report: We employ a variety of analytical and machine learning approaches, such as Gaussian curve fitting, moving averages, and unsupervised machine learning algorithms, to examine whether QC parameters can be generated analytically and automatically from live QC data. Results: We remove outliers, adjust for instrument performance-related drifts, and account for reagent or QC material lot changes that would otherwise confound control limits calculated from noisy, uncurated data. Conclusion: This work confirms that QC limit establishment and evaluation can be performed using more automated methods to help improve laboratory operations.
- Research Article
- 10.1155/da/3990416
- Oct 24, 2025
- Depression and Anxiety
- Can Hou + 6 more
Background: Many patients experience psychological distress in the preoperative phase, whilst screening based on cut-off points of assessment scales showed limited value in predicting clinical postoperative adverse outcomes. Methods: To identify preoperative psychological distress and investigate their associations with adverse surgery-related outcomes, we included 16,662 patients from the China Surgery and Anesthesia Cohort (CSAC). We applied dimensionality reduction and unsupervised machine learning algorithms to classify participants into distinct psychological patterns. We then assessed the associations of machine learning-identified psychological patterns and traditional cut-off based psychological symptoms, with various adverse surgery-related outcomes, using logistic and linear regression models while adjusting for other relevant covariates. Results: We successfully established clustering algorithms for 16,298 participants, demonstrating strong consistency in pattern features. Six distinct psychological patterns among participants were identified, including one group with normal psychological functioning and five groups with varying levels of psychological distress. All identified psychological distress patterns were significantly associated with most surgery-related adverse outcomes, both in short-term (e.g., any within-hospital postoperative complication, odds ratios [ORs] = 1.24–1.30) and long-term (e.g., cognitive impairment at 12 months postsurgery, 1.29–2.35).
In contrast, traditional cut-off-based methods identified only 266 patients with significant psychological symptoms, which showed no association with some key short-term outcomes (e.g., length of hospital stay and postoperative complication), though they remained linked to most long-term outcomes. Conclusions: Our findings demonstrate the effectiveness of machine learning in accurately identifying patients with preoperative psychological distress who may require clinical attention, highlighting the potential of these techniques to guide targeted preoperative interventions and ultimately improve surgical outcomes.
- Research Article
- 10.1093/ndt/gfaf116.0727
- Oct 21, 2025
- Nephrology Dialysis Transplantation
- Simon Aberger + 11 more
Background and Aims: Annual mortality still exceeds 15% in patients receiving maintenance hemodialysis, mainly driven by cardiovascular disease (CVD) burden. Therapies targeting calcium, phosphate and bone metabolism are still lacking evidence for consistent benefits, warranting new approaches to guide treatment decisions. Method: A prospective, observational study was conducted across three Austrian hemodialysis centers including stable maintenance hemodialysis patients. Blood samples were collected at study entry. Biochemical markers of bone metabolism (iPTH, noxPTH®, beta-crosslaps, osteoprotegerin, sRANKL), antioxidative capacity (ImAnOx®) and oxidative stress (PerOx®) were measured with standardized assays provided by Immundiagnostik AG. An unsupervised k-means machine learning algorithm was employed to cluster the dataset based on these biochemical variables. The 1-year mortality rates, treatment-related data and CVD burden were then compared between clusters. Results: A total of n = 363 patients were included. Basic demographic data include a median age of 72 years, median hemodialysis vintage of 27.5 months and a male preponderance of 63%. Four distinct clusters were identified (Fig. 1A and B). Cluster 0 was defined by low noxPTH® (median 139 pg/mL), low sRANKL levels and high oxidative stress with the highest 1-year mortality rate (24.9%). In contrast, higher antioxidative capacity and beta-crosslap levels distinguished Clusters 1 and 2 with lower mortality rates (13.2% and 0% respectively). Cluster 3 was defined by high noxPTH® levels (median 441 pg/mL) and showed an intermediate 1-year mortality rate (17.8%), yet the highest CVD burden (81%). Calcimimetic treatment was highest in Cluster 3 (57.8%); however, over 23% of patients in the low-noxPTH® Cluster 0 were inadequately treated with calcimimetics.
Conclusion: Unsupervised cluster analysis allowed risk-stratification of hemodialysis patients by biochemical profiles, reflecting low-turnover bone disease and oxidative stress as features with unfavourable outcome. This strategy could improve treatment categorization in future trials.
- Research Article
- 10.1038/s41598-025-20382-2
- Oct 21, 2025
- Scientific Reports
- Efe Onojete + 4 more
Humans face various diseases that are mainly caused by environmental conditions and living habits. These diseases exhibit several symptoms and can share a relationship based on their symptoms. The identification and interpretation of these groups of symptom-based diseases can aid in developing treatment plans for a new outbreak of disease. This research explores the intersection of machine learning and healthcare, specifically focusing on the enhancement of disease classification through symptom-based cluster analysis. By leveraging unsupervised machine learning algorithms, patterns and relationships within diverse symptom datasets were identified, revealing novel associations and subtypes in disease manifestation. The integration of a Large Language Model (LLM), specifically OpenAI’s Generative Pretrained Transformer (GPT), played a pivotal role in interpreting and communicating the complex outputs of the machine learning process. The results indicated a significant improvement in defining distinct clusters based on the relationship between diseases and symptoms, with GPT-4o providing simplified explanations that bridge the gap between machine-generated insights and healthcare professionals’ understanding. The study’s findings offer a more profound understanding of the distinctive features characterising the different clusters of diseases generated by the machine learning models.
- Research Article
- 10.59934/jaiea.v5i1.1362
- Oct 15, 2025
- Journal of Artificial Intelligence and Engineering Applications (JAIEA)
- Tengku Didi Ferdillah Tengku + 2 more
Monitoring query performance in database systems is often a manual and reactive process, proving inefficient for the early detection of issues that can impact application stability. This research aims to design and implement a system for automated and proactive query performance anomaly detection. This system utilizes data from MySQL's Performance Schema and applies an unsupervised machine learning algorithm, namely Isolation Forest, to identify queries with unusual behavior based on eight researcher-selected performance metrics. The detection process is implemented to run periodically in the background and send early notifications via email. Experiments were conducted by varying the contamination parameter, with the model's performance evaluated using Precision, Recall, and F1-Score metrics. The experimental results indicate that the configuration with contamination=0.1 yielded the best performance, achieving an F1-Score of 0.39 and a Recall of 100% for the anomaly class. The developed system successfully demonstrated its ability to detect various types of anomalies, including the N+1 query problem, and offers an efficient solution to proactively improve database system performance.
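As a reading aid for the reported metrics, the relationship between F1, precision, and recall can be worked through directly. The sketch below assumes the quoted F1-Score of 0.39 and Recall of 100% both refer to the anomaly class on the same evaluation set; the implied precision is a derived figure, not a value stated in the abstract, and the function name `precision_from_f1` is hypothetical.

```python
def precision_from_f1(f1, recall):
    """Solve F1 = 2 * P * R / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract (assumed to describe the same class).
reported_f1 = 0.39
reported_recall = 1.0

implied_precision = precision_from_f1(reported_f1, reported_recall)

if __name__ == "__main__":
    print(round(implied_precision, 3))
```

Under those assumptions the implied precision is about 0.24, meaning roughly one in four flagged queries would be a true anomaly while no true anomaly is missed.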
- Research Article
- 10.1093/ehjdh/ztaf115
- Oct 9, 2025
- European Heart Journal - Digital Health
- Marie-Ange Fleury + 18 more
Aims: There is a lack of studies investigating the pathophysiologic and phenotypic distinctiveness of aortic stenosis (AS). This heterogeneity has important implications for identifying optimal intervention timing and potential medical management. This study seeks to identify phenogroups of AS using unsupervised machine learning to improve risk stratification. Methods and results: A total of 349 patients with asymptomatic AS from the PROGRESSA study were included in this analysis. Echocardiographic, clinical and blood sample data were used in the unsupervised clustering process. Longitudinal echocardiographic data were used to evaluate AS progression. Five clusters of patients were revealed using 18 variables selected by an unsupervised machine learning algorithm. Amongst them, aortic valvular phenotype, mean gradient, peak jet velocity (Vpeak), and left ventricle stroke volume were selected as discriminatory variables. Following the clustering process, characteristics differed between clusters, including age, body mass index, and sex ratio (all P < 0.001). Of note, cluster 1 showed higher AS severity at baseline with significantly higher initial Vpeak (344 [314; 376] cm/s) and calcium score (1257 [806; 1837] UA) (P < 0.001). Patients from cluster 1 had a faster AS progression (progression of Vpeak = 22 [9; 39] cm/s/year), and calcium score (213 [111; 307] UA/year) (P < 0.001). Cluster 1 was also associated with a higher composite risk of mortality and aortic valve replacement when adjusted for age, sex, and baseline AS severity (P < 0.001). Conclusion: Artificial intelligence-guided phenotypic classification revealed 5 distinct groups and enhanced risk stratification of patients with AS. This approach may be useful to optimize and individualize medical and interventional management of AS.
- Research Article
- 10.1021/acs.analchem.5c03117
- Oct 6, 2025
- Analytical chemistry
- Guanyang Xu + 4 more
Differential analysis in proteomics is pivotal for biomarker discovery and disease mechanism elucidation, yet traditional statistical methods are constrained by distributional assumptions and empirical fold change threshold dependencies. This study systematically evaluates 18 unsupervised anomaly detection machine learning (ML) algorithms against the established statistical frameworks for differential protein detection from proteomic data sets. Using in silico simulated data sets derived from experimental data, we enabled cross-algorithm comparability through a probability based transformation. Results demonstrated that ML methods, particularly the Minimum Covariance Determinant (MCD), outperformed statistical test in recall, precision, and accuracy, with superior robustness to intersample heterogeneity. Validation on real-world proteomic data further confirmed that the MCD-identified differentially expressed proteins comprehensively covered canonical pathways while uncovering novel tumor-associated functional biomolecules. This work establishes unsupervised ML methods as robust alternatives to traditional hypothesis-driven statistical approaches in proteomics differential analysis, offering enhanced reliability for precision medicine research.
- Research Article
- 10.3847/1538-4365/adfdd8
- Oct 1, 2025
- The Astrophysical Journal Supplement Series
- R Chen + 5 more
Solar active regions (ARs) host the majority of solar eruptions. Studying the evolution and morphological features of ARs is significant for understanding the physical mechanisms of solar eruptions and beneficial for forecasting hazardous space weather. This work presents an automated DBSCAN-based solar active region detection (DSARD) method for ARs observed in magnetograms. DSARD is based on an unsupervised machine learning algorithm called density-based spatial clustering of applications with noise (DBSCAN). This method is employed to identify ARs in magnetograms observed by the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory from 2010 to 2023. To avoid duplicate detections and minimize projection effects, we focus on a longitudinal range of ±6° from the central meridian of the solar disk. Within this range, we obtain the distributions of the number, area, magnetic flux, tilt angle, and butterfly diagram of bipolar ARs in latitudes and time intervals during solar cycle 24, as well as their drift velocities. Most of these statistical results align with previous studies, which validates our method. The asymmetry indices of the number of ARs, cumulative area, and total unsigned magnetic flux indicate that the northern hemisphere dominated in terms of AR activity during most of solar cycle 24, except near solar maximum. Additionally, we analyze the dipole tilt angles of ARs in solar cycle 24 and the rising phase of solar cycle 25, revealing that 13% and 16% of ARs, respectively, violate Hale’s law.
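To illustrate the clustering idea that DSARD builds on, here is a minimal DBSCAN sketch in plain Python. It is not the DSARD pipeline: the pixel coordinates are made up, and the `eps` and `min_pts` values are illustrative assumptions rather than parameters taken from the paper.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist2(points[i], q) <= eps * eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep growing the cluster
                queue.extend(reachable)
    return labels


# Hypothetical strong-field pixel coordinates: two compact patches and one stray pixel.
PIXELS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (8, 8), (8, 9), (9, 8), (9, 9),
    (20, 0),
]

LABELS = dbscan(PIXELS, eps=1.5, min_pts=3)

if __name__ == "__main__":
    print(LABELS)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed in advance; dense patches become clusters and isolated pixels are left as noise, which is why a density-based method is a natural fit for picking out compact regions in an image.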
- Research Article
- 10.1016/j.jappgeo.2025.105846
- Oct 1, 2025
- Journal of Applied Geophysics
- Wakeel Hussain + 6 more
Hybrid modeling of deep neural networks and unsupervised machine learning algorithms for missing well log prediction based on geological lithofacies similarities
- Research Article
- 10.1111/ijn.70049
- Oct 1, 2025
- International journal of nursing practice
- Veysel Karani Baris + 2 more
To identify patterns and predictors of nurse turnover intentions based on years of nursing experience using a cluster analysis approach. Nurses with varying years of experience have different characteristics. These differences can also lead to distinct patterns and predictors of turnover intentions. For this descriptive study, 785 nurses from hospitals across different regions of Türkiye participated in a survey. Data was collected through online questionnaires between April and May 2022. The K-means unsupervised machine learning algorithm was employed to classify nurses into distinct clusters based on their experience. Multiple linear regression analyses were conducted to identify the predictors of turnover intention specific to each cluster. The STROBE guideline was followed for reporting. Cluster analysis grouped nurses into three categories by experience level: low, medium and high. The medium-experience group had the highest turnover intention, whereas the high-experience group had the lowest. Work stress was the only common predictor across all groups. Low income predicted turnover only for the low-experience group, and gender was significant only for the medium-experience group. This study revealed that turnover intention and its predictors vary by experience level, indicating a need for retention strategies tailored to nurses' years of experience. By considering subgroup characteristics, policymakers can develop targeted interventions to enhance nurse retention.
- Research Article
- 10.23838/pfm.2025.00177
- Sep 30, 2025
- Precision and Future Medicine
- Ziad Mumtaz Ramadan + 4 more
Endometriosis is a gynecologic inflammatory condition that affects up to 10% of reproductive-aged women worldwide. The disease exhibits heterogeneous presentations and is associated with a prolonged diagnostic delay, often exceeding seven years, because existing diagnostic modalities such as transvaginal ultrasound, magnetic resonance imaging, and the biomarker cancer antigen 125 (CA-125) are suboptimal. This review examines how machine learning (ML) is playing an increasingly significant role in early, non-surgical endometriosis diagnosis through two main approaches: symptom clustering and imaging integration. Unsupervised ML algorithms such as k-means, partitioning around medoids, and Bayesian networks have demonstrated success in identifying clinically informative endometriosis phenotypes from patient-reported symptoms and electronic health records. Concurrently, ML models such as convolutional neural networks and radiomics approaches have achieved high accuracy in lesion detection from imaging data, in some cases surpassing human interpretation. Despite these advances, significant challenges remain, including limited access to large, annotated multimodal datasets, the absence of widely accepted evaluation standards, and concerns regarding interpretability and generalizability. Multicenter, integrative studies and the incorporation of explainability techniques are recommended as potential strategies to address these gaps. Finally, multimodal ML approaches that combine symptomatology and imaging data hold substantial promise for reducing diagnostic delays, facilitating early intervention, and improving clinical outcomes in the management of endometriosis.
- Research Article
- 10.24297/ijct.v25i.9795
- Sep 29, 2025
- INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY
- Ravelonahina An Drianjaka Hasina + 2 more
This study investigates how supervised and unsupervised machine learning algorithms can complement traditional statistical methods in the analysis of social survey data. Social science datasets are typically small, noisy, and heterogeneous, which makes robustness and interpretability more important than computational efficiency. Using data from a 2024 survey on the employability of management graduates in Antananarivo, the study compares machine learning approaches with classical multivariate techniques. The objectives are to provide a statistical description of a social reality and to establish criteria for selecting algorithms suited to small-sample contexts. The methodological framework integrates statistical tools such as Chi-square tests, analysis of variance, and multiple regression with exploratory approaches including association rules and clustering. It also incorporates supervised models such as neural networks trained via gradient descent and its variants. Beyond these models, ensemble methods based on decision trees—bagging, random forests, and gradient boosting—are evaluated to highlight their relative strengths. Findings show that gradient boosting offers the most consistent predictive performance while remaining relatively simple to implement. This makes it particularly effective for analysing small and heterogeneous datasets, thereby providing practical value for applied social science research.
- Research Article
- 10.1007/s10531-025-03168-w
- Sep 29, 2025
- Biodiversity and Conservation
- Shahab Ud Din + 1 more
Unveiling patterns in wildlife conservation attitudes in Gilgit-Baltistan, Pakistan, using unsupervised machine learning algorithms
- Research Article
- 10.3389/fpubh.2025.1649400
- Sep 25, 2025
- Frontiers in Public Health
- Huayan Zuo + 11 more
Objectives: This study aims to investigate the efficacy of unsupervised machine learning algorithms, specifically the Gaussian Mixture Model (GMM), K-means clustering, and Otsu automatic threshold partitioning, in predicting sarcopenia based on computed tomography (CT) and magnetic resonance imaging (MRI) data. Methods: A retrospective analysis was conducted on a dataset comprising 191 patients diagnosed with sarcopenia and 327 control patients. Participants were randomly assigned to training and validation cohorts in a 6:4 ratio. The paravertebral muscles at the lumbar 3/4 intervertebral disc level were manually delineated as the region of interest (ROI) on non-enhanced CT and axial T2-weighted MRI images. Muscle and adipose tissues were automatically segmented from the ROI using GMM, K-means, and Otsu algorithms at the cohort level. Quantitative metrics such as mean, volume, and volume percentage were computed, and these parameters were compared between the sarcopenia and non-sarcopenia groups. Logistic regression analysis was employed to develop predictive models for sarcopenia, with model performance evaluated using the area under the curve (AUC). The stability of the models was assessed through five-fold cross-validation. Results: The study demonstrates that three unsupervised clustering algorithms utilizing CT data surpassed those employing MRI data. Notably, the CT-based Otsu model exhibited the highest predictive performance in both training and validation datasets, with AUC values of 0.986 and 0.958, respectively. This was followed by the CT-based GMM, which achieved AUC values of 0.990 and 0.903, and the K-means model, with AUC values of 0.727 and 0.772.
Furthermore, the CT-based GMM model demonstrated superior stability upon five-fold cross-validation, yielding an average AUC of 0.990. Conclusion: The findings indicate that CT-based unsupervised machine learning models outperform their MRI-based counterparts, with the CT-based Otsu and GMM models showing exceptional efficacy in sarcopenia prediction, as evidenced by AUC values exceeding 0.95.
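Of the three segmentation methods named in this abstract, Otsu thresholding is the simplest to show in a few lines: it picks the intensity cut that maximises the between-class variance of the two resulting groups. The sketch below is a generic illustration, not the study's implementation. It assumes non-negative integer intensities (real CT values in Hounsfield units can be negative and would need an offset), and the sample values and the name `otsu_threshold` are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximises between-class variance,
    splitting the data into {v <= t} and {v > t}.
    Assumes integer intensities in the range 0..levels-1."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of values in the lower class so far
    sum0 = 0    # sum of intensities in the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:  # strict: keeps the lowest t among ties
            best_t, best_var = t, var_between
    return best_t


# Hypothetical ROI intensities with two well-separated modes.
ROI = [10] * 50 + [12] * 50 + [200] * 40 + [205] * 40

THRESHOLD = otsu_threshold(ROI)
UPPER_FRACTION = sum(v > THRESHOLD for v in ROI) / len(ROI)

if __name__ == "__main__":
    print(THRESHOLD, round(UPPER_FRACTION, 3))
```

A quantity such as `UPPER_FRACTION` plays the role of the volume-percentage metrics mentioned in the abstract: once the ROI is split into two intensity classes, the share of each class can be fed to a downstream model such as logistic regression.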
- Research Article
- 10.1039/d5an00649j
- Sep 22, 2025
- The Analyst
- Raven L Buckman Johnson + 2 more
Mass spectrometry imaging (MSI) has emerged as a powerful tool for spatial metabolomics, but untargeted data analysis has proven to be challenging. When combined with in vivo isotope labeling (MSIi), MSI provides insights into metabolic dynamics with high spatial resolution; however, the data analysis becomes even more complex. Although various tools exist for advanced MSI analyses, machine learning (ML) applications to MSIi have not been explored. In this study, we leverage Cardinal to process MSIi datasets of duckweeds labeled with either 13CO2 or D2O. We apply spatial shrunken centroid (SSC) segmentation, an unsupervised ML algorithm, to differentiate metabolite localizations and investigate isotope labeling of untargeted metabolites. In the SSC segmentation of three-day 13C-labeled duckweed dataset, five spatial segments were identified based on distinct lipid isotopologue distributions, in contrast to classification of only three tissue regions in previous manual analysis based on galactolipid isotopologues. Similarly, SSC segmentation of five-day D-labeled dataset revealed five spatial segments based on distinct metabolite and isotopologue profiles. Further, this untargeted segmentation analysis of MSIi dataset provided insights on tissue-specific relative flux of each metabolite by calculating the fraction of de novo biosynthesis in each segment. Overall, the application of unsupervised machine learning to MSIi datasets has proven to significantly reduce analysis time, increase throughput, and improve the clarity of spatial isotopologue distributions.
- Research Article
- 10.1016/j.stem.2025.07.010
- Sep 4, 2025
- Cell stem cell
- Michelle Griffin + 15 more
Multi-omic analysis reveals retinoic acid molecular drivers for dermal fibrosis and regenerative repair in the skin.