Related Topics

  • Dirichlet Process Mixture Model
  • Dirichlet Process
  • Mixture Model

Articles published on Dirichlet process mixture

771 search results, sorted by recency
  • Research Article
  • 10.3390/inventions11020042
Risk Assessment of Distribution Network Based on Dirichlet Process Mixture Model and the Cumulant Method
  • Apr 21, 2026
  • Inventions
  • Yuxuan Huang + 6 more

To address the increased operational risk in distribution networks caused by the grid integration of distributed wind power, a distribution network risk assessment method that combines a Dirichlet process mixture model (DPMM) with the cumulant method (CM) is proposed to quantify operational risk effectively. First, a DPMM is employed to cluster wind power output data, and adaptive kernel density estimation is introduced to construct a probabilistic model of wind power output, thereby improving local fitting accuracy. Second, uncertainties arising from wind generation and load are considered, and a probabilistic power flow model for the distribution network is established based on the CM and the Gram–Charlier series expansion to obtain the probability distributions of state variables and branch power flows. Then, distribution entropy theory is introduced to quantify the severity of limit violations for state variables such as voltage and power, enabling operational risk assessment. Finally, simulations are conducted on a modified IEEE 34-bus distribution test system, and the results demonstrate the effectiveness of the proposed method.
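The cumulant method and Gram–Charlier expansion mentioned above can be illustrated with a minimal Python sketch. It assumes a linearized power flow in which an output quantity (for example, a bus voltage deviation) is a weighted sum of independent injections with hypothetical sensitivity coefficients. Cumulants of independent terms then add, and the first four cumulants feed a Gram–Charlier A-series approximation of the CDF. The paper's DPMM/kernel density wind model and its entropy-based risk index are not reproduced.

```python
import math

def combine_cumulants(sensitivities, input_cumulants):
    """First four cumulants of Y = sum_i a_i * X_i for independent inputs X_i.

    Uses kappa_r(Y) = sum_i a_i**r * kappa_r(X_i), valid for independent terms.
    `sensitivities` are hypothetical linearized power-flow coefficients.
    """
    out = [0.0, 0.0, 0.0, 0.0]
    for a, kappas in zip(sensitivities, input_cumulants):
        for r in range(4):
            out[r] += a ** (r + 1) * kappas[r]
    return out

def gram_charlier_cdf(x, kappas):
    """Gram-Charlier A-series CDF approximation truncated after the 4th cumulant."""
    k1, k2, k3, k4 = kappas
    sigma = math.sqrt(k2)
    z = (x - k1) / sigma
    skew = k3 / sigma ** 3        # standardized 3rd cumulant
    exkurt = k4 / sigma ** 4      # standardized 4th cumulant (excess kurtosis)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    he2 = z * z - 1.0             # probabilists' Hermite polynomials He2, He3
    he3 = z ** 3 - 3.0 * z
    return cdf - pdf * (skew / 6.0 * he2 + exkurt / 24.0 * he3)
```

With a hypothetical upper limit `v_max`, the estimated violation probability would be `1 - gram_charlier_cdf(v_max, kappas)`. Truncated Gram–Charlier series can yield values slightly outside [0, 1] for strongly skewed inputs, so results may need clipping.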

  • Research Article
  • 10.1080/10618600.2026.2648594
Bayesian Adaptive Sparse Copula
  • Mar 18, 2026
  • Journal of Computational and Graphical Statistics
  • Martin Burda + 1 more

Bayesian nonparametric density estimation procedures are typically based on single-scale priors, such as Dirichlet process mixtures. Alternative multiscale density priors built on decision trees have many well-known advantages, including the ability to characterize abrupt local changes and to provide an estimate with a desired level of resolution. Despite their theoretical appeal, multiscale methods have typically been developed in the literature for the univariate case. Their multivariate versions are generally costly to implement in applications due to the rapidly increasing number of mixture components. We propose a random Bernstein polynomial prior on the unit hypercube of arbitrary dimension with a spike-and-slab shrinkage structure. The prior induces posterior sparsity of the multiscale decision tree, alleviating the curse of dimensionality. We embed the proposed model in the form of a copula link function along with nonparametric marginals in a composite prior over general spaces of densities. We provide conditions for posterior consistency under the weak topology and assess the finite-sample properties in a simulation study. We further illustrate the practical use of the model in an application to forecasting the Value at Risk and Expected Shortfall of a financial portfolio in a scenario where sampling from the non-sparse posterior would be infeasible. Supplemental materials for this article are available online.
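As a rough illustration of the Bernstein polynomial building block, the sketch below evaluates a mixture of products of Beta(k, m - k + 1) densities on the unit hypercube. Storing only the nonzero weights in a dictionary stands in for the sparsity that a spike-and-slab structure is meant to induce. The prior itself, the copula link, and the posterior sampler are not shown, and the function names are hypothetical.

```python
from math import comb, prod

def beta_basis(x, k, m):
    """Density of Beta(k, m - k + 1) at x in [0, 1], for k = 1..m."""
    return m * comb(m - 1, k - 1) * x ** (k - 1) * (1.0 - x) ** (m - k)

def bernstein_density(point, weights, m):
    """Multivariate Bernstein polynomial density of degree m on the unit hypercube.

    `weights` maps a multi-index (k_1, ..., k_d), each k_j in 1..m, to a nonnegative
    weight; the weights should sum to 1. Only nonzero weights are stored, mimicking
    the sparsity a shrinkage prior would induce (an illustrative assumption).
    """
    return sum(
        w * prod(beta_basis(x, k, m) for x, k in zip(point, idx))
        for idx, w in weights.items()
    )
```

A dense degree-m grid in d dimensions has m**d weights, so keeping only the nonzero multi-indices is what keeps evaluation cheap when most weights are shrunk to zero.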

  • Research Article
  • 10.1088/2632-2153/ae45ed
Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models
  • Mar 3, 2026
  • Machine Learning: Science and Technology
  • Aurora Grefsrud + 2 more

Rigorous statistical methods, including parameter estimation with accompanying uncertainties, underpin the validity of scientific discovery, especially in the natural sciences. With increasingly complex data models such as deep learning techniques, uncertainty quantification has become exceedingly difficult, and a plethora of techniques have been proposed. In this case study, we use the unifying framework of approximate Bayesian inference combined with empirical tests on carefully created synthetic classification datasets to investigate qualitative properties of six different probabilistic machine learning algorithms for class probability and uncertainty estimation: (i) a neural network ensemble (NNE), (ii) NNE with conflictual loss, (iii) evidential deep learning, (iv) a single neural network with Monte Carlo dropout, (v) Gaussian process classification and (vi) a Dirichlet process mixture model. We check whether the algorithms produce uncertainty estimates that reflect commonly desired properties, such as being well calibrated and exhibiting an increase in uncertainty for out-of-distribution (OOD) data points. Our results indicate that all algorithms show reasonably good calibration performance on our synthetic test sets, but none of the deep learning based algorithms provide uncertainties that consistently reflect lack of experimental evidence for OOD data points. We hope our study may serve as a clarifying example for researchers who are using or developing methods of uncertainty estimation for scientific data-driven modeling and analysis.
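One common way to quantify the calibration property mentioned above is the binned expected calibration error (ECE). The Python sketch below computes it for predicted positive-class probabilities. The equal-width binning and the default of 10 bins are assumptions; the abstract does not state which calibration metric the study uses.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for predicted probabilities of the positive class.

    Each bin compares the mean predicted probability with the observed frequency
    of label 1; absolute gaps are averaged with weights proportional to bin size.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_prob - freq)
    return ece
```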

  • Research Article
  • Cited by 2
  • 10.1214/24-ba1463
Bayesian Nonparametric Modeling of Latent Partitions via Stirling-Gamma Priors
  • Mar 1, 2026
  • Bayesian Analysis
  • Alessandro Zito + 2 more

Dirichlet process mixtures are particularly sensitive to the value of the precision parameter controlling the behavior of the latent partition. Randomization of the precision through a prior distribution is a common solution, which leads to more robust inferential procedures. However, existing prior choices do not allow for transparent elicitation, due to the lack of analytical results. We introduce and investigate a novel prior for the Dirichlet process precision, the Stirling-gamma distribution. We study the distributional properties of the induced random partition, with an emphasis on the number of clusters. Our theoretical investigation clarifies the reasons for the improved robustness properties of the proposed prior. Moreover, we show that, under specific choices of its hyperparameters, the Stirling-gamma distribution is conjugate to the random partition of a Dirichlet process. We illustrate the usefulness of our approach with an ecological application to the detection of communities of ant workers.
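For a fixed precision α, the prior on the number of clusters K_n among n observations under a Dirichlet process has a closed form, P(K_n = k | α) = α^k |s(n, k)| Γ(α) / Γ(α + n), where |s(n, k)| are unsigned Stirling numbers of the first kind. The Python sketch below evaluates it, which makes the sensitivity to α easy to inspect. The Stirling-gamma prior itself, which places a distribution on α, is not implemented here.

```python
import math

def unsigned_stirling_first(n):
    """Row n of unsigned Stirling numbers of the first kind, c(n, k) for k = 0..n.

    Built with the recurrence c(m + 1, k) = m * c(m, k) + c(m, k - 1).
    """
    row = [1]  # c(0, 0) = 1
    for m in range(n):
        new_row = [0] * (m + 2)
        for k in range(m + 2):
            join_existing = m * row[k] if k < len(row) else 0
            open_new = row[k - 1] if k >= 1 else 0
            new_row[k] = join_existing + open_new
        row = new_row
    return row

def prior_num_clusters(n, alpha):
    """P(K_n = k | alpha) for k = 0..n (n >= 1) under a Dirichlet process."""
    row = unsigned_stirling_first(n)
    log_norm = math.lgamma(alpha) - math.lgamma(alpha + n)
    probs = [0.0] * (n + 1)
    for k in range(1, n + 1):
        probs[k] = math.exp(k * math.log(alpha) + math.log(row[k]) + log_norm)
    return probs

def expected_num_clusters(n, alpha):
    """E[K_n | alpha] = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

Comparing `expected_num_clusters(n, 0.5)` with `expected_num_clusters(n, 5.0)` for the same n shows how strongly the implied number of clusters depends on α.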

  • Research Article
  • 10.1016/j.ymssp.2026.114020
Hierarchical Bayesian model updating using Dirichlet process mixtures for structural damage localization
  • Mar 1, 2026
  • Mechanical Systems and Signal Processing
  • Taro Yaoyama + 2 more

  • Research Article
  • 10.1016/j.neucom.2025.132374
Dirichlet process mixture mechanism with extended stochastic variational inference: Bayesian adversarial learning for IoT intrusion detection
  • Mar 1, 2026
  • Neurocomputing
  • Wenda He + 5 more

  • Research Article
  • 10.1002/qre.70182
Residual Lifetime Prediction for Heterogeneous Degradation Data by Bayesian Semi‐Parametric Method
  • Feb 17, 2026
  • Quality and Reliability Engineering International
  • Barin Karmakar + 1 more

Degradation data are considered for assessing reliability in highly reliable systems. The usual assumption is that degradation units come from a homogeneous population. However, in the presence of high variability in the manufacturing process, this assumption does not hold in general; that is, different subpopulations are involved in the study. Predicting the residual lifetime of a functioning unit is a major challenge in degradation modeling, especially in a heterogeneous environment. To account for heterogeneous degradation data, we propose a Bayesian semi‐parametric approach that relaxes the conventional modeling assumptions. We model the degradation path using a Dirichlet process mixture of normal distributions. Based on the samples obtained from the posterior distribution of the model parameters, we obtain the residual lifetime distribution for each individual unit. A transformation‐based MCMC technique is used to simulate values from the derived residual lifetime distribution for prediction of residual lifetime. A simulation study is undertaken to compare the performance of the proposed semi‐parametric model with that of a parametric model. Fatigue crack size data are analyzed to illustrate the proposed methodology.
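To make the modeling assumption concrete, the sketch below draws synthetic data from a Dirichlet process mixture of normal distributions using a truncated stick-breaking construction. It is a simplified location mixture with a fixed kernel standard deviation and a hypothetical truncation level of 50. The paper's degradation-path model, posterior sampler, and residual-lifetime computation are not shown.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights: v_j ~ Beta(1, alpha), w_j = v_j * prod_{l<j}(1 - v_l)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last stick takes the leftover mass so weights sum to 1
    return weights

def sample_dp_normal_mixture(n, alpha, base_mean, base_sd, kernel_sd, truncation=50, seed=0):
    """Draw n points from an (approximate) DP mixture of normals with a normal base measure."""
    rng = random.Random(seed)
    w = stick_breaking_weights(alpha, truncation, rng)
    atoms = [rng.gauss(base_mean, base_sd) for _ in w]   # component means from the base measure
    labels = rng.choices(range(len(w)), weights=w, k=n)  # component assignments
    data = [rng.gauss(atoms[z], kernel_sd) for z in labels]
    return data, labels, w
```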

  • Research Article
  • 10.1007/s10479-026-07039-7
Actuarial Bayesian nonparametric regression modelling for survival data
  • Jan 29, 2026
  • Annals of Operations Research
  • Francesco Ungolo + 2 more

This paper introduces a flexible regression model for the statistical analysis of the individual mortality profiles of pension scheme members. The model incorporates individual-specific random effects, which follow a discrete distribution drawn from a Dirichlet Process, enhancing its adaptability to complex data structures. This results in a Dependent Dirichlet Process mixture model in the spirit of De Iorio et al. (Biometrics 65(3):762–771. https://doi.org/10.1111/j.1541-0420.2008.01166.x , 2009), which accommodates nonmonotonic relationships between covariates and the regression function. The application of the model is illustrated through the analysis of a mid-sized UK pension scheme dataset. The model captures complex features of the data, such as late-life mortality deceleration, at no cost in terms of model parsimony, and shows improved out-of-sample performance compared with standard parametric alternatives, making it particularly suitable for actuarial modelling applications.

  • Research Article
  • 10.1177/09622802251414594
Cluster analysis for longitudinal data and its application in the detection of adiposity trajectories
  • Jan 20, 2026
  • Statistical Methods in Medical Research
  • Asael Fabian Martínez + 2 more

The identification of latent profile trajectories in longitudinal studies represents an important challenge for specialists, since such trajectories can provide insights that help them better understand their problem of interest. The majority of statistical methodologies for cluster analysis of longitudinal data are based on growth curve or mixed-effects models, and often incorporate covariates for a better fit. In particular, among Bayesian nonparametric methods, Dirichlet process mixture models are widely used in conjunction with these models. We propose a clustering methodology for longitudinal data based on mixture models generated by a discrete random probability measure whose weights are decreasingly ordered by construction. Additionally, the data are modeled without making use of covariates and assuming independence across time for individual measurements. Our approach also provides a straightforward procedure to merge some of the estimated groups, since the method may produce too many groups for experts to explain easily. Our results suggest that, at least for a first analysis, this framework is enough to effectively detect groups in the data; further exploration of each group could incorporate extra information. We apply our methodology to detect adiposity trajectories in Mexican children in a secondary analysis of the "Prenatal Omega-3 fatty acid Supplementation and Child Growth and Development" study (POSGRAD) cohort.

  • Research Article
  • 10.1002/sim.70360
Nonparametric Bayesian Adjustment of Unmeasured Confounders in Cox Proportional Hazards Models
  • Jan 1, 2026
  • Statistics in Medicine
  • Shunichiro Orihara + 5 more

Unmeasured confounders pose a major challenge in accurately estimating causal effects in observational studies. To address this issue when estimating hazard ratios (HRs) using Cox proportional hazards models, several methods, including instrumental variable (IV) approaches, have been proposed. However, these methods often face limitations, such as weak IV problems and restrictive assumptions regarding unmeasured confounder distributions. In this study, we introduce a novel nonparametric Bayesian procedure that provides accurate HR estimates while addressing these limitations. A key assumption of our approach is that unmeasured confounders exhibit a cluster structure. Under this assumption, we integrate two notable Bayesian techniques, the Dirichlet process mixture (DPM) and general Bayes (GB), to simultaneously (1) detect latent clusters based on the likelihood of exposure and outcome variables and (2) estimate HRs using the likelihood constructed within each cluster. Notably, leveraging DPM, our procedure eliminates the need for IVs by identifying unmeasured confounders under an alternative condition. Additionally, GB techniques remove the need for explicit modeling of the baseline hazard function, distinguishing our procedure from traditional Bayesian approaches. Simulation experiments demonstrate that the proposed Bayesian procedure outperforms existing methods on some performance metrics. Moreover, it achieves statistical efficiency comparable to the efficient estimator while accurately identifying cluster structures. These features highlight its ability to overcome challenges associated with traditional IV approaches for time‐to‐event data.
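Why general Bayes can avoid modeling the baseline hazard can be sketched with the Cox partial likelihood, in which the baseline hazard cancels. The Python sketch below assumes a single exposure variable, Breslow handling of tied times, a normal prior on the log hazard ratio, and a learning-rate weight on the loss. Whether the paper uses exactly this loss, and how it is combined with the DPM cluster structure, is not stated in the abstract; the function names are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, exposure):
    """Cox partial log-likelihood for one covariate (Breslow handling of ties).

    The baseline hazard cancels from each event's contribution, so it never
    needs to be modeled.
    """
    ll = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through risk sets
        risk_set = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, exposure) if t_j >= t_i)
        ll += beta * exposure[i] - math.log(risk_set)
    return ll

def gb_log_posterior(beta, times, events, exposure, prior_sd=10.0, learning_rate=1.0):
    """Unnormalized general-Bayes log posterior: log N(0, prior_sd^2) prior + learning_rate * partial log-lik."""
    log_prior = -0.5 * (beta / prior_sd) ** 2
    return log_prior + learning_rate * cox_partial_loglik(beta, times, events, exposure)
```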

  • Research Article
  • 10.1109/tste.2026.3668241
Day-ahead Scheduling of Hydrogen-based Chemical Industry Park: A Bayesian Nonparametric Two-sided Chance Constraint Method
  • Jan 1, 2026
  • IEEE Transactions on Sustainable Energy
  • Jiahe Li + 4 more

Uncertainties introduced by renewable energy pose significant challenges to the scheduling of chemical industry parks that integrate an electricity-heat-hydrogen microgrid at the supply level and ammonia-based process chemical industry loads (APCILs) at the demand level. In this paper, a day-ahead scheduling framework based on a Bayesian nonparametric two-sided chance constraint (BNTCC) is proposed to address these challenges. First, a refined operation model for electricity-heat-hydrogen microgrids and an integrated energy-material model for APCILs are developed, and the security constraints of hydrogen equipment and chemical production processes are incorporated. Second, a Dirichlet process mixture model (DPMM) is used to construct a Gaussian mixture (GM) distribution from historical uncertainty data to characterize renewable energy uncertainties. On the basis of this GM model, a two-sided chance constraint (TCC) scheduling method is proposed for the scheduling of chemical industry parks to reduce conservatism while ensuring that operational limits are jointly satisfied at a specified confidence level. Finally, to solve the TCC problem efficiently, a second-order cone programming (SOCP) formulation is derived using a piecewise linear (PWL) approximation. An additional algorithm is introduced to accelerate the computation by optimally selecting the PWL segments. Case studies illustrate the effectiveness of the proposed method for the day-ahead scheduling of chemical industry parks. Compared with state-of-the-art benchmark methods, the proposed method demonstrates superior cost effectiveness and computational efficiency.
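For a scalar quantity whose uncertainty follows a Gaussian mixture, the probability of staying within a two-sided interval is a weighted sum of normal CDF differences. The Python sketch below checks such a constraint against a risk level ε. It treats one quantity at a time with given mixture parameters, whereas the paper handles joint constraints and reformulates them via a piecewise linear approximation and SOCP, which are not reproduced here.

```python
from statistics import NormalDist

def gm_interval_probability(weights, means, sds, lower, upper):
    """P(lower <= X <= upper) for X following a Gaussian mixture."""
    return sum(
        w * (NormalDist(mu, sd).cdf(upper) - NormalDist(mu, sd).cdf(lower))
        for w, mu, sd in zip(weights, means, sds)
    )

def two_sided_chance_constraint_ok(weights, means, sds, lower, upper, epsilon):
    """True if the interval holds with probability at least 1 - epsilon."""
    return gm_interval_probability(weights, means, sds, lower, upper) >= 1.0 - epsilon
```

If the uncertain quantity is a scheduled value plus a forecast error, shifting every mixture mean by the scheduled value gives the distribution to test.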

  • Research Article
  • 10.1155/stc/6796260
Automated Operational Modal Analysis of Bridge Structure Based on Infinite Dirichlet Process Mixture Model Without Prior Threshold
  • Jan 1, 2026
  • Structural Control and Health Monitoring
  • Deshan Shan + 2 more

As one of the vital steps in structural health monitoring, automated operational modal analysis (AOMA) without manual intervention remains a challenging problem, because it requires processing large datasets and involves many user‐defined thresholds. Combining covariance‐driven stochastic subspace identification (Cov‐SSI) with the Dirichlet process mixture model (DPMM), a novel AOMA method is proposed that clusters the stabilization diagram (SD) to identify structural modal parameters automatically without prior thresholds. Based on the current Chinese specifications, the physical properties of the mode shapes, and the uncertainty of the frequencies, hard and soft validation criteria are determined to cleanse the initial SD derived from the Cov‐SSI algorithm. Clustering datasets with two‐dimensional clustering features are subsequently constructed from the frequency and damping ratio information contained in the stable poles of the cleansed SD. Then, the DPMM, which simultaneously determines the number of clusters and the clustering result, is incorporated into the automatic clustering of the cleansed SD. The DPMM is solved iteratively using collapsed Gibbs sampling, and on this basis a novel AOMA approach for bridge structures is proposed that requires only a single clustering operation. Moreover, the impact of the time lag in Cov‐SSI and of the hyperparameter α of the Dirichlet process on the automated clustering is also discussed. The proposed DPMM-based AOMA is first validated using measured data from the Dowling Hall Footbridge, which features two medium‐span steel truss girders. Subsequently, the method is applied to a real-world long‐span flexible suspension bridge with a main span of 660 m. Validation and application results indicate that the proposed algorithm can accurately, efficiently, and automatically identify the modal parameters of bridge structures with dense modes.
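The collapsed Gibbs step can be sketched for a simplified conjugate model: Gaussian kernels with a known, shared variance, an independent normal prior on each cluster-mean coordinate, and cluster means integrated out. The Python sketch below is only an illustration under those assumptions. The hyperparameters `alpha`, `mu0`, `tau0_sq`, and `sigma_sq` are hypothetical inputs, and frequency and damping ratio features would need to be scaled to comparable ranges before sharing one variance.

```python
import math
import random

def _log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def _log_predictive(point, n_c, sums, mu0, tau0_sq, sigma_sq):
    """Log posterior-predictive density of `point` for a cluster with n_c members.

    Each dimension has known kernel variance sigma_sq and a N(mu0[d], tau0_sq) prior
    on the cluster mean; dimensions are treated as independent (an assumption).
    """
    prec = 1.0 / tau0_sq + n_c / sigma_sq
    total = 0.0
    for d, x in enumerate(point):
        post_mean = (mu0[d] / tau0_sq + sums[d] / sigma_sq) / prec
        total += _log_normal_pdf(x, post_mean, 1.0 / prec + sigma_sq)
    return total

def collapsed_gibbs_dpmm(data, alpha, mu0, tau0_sq, sigma_sq, n_iter=50, seed=0):
    """Cluster labels after n_iter sweeps of collapsed Gibbs sampling for a DP Gaussian mixture."""
    rng = random.Random(seed)
    dim = len(data[0])
    labels = [0] * len(data)  # start with every point in a single cluster
    counts = {0: len(data)}
    sums = {0: [sum(p[d] for p in data) for d in range(dim)]}
    next_label = 1
    for _ in range(n_iter):
        for i, point in enumerate(data):
            # Remove point i from its current cluster; drop the cluster if it empties.
            c = labels[i]
            counts[c] -= 1
            for d in range(dim):
                sums[c][d] -= point[d]
            if counts[c] == 0:
                del counts[c], sums[c]
            # Existing clusters are weighted by their size, a new cluster by alpha.
            options = list(counts)
            log_w = [math.log(counts[k]) + _log_predictive(point, counts[k], sums[k], mu0, tau0_sq, sigma_sq)
                     for k in options]
            options.append(None)
            log_w.append(math.log(alpha) + _log_predictive(point, 0, [0.0] * dim, mu0, tau0_sq, sigma_sq))
            top = max(log_w)
            choice = rng.choices(options, weights=[math.exp(v - top) for v in log_w], k=1)[0]
            if choice is None:  # open a new cluster
                choice = next_label
                next_label += 1
                counts[choice] = 0
                sums[choice] = [0.0] * dim
            labels[i] = choice
            counts[choice] += 1
            for d in range(dim):
                sums[choice][d] += point[d]
    return labels
```

Removing user-defined clustering thresholds in this way still leaves the hyperparameters above to be chosen, which is why the sensitivity to α discussed in the abstract matters.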

  • Research Article
  • 10.1002/sim.70326
A Bayesian Parametric and Nonparametric Approach for the Imputation of Multivariate Left-Censored Data Due to Limit of Detection
  • Nov 27, 2025
  • Statistics in Medicine
  • Federico L Perlino + 3 more

Left-censored observations due to limits of detection and/or quantification are common in clinical and epidemiologic research when continuous predictors are assessed from human specimens. In these settings, values below a certain threshold are not detectable in laboratory analysis and are reported as missing in the dataset. Classical imputation approaches have mostly relied on imputing the same number for all non-detected samples, thus compromising the continuous nature of the censored variables and affecting their variability and potential inclusion in regression modeling. Continuous imputations have been presented, but they generally focus on a single variable at a time. It is common, moreover, for the same human specimen to be used for the quantification of several biomarkers or exposures simultaneously, thus resulting in a complex set of multivariate and possibly correlated left-censored observations. To the best of our knowledge, there is no established framework that flexibly accounts for the real-world complexity of these data. We propose a Bayesian multiple imputation (MI) approach that relies on the introduction of multivariate latent variables to handle multivariate left-censored data. We present a general framework, accommodating both a parametric approach, assuming multivariate normality of the data, and a nonparametric approach, modeling observations by means of a location Dirichlet process mixture of multivariate normal kernels. Both approaches are implemented through a Gibbs sampling scheme. The performance of our approach is investigated with a simulation study based on environmental exposures and illustrated by analyzing a real dataset on cardiovascular biomarkers.
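A single data-augmentation step for a left-censored value can be sketched as a draw from a normal distribution truncated above at the limit of detection (LOD). The Python sketch below assumes a bivariate normal, conditions the censored coordinate on the other coordinate, and uses inverse-CDF sampling. In the nonparametric variant, the mean and covariance would instead come from the mixture component currently assigned to the subject. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def conditional_normal(mu1, mu2, sd1, sd2, rho, x2):
    """Mean and sd of X1 | X2 = x2 for a bivariate normal with correlation |rho| < 1."""
    mean = mu1 + rho * sd1 / sd2 * (x2 - mu2)
    sd = sd1 * (1.0 - rho ** 2) ** 0.5
    return mean, sd

def draw_below_lod(mean, sd, lod, rng):
    """Draw from N(mean, sd^2) truncated to (-inf, lod] by inverse-CDF sampling."""
    dist = NormalDist(mean, sd)
    p_lod = dist.cdf(lod)
    if not 0.0 < p_lod < 1.0:
        raise ValueError("LOD lies too far in the tail for inverse-CDF sampling")
    u = p_lod * (1.0 - rng.random())  # uniform on (0, p_lod]
    return dist.inv_cdf(u)
```

Within a Gibbs sweep, each non-detected coordinate would be redrawn this way given the current values of the other coordinates.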

  • Research Article
  • Cited by 1
  • 10.1080/07474938.2025.2581289
Attenuation bias vs selection bias: a multi-outcome three-stage model
  • Nov 20, 2025
  • Econometric Reviews
  • Andrés Ramírez–Hassan + 1 more

We propose a Bayesian inferential framework for a multi-outcome endogenous three-stage model that accounts for incidental truncation in outcomes (intensive margin), selection into participation (extensive margin), and access restrictions. Simulation exercises assessing finite-sample properties under various misspecification settings suggest that incorporating access restrictions and unobserved correlations is crucial. In particular, access restrictions play a critical role, as failing to account for them may introduce measurement error when correcting for selection bias. This suggests a potential tension between attenuation and selection biases. We apply our framework to two novel datasets on credit and utility demand. We extend our parametric specification to a semi-parametric one in the latter application, modeling stochastic errors using a Dirichlet process mixture. The credit demand application suggests that better socioeconomic conditions increase the probability of using credit cards but decrease the likelihood of taking bank loans. In addition, women are more likely to use credit than men, but men tend to borrow larger amounts. The utility application highlights the importance of urban areas in increasing the probability of access to piped utilities and suggests that water and gas are inelastic goods, whereas electricity is elastic.

  • Research Article
  • 10.1016/j.neucom.2025.131119
A distributed inference algorithm for Dirichlet process mixture models with exponential family components
  • Nov 1, 2025
  • Neurocomputing
  • Reda Khoufache + 4 more

  • Research Article
  • Cited by 1
  • 10.1080/10705511.2025.2563187
Class Selection in Growth Mixture Models: Comparing Information Criteria to Nonparametric and Parametric Bayesian Approaches
  • Oct 18, 2025
  • Structural Equation Modeling: A Multidisciplinary Journal
  • Sarah Depaoli + 3 more

Selecting the number of latent classes is a critical yet challenging aspect of latent growth mixture modeling (LGMM), with implications for model validity and substantive interpretation. Researchers commonly rely on information criteria to compare models with different numbers of classes, but these methods can be inconsistent, especially when class separation is poor or class sizes are unequal. This study evaluates two alternative Bayesian approaches: (1) the Dirichlet process mixture (DPM) model, a nonparametric method, and (2) the mixture of finite mixtures (MFM) model, a parametric method. Both impose a prior on the number of classes and estimate that number from the data. While the DPM model is theoretically appealing, previous research has found that it tends to over-extract small classes. The MFM model, in contrast, offers a more reliable alternative by explicitly modeling the number of classes as a finite random variable. We compare these techniques to traditional information criteria (AIC, BIC, AICc, and aBIC) across varying conditions of sample size, class structure, separation, and indicator reliability. Simulation results highlight key performance differences, and we provide practical guidance for researchers selecting among class number determination methods. Illustrative R code is provided as online supplemental material.
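The four information criteria named above can be computed from a fitted model's maximized log-likelihood, its number of free parameters k, and the sample size n. The Python sketch below uses the standard definitions, including the sample-size adjusted BIC with n* = (n + 2) / 24. The log-likelihood values passed in are hypothetical and would come from fitted growth mixture models.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """AIC, BIC, small-sample AICc, and sample-size adjusted BIC (aBIC)."""
    k, n = n_params, n_obs
    aic = -2.0 * log_lik + 2.0 * k
    return {
        "AIC": aic,
        "BIC": -2.0 * log_lik + k * math.log(n),
        "AICc": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "aBIC": -2.0 * log_lik + k * math.log((n + 2) / 24.0),
    }

def choose_num_classes(fits, n_obs, criterion="BIC"):
    """Pick the class count with the smallest criterion; `fits` maps K -> (log_lik, n_params)."""
    return min(fits, key=lambda K: information_criteria(fits[K][0], fits[K][1], n_obs)[criterion])
```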

  • Research Article
  • 10.1080/10485252.2025.2576122
Nonparametric Bayesian latent class model for longitudinal zero-inflated count data
  • Oct 17, 2025
  • Journal of Nonparametric Statistics
  • Yaeji Lim + 2 more

This paper introduces a nonparametric Bayesian latent class model tailored to longitudinal count data with an excess of zeros. By embedding zero-inflation mechanisms and allowing for an unbounded number of mixture components, the proposed approach effectively captures heterogeneous subpopulations while accounting for overdispersion. Specifically, an extended normalised gamma process prior links class membership probabilities to relevant predictors, enabling subjects with similar covariate profiles to form latent classes that capture distinct underlying patterns. In comprehensive simulations, the proposed model demonstrates consistently superior predictive performance and lower misclassification rates compared to competing Poisson, zero-inflated Poisson, and Dirichlet process mixture approaches, underscoring its flexibility and accuracy in modelling latent structures for count data. Empirical validation using longitudinal dental caries data from the Iowa Fluoride Study further confirms that the proposed model outperforms well-known competitors. These findings highlight the importance of integrating flexible mixture modelling with explicit zero-inflation components to address both structural zeros and inherent variability in heterogeneous populations.
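The zero-inflation mechanism can be written as a two-part probability mass function: a structural zero with probability π, otherwise a Poisson(λ) count. The Python sketch below implements this pmf and, assuming fixed class weights and conditionally independent visits, the resulting posterior class probabilities for one subject's count series. The paper's covariate-dependent normalised gamma process prior is not shown, and the function names are hypothetical.

```python
import math

def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson pmf (lam > 0): structural zero w.p. pi, else Poisson(lam)."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson

def class_posteriors(series, classes):
    """Posterior class probabilities for one subject's count series.

    `classes` is a list of (prior_weight, lam, pi) tuples; visits are treated as
    conditionally independent given the class (a simplifying assumption).
    """
    logs = [math.log(w) + sum(math.log(zip_pmf(y, lam, pi)) for y in series)
            for w, lam, pi in classes]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```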

  • Research Article
  • Cited by 1
  • 10.1080/01621459.2025.2544366
A Bayesian Nonparametric Approach to Mediation and Spillover Effects with Multiple Mediators in Cluster-Randomized Trials
  • Oct 7, 2025
  • Journal of the American Statistical Association
  • Yuki Ohnishi + 1 more

Cluster randomized trials (CRTs) with multiple unstructured mediators present significant methodological challenges for causal inference due to within-cluster correlation, interference among units, and the complexity introduced by multiple mediators. Existing causal mediation methods often fall short in simultaneously addressing these complexities, particularly in disentangling mediator-specific effects under interference that are central to studying complex mechanisms. To address this gap, we propose new causal estimands for spillover mediation effects that differentiate the roles of each individual’s own mediator and the spillover effects resulting from interactions among individuals within the same cluster. We establish identification results for each estimand and, to flexibly model the complex data structures inherent in CRTs, we develop a new Bayesian nonparametric prior—the Nested Dependent Dirichlet Process Mixture—designed to flexibly capture the outcome and mediator surfaces at different levels. We conduct extensive simulations across various scenarios to evaluate the frequentist performance of our methods, compare them with a Bayesian parametric counterpart and illustrate our new methods in an analysis of a completed CRT. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

  • Research Article
  • Cited by 2
  • 10.1080/09540091.2025.2565163
Innovative synthetic EHR data generation: diffusion models for enhanced privacy and clinical utility in multimorbidity clustering
  • Oct 6, 2025
  • Connection Science
  • Francis John Kita + 2 more

The increasing use of electronic health records (EHRs) in medical research and AI-driven healthcare necessitates high-fidelity synthetic data that balances patient privacy with statistical and clinical utility. Traditional generative models, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), struggle with mode collapse, limited sample diversity, and difficulties in modelling complex dependencies in high-dimensional tabular data. This study introduces a diffusion model-based approach for generating synthetic EHR data and evaluates its utility in clustering multimorbidity patterns using Dirichlet process mixture models (DPMMs). Denoising diffusion probabilistic models (DDPMs) iteratively refine noise through a structured denoising process, producing diverse, high-fidelity synthetic records. The DPMM framework, a Bayesian nonparametric clustering method, dynamically determines the number of clusters, effectively handling heterogeneous, imbalanced datasets. Model evaluation incorporates statistical similarity measures, feature correlation analysis, privacy risk assessments, and predictive performance metrics. Results demonstrate that DDPM-generated data surpass GAN- and VAE-generated data in fidelity (Jensen–Shannon divergence (JSD) = 0.020, Pearson pairwise correlation (PPC) = 0.94) and privacy preservation (membership inference attack (MIA) risk = 0.25). DPMM clustering reveals clinically meaningful disease patterns, outperforming traditional clustering models. These findings highlight the potential of diffusion models for privacy-preserving synthetic EHR generation and robust multimorbidity clustering in healthcare analytics.
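The Jensen–Shannon divergence used as a fidelity measure can be computed per categorical feature by comparing the empirical category frequencies of real and synthetic records. The Python sketch below uses base-2 logarithms, so the value lies in [0, 1]. The log base, binning of continuous features, and averaging across features used in the study are not stated in the abstract, so the reported JSD of 0.020 should not be expected to be reproduced this way.

```python
import math
from collections import Counter

def jensen_shannon_divergence(p, q, base=2.0):
    """JSD between two discrete distributions given as equal-length probability lists."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; b_i > 0 whenever a_i > 0 because b = mixture.
        return sum(ai * math.log(ai / bi, base) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def category_distributions(real_column, synthetic_column):
    """Empirical distributions of one categorical feature over a shared, sorted category set."""
    cats = sorted(set(real_column) | set(synthetic_column))
    rc, sc = Counter(real_column), Counter(synthetic_column)
    return ([rc[c] / len(real_column) for c in cats],
            [sc[c] / len(synthetic_column) for c in cats])
```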

  • Research Article
  • 10.1109/jbhi.2025.3567944
Scalable Bayesian Nonparametric Method for Clinical Risk Prediction Using Large-Scale Data From Heterogeneous Populations
  • Oct 1, 2025
  • IEEE Journal of Biomedical and Health Informatics
  • Ning Dong + 2 more

While analyzing large clinical datasets allows for the identification of complex patterns to achieve increased risk prediction accuracy, it also presents challenges for existing risk modeling techniques due to patient heterogeneity and the ever-evolving volume and distributions of data. Bayesian nonparametric methods, such as the Dirichlet Process Mixture Model (DPMM), offer a promising solution for modeling data with mixed and overlapping distributions. However, the approach is computationally prohibitive when applied to large datasets, which greatly limits practical applications. In this study, we propose a scalable framework for efficiently constructing DPMMs from large clinical datasets. To improve computational efficiency, we divide the full dataset into smaller subsets and learn DPMMs within individual sets. Additionally, we adopt a recentered pseudo-barycenter to approximate the posterior density of the full dataset and design a new algorithm to generate a consistent clustering rule from the subset posteriors with unequal numbers of components. The method was validated through a simulation study and a case study predicting the survival of heart failure patients post-left ventricular assist device implantation. The results demonstrated improved accuracy compared to benchmark models such as the Cox proportional hazards model and random survival forests. Our modeling framework adaptively clusters patients with distinct risk profiles into subgroups and predicts their probabilities of developing adverse events from overlapping posterior mixtures, providing an effective approach for addressing patient heterogeneity and enhancing risk prediction accuracy.
