Abstract

Background: Graphical models were identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph that describes the conditional dependencies between the variables. Until now, research has focused mainly on building Gaussian graphical models for continuous multivariate data following a multivariate normal distribution. Satisfactory solutions for binary data were missing. We adapted the method of Meinshausen and Bühlmann to binary data and used the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso in the development of graphical models for high-dimensional binary data. We hypothesized that the performance of the Bolasso is superior to that of competing LASSO methods in identifying graphical models.

Methods: We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. The main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to a real-life data set of functioning data from patients with head and neck cancer.

Results: Bootstrap aggregating as incorporated in the Bolasso algorithm greatly improved performance at larger sample sizes. The number of bootstraps had minimal impact on performance. The Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for the Bolasso leads to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms.

Conclusions: Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling in large sample sizes.

Highlights

  • Graphical models were identified as a promising new approach to modeling clinical data [2], and thereby the systems approach to health and disease

  • Graphical models [3] provide a probabilistic tool to display, analyze and visualize the net-like dependence structures by drawing a graph describing the conditional dependencies between the variables


Summary

Methods

Data generation: This section presents an approach to simulating high-dimensional binary data from a given distribution and dimension, so that the methods can be evaluated on a data set with known dependence structure.

Graph estimation: Estimate the coefficients β^LASSO in local penalized logistic regression models, using each variable X(i) in turn as outcome and the remaining variables as predictors, at an optimal penalty term t. Define the set of conditional relationships (the edge set) E as: E = {(a, b) | a ∈ ne*_b ∨ b ∈ ne*_a}

Bolasso: Another method to construct binary graphical models is based on the Bolasso algorithm, which takes advantage of bootstrap aggregating. Bootstrap aggregating, also called 'bagging', generates multiple versions of a predictor (e.g. a coefficient in a generalized linear model) or of a classifier. It constitutes a simple and general approach to improving an unstable estimator θ(X), with X being a given data set.

Performance analysis: To estimate how model performance depends on the parameters πcut, B and l, and on the interaction between πcut and l, we calculated generalized linear models with either SHD or J as outcome variable.
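The aggregation and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the local LASSO logistic regressions have already been fitted on each bootstrap sample and takes their selected neighborhoods as input. The function names, the toy data, and the reading of SHD for undirected graphs as the number of edges that differ between two graphs are assumptions made for illustration. J is the Youden Index, sensitivity + specificity - 1, computed over all possible edges.

```python
from itertools import combinations


def bolasso_neighborhoods(boot_neighborhoods, pi_cut):
    """Keep b in the neighborhood of a if b was selected in at least
    a fraction pi_cut of the bootstrap fits.

    boot_neighborhoods: one dict per bootstrap replicate, mapping each node
    to the set of predictors with a nonzero coefficient (hypothetical input;
    the LASSO fits themselves are not shown here).
    """
    n_boot = len(boot_neighborhoods)
    stable = {}
    for a in boot_neighborhoods[0]:
        counts = {}
        for fit in boot_neighborhoods:
            for b in fit[a]:
                counts[b] = counts.get(b, 0) + 1
        stable[a] = {b for b, c in counts.items() if c / n_boot >= pi_cut}
    return stable


def edge_set(neighborhoods):
    """OR rule: {a, b} is an edge if a is in ne(b) or b is in ne(a)."""
    return {frozenset((a, b)) for a, ne_a in neighborhoods.items() for b in ne_a}


def shd(estimated, truth):
    """Assumed undirected SHD: number of edges present in exactly one graph."""
    return len(estimated ^ truth)


def youden_j(estimated, truth, nodes):
    """Youden Index over all possible edges.

    Assumes the true graph has at least one edge and at least one non-edge.
    """
    n_possible = len(list(combinations(nodes, 2)))
    tp = len(estimated & truth)
    fn = len(truth - estimated)
    fp = len(estimated - truth)
    tn = n_possible - tp - fn - fp
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Toy example with four variables and B = 4 bootstrap replicates.
nodes = ["a", "b", "c", "d"]
truth = {frozenset(("a", "b")), frozenset(("b", "c"))}
boots = [
    {"a": {"b"}, "b": {"a", "c"}, "c": set(), "d": set()},
    {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": set()},
    {"a": {"b"}, "b": {"c"}, "c": set(), "d": set()},
    {"a": {"b", "d"}, "b": {"c"}, "c": set(), "d": set()},
]

strict = edge_set(bolasso_neighborhoods(boots, pi_cut=0.75))
loose = edge_set(bolasso_neighborhoods(boots, pi_cut=0.25))
print(shd(strict, truth), youden_j(strict, truth, nodes))
print(shd(loose, truth), youden_j(loose, truth, nodes))
```

In this toy example the stricter cutpoint drops the edges that appear in only one replicate, which is the mechanism by which a high πcut trades sensitivity for fewer false discoveries.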
