Abstract

Background: Modeling thousands of markers simultaneously is of great interest when testing associations between genetic biomarkers and disease or disease-related quantitative traits. Recently, an expectation-maximization (EM) approach to Bayesian variable selection (EMVS) was developed for continuous or binary outcomes, using a fast EM algorithm to facilitate Bayesian computation. However, it is not suitable for analyzing the time-to-event outcomes found in many public databases such as The Cancer Genome Atlas (TCGA).

Results: We extended EMVS to a high-dimensional parametric survival regression framework (SurvEMVS). A variant of the cyclic coordinate descent (CCD) algorithm was used for efficient iteration in the M-step, and the extended Bayesian information criterion (EBIC) was employed for hyperparameter tuning. We evaluated the performance of SurvEMVS using numerical simulations and illustrated its effectiveness on two real datasets. Both the simulations and the real data analyses show that SurvEMVS performs well in terms of accuracy and computation. Some potential markers associated with survival in lung or stomach cancer were identified.

Conclusions: These results suggest that our model is effective and can cope with high-dimensional omics data.

Highlights

  • With the development of high-throughput sequencing technology, large-scale omics data are being generated rapidly for discovering new biomarkers [1, 2]. Public databases such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) provide great opportunities to understand complex diseases comprehensively at the molecular level [3, 4] and, in turn, create a growing demand for statistical approaches designed to cope with these large-scale data [5]

  • We considered the extended Bayesian information criterion (EBIC), a metric originally used for model selection [39]

  • By analogy with the least absolute shrinkage and selection operator (LASSO) solution path plot, which shows how the estimates change as the penalty parameter increases, we investigate the impact of tuning the parameter υ0
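
To make the role of EBIC in hyperparameter tuning concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used form EBIC = -2 log L + k log n + 2γ log C(p, k), where k is the number of selected markers, p the number of candidate markers, and n the sample size; the exact form used in the paper is not given in this section. The function names, the candidate fits, and their log-likelihood values are hypothetical and serve only as illustration.

```python
import math


def log_binomial(p: int, k: int) -> float:
    """Natural log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_likelihood: float, n: int, p: int, k: int, gamma: float = 1.0) -> float:
    """Assumed form: EBIC = -2*logL + k*log(n) + 2*gamma*log C(p, k).

    With gamma = 0 this reduces to the ordinary BIC.
    """
    return -2.0 * log_likelihood + k * math.log(n) + 2.0 * gamma * log_binomial(p, k)


# Hypothetical candidate fits, e.g. one per hyperparameter setting:
# (label, maximized log-likelihood, number of selected markers)
candidates = [("sparse", -520.0, 3), ("medium", -505.0, 10), ("dense", -498.0, 40)]
n_samples, n_markers = 200, 1000  # illustrative sizes, not taken from the paper

# Score every candidate and keep the one with the smallest EBIC.
scores = {label: ebic(ll, n_samples, n_markers, k) for label, ll, k in candidates}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Because the last term grows with log C(p, k), a model that selects many markers out of a large pool is penalized more heavily than under the ordinary BIC, which is why EBIC is often preferred when p is much larger than n.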


Summary

Introduction

With the development of high-throughput sequencing technology, large-scale omics data are being generated rapidly for discovering new biomarkers [1, 2]. Public databases such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) provide great opportunities to understand complex diseases comprehensively at the molecular level [3, 4] and, in turn, create a growing demand for statistical approaches designed to cope with these large-scale data [5]. Ročková and George [22] proposed EM variable selection (EMVS) for continuous outcomes to rapidly identify promising high-posterior models and parameter estimates. This expectation-maximization (EM) approach to Bayesian variable selection, which uses a fast EM algorithm to facilitate Bayesian computation, has been developed for continuous or binary outcomes. However, it is not suitable for analyzing the time-to-event outcomes found in many public databases such as TCGA.

