Abstract

Variable selection and penalized regression models in high-dimensional settings have become increasingly important in many disciplines. For instance, biomedical research generates omics data that may be associated with patient survival and may offer insight into disease dynamics, helping to identify patients with a worse prognosis and to improve therapy. Analysis of high-dimensional time-to-event data in the presence of competing risks requires special modeling techniques. So far, some attempts have been made at variable selection in low- and high-dimensional competing risks settings using partial likelihood-based procedures. In this paper, a weighted likelihood-based penalized approach is extended for direct variable selection under the subdistribution hazards model for high-dimensional competing risks data. The proposed method, which considers a larger class of semiparametric regression models for the subdistribution, allows for time-varying effects. This is of particular importance because the proportional hazards assumption may not be valid in general, especially in the high-dimensional setting. The model also relaxes the constraint that limits the ability to model multiple cumulative incidence functions simultaneously under the Fine and Gray approach. The performance of several penalties, including the minimax concave penalty (MCP), adaptive LASSO, and smoothly clipped absolute deviation (SCAD), as well as their L2 counterparts, was investigated through simulation studies in terms of sensitivity and specificity. The results revealed that the sensitivity of all penalties was comparable, but the MCP and MCP-L2 penalties outperformed the other methods in terms of selecting fewer noninformative variables.
The practical use of the model was investigated through the analysis of genomic competing risks data obtained from patients with bladder cancer. Six genes (CDC20, NCF2, SMARCAD1, RTN4, ETFDH, and SON) were identified by all the methods and were significantly correlated with the subdistribution.
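
The abstract names the MCP and SCAD penalties and the sensitivity/specificity criteria without giving formulas. As a rough illustration only, the sketch below uses the standard textbook forms of the two penalties for a single coefficient, together with a simple way to score a selected variable set against a known set of informative variables. The function names and the default tuning values (`gamma=3.0`, `a=3.7`) are assumptions chosen for illustration and are not taken from the paper; the weighted likelihood and the L2 variants are not shown.

```python
def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty for one coefficient t (gamma > 1 assumed)."""
    x = abs(t)
    if x <= gamma * lam:
        # Concave region: LASSO-like slope that tapers off as |t| grows.
        return lam * x - x * x / (2.0 * gamma)
    # Flat region: large coefficients receive a constant penalty.
    return gamma * lam * lam / 2.0


def scad_penalty(t, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty for one coefficient t (a > 2 assumed)."""
    x = abs(t)
    if x <= lam:
        # Same as the LASSO penalty near zero.
        return lam * x
    if x <= a * lam:
        # Quadratic transition between the linear and constant parts.
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    # Constant penalty for large coefficients.
    return lam * lam * (a + 1.0) / 2.0


def selection_metrics(selected, informative, p):
    """Sensitivity and specificity of a selected index set among p variables.

    Assumes at least one informative and one noninformative variable.
    """
    selected, informative = set(selected), set(informative)
    noninformative = set(range(p)) - informative
    # Sensitivity: share of truly informative variables that were selected.
    sensitivity = len(selected & informative) / len(informative)
    # Specificity: share of noninformative variables that were left out.
    specificity = len(noninformative - selected) / len(noninformative)
    return sensitivity, specificity
```

For example, `selection_metrics([0, 1, 5], [0, 1, 2], 10)` scores a method that picked variables 0, 1, and 5 when variables 0, 1, and 2 were the informative ones among ten candidates. Selecting fewer noninformative variables, as reported for MCP and MCP-L2, corresponds to a higher specificity under this definition.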

Highlights

  • The recent development of high-throughput biology provides powerful information about various phenotypic data including patients’ survival times

  • We propose a group variable selection via elastic net (ENET), smoothly clipped absolute deviation (SCAD)-L2, and minimax concave penalty (MCP)-L2

  • This study proposed a penalized weighted nonparametric likelihood-based approach for sparse variable selection in the high-dimensional competing risks setting


Summary

Introduction

The recent development of high-throughput biology provides powerful information about various phenotypic data, including patients’ survival times. By uncovering the relationship between the time to an event, such as cancer, and the expression profiles, one hopes to achieve more accurate prognoses and improved treatment strategies [3]. This issue is challenging for two main reasons. First, the number of covariates in microarray gene expression analysis, or in DNA sequencing data obtained from next-generation sequencing technology, commonly far exceeds the sample size (p >> n). Second, the availability and feasibility of standard analyses are severely affected by the high possibility of collinearity among different gene levels [2].

Computational and Mathematical Methods in Medicine
