Abstract
Variable selection and penalized regression models in high-dimension settings have become an increasingly important topic in many disciplines. For instance, omics data are generated in biomedical researches that may be associated with survival of patients and suggest insights into disease dynamics to identify patients with worse prognosis and to improve the therapy. Analysis of high-dimensional time-to-event data in the presence of competing risks requires special modeling techniques. So far, some attempts have been made to variable selection in low- and high-dimension competing risk setting using partial likelihood-based procedures. In this paper, a weighted likelihood-based penalized approach is extended for direct variable selection under the subdistribution hazards model for high-dimensional competing risk data. The proposed method which considers a larger class of semiparametric regression models for the subdistribution allows for taking into account time-varying effects and is of particular importance, because the proportional hazards assumption may not be valid in general, especially in the high-dimension setting. Also, this model relaxes from the constraint of the ability to simultaneously model multiple cumulative incidence functions using the Fine and Gray approach. The performance/effectiveness of several penalties including minimax concave penalty (MCP); adaptive LASSO and smoothly clipped absolute deviation (SCAD) as well as their L2 counterparts were investigated through simulation studies in terms of sensitivity/specificity. The results revealed that sensitivity of all penalties were comparable, but the MCP and MCP-L2 penalties outperformed the other methods in term of selecting less noninformative variables. The practical use of the model was investigated through the analysis of genomic competing risk data obtained from patients with bladder cancer and six genes of CDC20, NCF2, SMARCAD1, RTN4, ETFDH, and SON were identified using all the methods and were significantly correlated with the subdistribution.
Highlights
The recent development of high-throughput biology provides powerful information about various phenotypic data including patients’ survival times
We propose a group variable selection via elastic net (ENET), smoothly clipped absolute deviation (SCAD)-L2, and minimax concave penalty (MCP)-L2
This study proposed a penalized weighted nonparametric likelihood-based approach for sparse variable selection in high-dimension competing risk data setting
Summary
The recent development of high-throughput biology provides powerful information about various phenotypic data including patients’ survival times. By uncovering the relationship between time to an event such as cancer and the expression profiles, one hopes to achieve more accurate prognoses and improved treatment strategies [3]. This issue is challenging for two main reasons. The number of covariates in microarray gene expression analysis or DNA sequencing. Computational and Mathematical Methods in Medicine data obtained from next-generation sequencing technology commonly far exceeds sample size (p > >n). The availability and feasibility of standard analyses are severely affected by the high possibility of potential collinearity among different gene levels [2]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Computational and mathematical methods in medicine
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.