Abstract

The recent explosion of high-throughput technology has been accompanied by a corresponding rapid increase in the number of new statistical methods for developing prognostic and predictive signatures. Three commonly used feature selection techniques for time-to-event data, single gene testing (SGT), the Elastic Net and the Maximizing R Square Algorithm (MARSA), are evaluated on simulated datasets that vary in sample size, number of features and correlation between features. The results of each method are summarized by reporting the sensitivity and the Area Under the Receiver Operating Characteristic Curve (AUC). The performance of each of these algorithms depends heavily on the sample size, while the number of features entered in the analysis has a much more modest impact. The coefficients estimated using SGT are biased towards the null when the genes are uncorrelated and away from the null when the genes are correlated. The Elastic Net algorithms perform better than MARSA and almost as well as SGT when the features are correlated, and about the same as MARSA when the features are uncorrelated.
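As an illustration of the two summary measures named above, the following minimal sketch computes sensitivity and AUC for a feature selection result on a simulated dataset where the truly associated features are known. It is not the authors' code: the function names, the feature labels, the scores and the selection cutoff are all hypothetical, and the excerpt does not state exactly how the paper defines either measure. Here sensitivity is taken as the fraction of truly associated features that were selected, and AUC as the probability that a truly associated feature receives a higher score than a null feature.

```python
def sensitivity(selected, truly_associated):
    """Fraction of truly associated features that the method selected."""
    truth = set(truly_associated)
    if not truth:
        raise ValueError("need at least one truly associated feature")
    return len(truth & set(selected)) / len(truth)


def selection_auc(scores, truly_associated):
    """AUC of a feature ranking, computed by pairwise comparison.

    Equals the probability that a truly associated feature scores higher
    than a null feature; ties count as one half.
    """
    truth = set(truly_associated)
    pos = [s for f, s in scores.items() if f in truth]
    neg = [s for f, s in scores.items() if f not in truth]
    if not pos or not neg:
        raise ValueError("need both associated and null features")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores, e.g. absolute test statistics from any of the methods.
scores = {"g1": 3.2, "g2": 0.4, "g3": 2.5, "g4": 1.1, "g5": 0.2}
truly_associated = {"g1", "g2"}  # known only because the data are simulated

# Hypothetical cutoff: select every feature scoring above 2.0.
selected = {f for f, s in scores.items() if s > 2.0}

sens = sensitivity(selected, truly_associated)
auc = selection_auc(scores, truly_associated)
print(f"sensitivity = {sens:.2f}, AUC = {auc:.3f}")
```

Sensitivity depends on the chosen cutoff, whereas this AUC depends only on how the features are ranked, which is one reason for reporting both.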

Highlights

  • Discovering prognostic or predictive signatures is a worthwhile endeavor as it is well known that the effect of a treatment is largely heterogeneous

  • Regardless of the number of genes entered in the analysis, the Area Under the Receiver Operating Characteristic Curve (AUC) is higher for n = 200 than for lower n, while the number of genes entered in the analysis has a much more modest impact

  • This choice is considered realistic, as False Discovery Rate (FDR), Maximizing R Square Algorithm (MARSA) and the penalized likelihood methods are typically applied to a subset of features chosen through a marginal method such as the unadjusted p-value of the single gene testing (SGT) method


Introduction

Discovering prognostic or predictive signatures is a worthwhile endeavor as it is well known that the effect of a treatment is largely heterogeneous. Medical research has witnessed a recent explosion of high-throughput technology, rendering the measurement of a large number of genetic features possible. New analytical techniques are constantly being developed to process and draw associations from this daunting amount of information. The rapid development of both aspects, the measurement and the analysis of features, has made it difficult to determine the best analytical technique for finding a genetic signature.
