Abstract

Background: An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease diagnosis and prognosis. Thus it is of interest to develop efficient statistical methods that can simultaneously identify important biomarkers from such high-throughput genomic data and construct appropriate classification rules. It is also of interest to develop methods for evaluation of classification performance and ranking of identified biomarkers.

Results: The ROC (receiver operating characteristic) technique has been widely used in disease classification with low-dimensional biomarkers. Compared with the empirical ROC approach, the binormal ROC is computationally more affordable and robust in small sample size cases. We propose using the binormal AUC (area under the ROC curve) as the objective function for two-sample classification, and the scaled threshold gradient directed regularization method for regularized estimation and biomarker selection. Tuning parameter selection is based on V-fold cross validation. We develop Monte Carlo based methods for evaluating the stability of individual biomarkers and overall prediction performance. Extensive simulation studies show that the proposed approach can generate parsimonious models with excellent classification and prediction performance, under most simulated scenarios including model mis-specification. Application of the method to two cancer studies shows that the identified genes are reasonably stable with satisfactory prediction performance and biologically sound implications. The overall classification performance is satisfactory, with small classification errors and large AUCs.

Conclusion: In comparison to existing methods, the proposed approach is computationally more affordable without losing the optimality possessed by the standard ROC method.
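
To make the contrast between the binormal and empirical AUC concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual binormal model, in which a score is normally distributed in each class, so that AUC = Φ((μ1 − μ0) / sqrt(σ0² + σ1²)). The function names (`linear_scores`, `binormal_auc`, `empirical_auc`), the coefficient vector `beta`, and the toy expression values are all hypothetical and chosen only for illustration.

```python
from statistics import NormalDist, mean, variance


def linear_scores(beta, samples):
    """Combine each sample's gene expressions into one score, beta . x."""
    return [sum(b * x for b, x in zip(beta, row)) for row in samples]


def binormal_auc(control_scores, case_scores):
    """Closed-form AUC assuming scores are normal within each class."""
    mu0, mu1 = mean(control_scores), mean(case_scores)
    v0, v1 = variance(control_scores), variance(case_scores)
    return NormalDist().cdf((mu1 - mu0) / (v0 + v1) ** 0.5)


def empirical_auc(control_scores, case_scores):
    """Fraction of (control, case) pairs ranked correctly; ties count 1/2."""
    wins = sum((y > x) + 0.5 * (y == x)
               for x in control_scores for y in case_scores)
    return wins / (len(control_scores) * len(case_scores))


# Toy data: 5 controls and 5 cases, 2 genes each (made-up numbers).
controls = [[0.1, 0.0], [0.4, 0.0], [0.5, 0.0], [0.9, 0.0], [1.2, 0.0]]
cases = [[0.8, 0.0], [1.1, 0.0], [1.5, 0.0], [1.9, 0.0], [2.3, 0.0]]
beta = [1.0, 0.0]  # hypothetical coefficients; only the first gene is used

s0 = linear_scores(beta, controls)
s1 = linear_scores(beta, cases)
print("binormal AUC :", binormal_auc(s0, s1))
print("empirical AUC:", empirical_auc(s0, s1))
```

The binormal version needs only two means and two variances per evaluation, whereas the empirical version compares every control with every case and is a step function of `beta`; this is one way to read the abstract's claim that the binormal AUC is computationally more affordable as an objective.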

Highlights

  • An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease diagnosis and prognosis

  • We proposed an approach for biomarker selection and classification with microarray data by optimizing the binormal area under the ROC curve (AUC)

  • The receiver operating characteristic (ROC) method has been successfully used for disease classification using low-dimensional biomarkers


Summary

Introduction

An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease diagnosis and prognosis. It is of interest to develop efficient statistical methods that can simultaneously identify important biomarkers from such high-throughput genomic data and construct appropriate classification rules. It is also of interest to develop methods for evaluating classification performance and ranking the identified biomarkers. Microarray experiments that monitor gene expression profiles associated with different disease phenotypes have become commonplace in biomedical research. Because the number of genes assayed typically far exceeds the number of samples, applying standard methods directly usually yields estimates that are not "regular", i.e., estimates that are not unique or are ill-behaved. Regularization, through which unique and well-behaved estimates are obtained, is therefore usually needed. It can be achieved via model reduction or variable selection methods.
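
The non-uniqueness described above can be seen in a toy example with more genes than samples. The sketch below is an illustration under stated assumptions, not the paper's method: it uses a plain linear fit and a minimum-norm (ridge-type) criterion as a simple stand-in for regularization, rather than the threshold gradient directed regularization the paper proposes. All names (`predict`, `min_norm_on_line`, `null_dir`) and numbers are hypothetical.

```python
def predict(X, beta):
    """Fitted values X @ beta for a list-of-rows design matrix."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def min_norm_on_line(beta, direction):
    """Among beta + t * direction, return the vector with smallest L2 norm."""
    t = -sum(b * d for b, d in zip(beta, direction)) / sum(d * d for d in direction)
    return [b + t * d for b, d in zip(beta, direction)]


# Two samples, three genes: more unknowns than observations.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

# Two different coefficient vectors that both fit the data exactly.
beta_a = [1.0, 2.0, 0.0]
beta_b = [0.0, 1.0, 1.0]
print(predict(X, beta_a), predict(X, beta_b))

# Moving along this direction never changes the fit, so the unregularized
# solution is a whole line of equally good estimates.
null_dir = [1.0, 1.0, -1.0]
print(predict(X, null_dir))

# Adding a criterion (here: smallest norm) singles out one estimate,
# regardless of which exact-fit solution we start from.
print(min_norm_on_line(beta_a, null_dir))
print(min_norm_on_line(beta_b, null_dir))
```

Both starting points collapse to the same regularized estimate, which is the sense in which regularization restores uniqueness; variable selection methods achieve a similar effect by forcing most coefficients to zero.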
