L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets

K J Archer, A A A Williams

doi:10.1002/sim.4484

Abstract

Health status and outcomes are frequently measured on an ordinal scale. For high-throughput genomic datasets, the common approach to analyzing ordinal response data has been to break the problem into one or more dichotomous response analyses. This dichotomous response approach does not use all available data, and it therefore loses power and increases the number of type I errors. Herein we describe an innovative frequentist approach that combines two statistical techniques, L1 penalization and continuation ratio models, for modeling an ordinal response using gene expression microarray data. We conducted a simulation study to assess the performance of two computational approaches and two model selection criteria for fitting frequentist L1 penalized continuation ratio models. Moreover, we empirically compared the approaches using three application datasets, each of which seeks to classify an ordinal class using microarray gene expression data as the predictor variables. We conclude that the L1 penalized constrained continuation ratio model is a useful approach for modeling an ordinal response for datasets where the number of covariates (p) exceeds the sample size (n). The decision of whether to use the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) for selecting the final model should depend upon the similarities between the pathologies underlying the disease states to be classified.
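As a rough illustration of the idea summarized above, a continuation ratio model can be fit by restructuring the ordinal response into a stacked binary dataset, to which any L1 penalized logistic regression routine can then be applied. The sketch below is a minimal example and not the authors' implementation. It assumes the forward formulation, which models P(Y = j | Y >= j), and a constrained model with cutpoint-specific intercepts and shared covariate effects. The function names (`expand_continuation_ratio`, `aic`, `bic`) are hypothetical, and the choice of `n` passed to `bic` (original versus restructured sample size) is left to the caller.

```python
from math import log


def expand_continuation_ratio(x_rows, y, n_levels):
    """Restructure ordinal data for a forward continuation ratio model.

    Assumption: outcomes are coded 1..n_levels. For each subject with
    outcome y_i, one binary row is emitted per cutpoint
    j = 1..min(y_i, n_levels - 1). The binary target is 1 if y_i == j
    and 0 if y_i > j. Cutpoint indicator columns are prepended so each
    cutpoint has its own intercept while covariate effects are shared.
    """
    expanded = []
    for x, yi in zip(x_rows, y):
        for j in range(1, min(yi, n_levels - 1) + 1):
            # One indicator column per cutpoint 1..n_levels-1.
            cut = [1.0 if j == k else 0.0 for k in range(1, n_levels)]
            target = 1 if yi == j else 0
            expanded.append((cut + list(x), target))
    return expanded


def aic(log_lik, df):
    """Akaike Information Criterion: -2 log L + 2 df."""
    return -2.0 * log_lik + 2.0 * df


def bic(log_lik, df, n):
    """Bayesian Information Criterion: -2 log L + log(n) df."""
    return -2.0 * log_lik + log(n) * df


if __name__ == "__main__":
    # Three subjects, one covariate, three ordered classes.
    rows = expand_continuation_ratio([[0.5], [1.5], [2.5]], [1, 2, 3], 3)
    for features, target in rows:
        print(features, target)
```

In a penalized fit, one would typically leave the cutpoint indicator columns unpenalized and apply the L1 penalty only to the covariate columns, then compare the AIC and BIC along the solution path to choose the final model.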

Full Text