Colorectal cancer is a common condition with an uncommon burden of disease, heterogeneity in manifestation, and no definitive treatment in the advanced stages. Renewed efforts to unravel the genetic drivers of colorectal cancer progression are paramount. Early-stage detection contributes to the success of cancer therapy and increases the likelihood of a favorable prognosis. Here, we have executed a comprehensive computational workflow aimed at uncovering the discrete stagewise genomic drivers of colorectal cancer progression. Using the TCGA COADREAD expression data and clinical metadata, we constructed stage-specific linear models as well as contrast models to identify stage-salient differentially expressed genes. Stage-salient differentially expressed genes with a significant monotone trend of expression across the stages were identified as progression-significant biomarkers. The stage-salient genes were benchmarked using normals-augmented dataset, and cross-referenced with existing knowledge. The candidate biomarkers were used to construct the feature space for learning an optimal model for the digital screening of early-stage colorectal cancers. The candidate biomarkers were also examined for constructing a prognostic model based on survival analysis. Among the biomarkers identified are: CRLF1, CALB2, STAC2, UCHL1, KCNG1 (stage-I salient), KLHL34, LPHN3, GREM2, ADCY5, PLAC2, DMRT3 (stage-II salient), PIGR, HABP2, SLC26A9 (stage-III salient), GABRD, DKK1, DLX3, CST6, HOTAIR (stage-IV salient), and CDH3, KRT80, AADACL2, OTOP2, FAM135B, HSP90AB1 (top linear model genes). In particular the study yielded 31 genes that are progression-significant such as ESM1, DKK1, SPDYC, IGFBP1, BIRC7, NKD1, CXCL13, VGLL1, PLAC1, SPERT, UPK2, and interestingly three members of the LY6G6 family. Significant monotonic linear model genes included HIGD1A, ACADS, PEX26, and SPIB. A feature space of just seven biomarkers, namely ESM1, DHRS7C, OTOP3, AADACL2, LPHN3, GABRD, and LPAR1, was sufficient to optimize a RandomForest model that achieved > 98% balanced accuracy (and performant recall) of cancer vs. normal on external validation. Design of an optimal multivariate model based on survival analysis yielded a prognostic panel of three stage-IV salient genes, namely HOTAIR, GABRD, and DKK1. Based on the above sparse signatures, we have developed COADREADx, a web-server for potentially assisting colorectal cancer screening and patient risk stratification. COADREADx provides uncertainty measures for its predictions and needs clinical validation. It has been deployed for experimental non-commercial use at: https://apalanialab.shinyapps.io/coadreadx/.
Read full abstract