Abstract

BackgroundMolecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity. However, most significant gene markers are unstable (not reproducible) among data sets. We introduce a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level as well as a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes. This extends standard gene expression arrays with new pathway-level activation features obtained directly from off-the-shelf gene set enrichment algorithms such as GSEA. Such so-called pathway-based expression arrays are significantly more reproducible across datasets. Such reproducibility will be important for clinical usefulness of genomic markers, and augment currently accepted cancer classification protocols.ResultsThe present method produced more stable (reproducible) pathway-based markers for discriminating breast cancer metastasis and ovarian cancer survival time. Between two datasets for breast cancer metastasis, the intersection of standard significant gene biomarkers totaled 7.47% of selected genes, compared to 17.65% using pathway-based markers; the corresponding percentages for ovarian cancer datasets were 20.65% and 33.33% respectively. Three pathways, consisting of Type_1_diabetes mellitus, Cytokine-cytokine_receptor_interaction and Hedgehog_signaling (all previously implicated in cancer), are enriched in both the ovarian long survival and breast non-metastasis groups. In addition, integrating pathway and gene information, we identified five (ID4, ANXA4, CXCL9, MYLK, FBXL7) and six (SQLE, E2F1, PTTG1, TSTA3, BUB1B, MAD2L1) known cancer genes significant for ovarian and breast cancer respectively.ConclusionsStandardizing the analysis of genomic data in the process of cancer staging, classification and analysis is important as it has implications for both pre-clinical as well as clinical studies. The paradigm of diagnosis and prediction using pathway-based biomarkers as features can be an important part of the process of biomarker-based cancer analysis, and the resulting canonical (clinically reproducible) biomarkers can be important in standardizing genomic data. We expect that identification of such canonical biomarkers will improve clinical utility of high-throughput datasets for diagnostic and prognostic applications.ReviewersThis article was reviewed by John McDonald (nominated by I. King Jordon), Eugene Koonin, Nathan Bowen (nominated by I. King Jordon), and Ekaterina Kotelnikova (nominated by Mikhail Gelfand).

Highlights

  • Molecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity

  • We introduce one approach for developing leading edge gene features determined by Gene set enrichment analysis (GSEA), and an additional two approaches for developing pathway features using GSEA and support vector machine (SVM)

  • We separately analyzed an additional data set of gene expression biomarkers from 119 samples taken in a Duke University study [2] of advanced ovarian cancers

Read more

Summary

Introduction

Molecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity. We introduce a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level as well as a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes This extends standard gene expression arrays with new pathway-level activation features obtained directly from off-the-shelf gene set enrichment algorithms such as GSEA. There is a well-recognized need for canonical biomarkers, which will make it possible to record marker-based data interchangeably among different laboratories, using well-recognized quantitative features to encode cancer and other phenotypes This need is related to future improvements in standardized diagnosis and prognosis regimes which will incorporate genomic cancer information as a matter augmenting current cancer classification protocols. Pathway-based aggregation of gene information is one among a number of ways of incorporating gene-gene relationships in augmenting predictive performance, and protein-protein interaction-based as well as coexpressionbased aggregations have been shown in various contexts to improve performance of classification methods [5]

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.