Abstract

BackgroundWith advances in technologies, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases, such as cancer. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets by utilizing advanced computational methods can help to address these issues.ResultsTo facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative method based on module network analysis is applied to identify potential driver genes. This is followed by cross-validation and a comparison of the results of sensitive and resistance groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from the cancer genome atlas. The final result contains biologically relevant genes, such as COL11A1, which has been reported as a cis-platinum resistant biomarker for epithelial ovarian carcinoma in several recent studies.ConclusionsThe described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that the unbiased data driven computational method can identify biologically relevant candidate biomarkers. It can be utilized in a wide range of applications that compare two conditions with highly heterogeneous datasets.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-016-2942-5) contains supplementary material, which is available to authorized users.

Highlights

  • With advances in technologies, huge amounts of multiple types of high-throughput genomics data are available

  • The complexity and heterogeneity of cancer genomics data and the level of noise in these data make the use of computational methods and statistical machine-learning approaches essential to understanding the key biological functions

  • We presented a nonlinear statistical approach to integrate gene expression, copy number aberration, and somatic mutation data to identify candidate drivers of resistance

Read more

Summary

Introduction

Huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases, such as cancer. To understand the mechanism of cancer, biomedical researchers are interested in identifying genomic aberrations that drive cancer progression, which are termed drivers. They are interested in profiling gene expression data to better understand disease pathogenesis. It is hypothesized that driver genes involved in resistance likely have aberrant copy numbers and/or mutations and that the expression patterns of these genes match the mutation and copy

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.