Abstract

Copy number aberrations (CNA) are one of the most important classes of genomic mutations related to oncogenetic effects. In the past three decades, a vast amount of CNA data has been generated by molecular-cytogenetic and genome sequencing based methods. While this data has been instrumental in the identification of cancer-related genes and has promoted research into the relation between CNA and histo-pathologically defined cancer types, the heterogeneity of source data and derived CNV profiles poses great challenges for data integration and comparative analysis. Furthermore, a majority of existing studies have focused on the association of CNA with pre-selected “driver” genes, with limited application to rare drivers and other genomic elements. In this study, we developed a bioinformatics pipeline to integrate a highly diverse collection of 44,988 high-quality CNA profiles. Using a hybrid model of neural networks and an attention algorithm, we generated the CNA signatures of 31 cancer subtypes, depicting the uniqueness of their respective CNA landscapes. Finally, we constructed a multi-label classifier to identify the cancer type and the organ of origin from copy number profiling data. The investigation of the signatures suggested common patterns, not only among physiologically related cancer types but also among clinico-pathologically distant cancer types, such as different cancers originating from the neural crest. Further experiments with classification models confirmed the effectiveness of the signatures in distinguishing different cancer types and demonstrated their potential in tumor classification.

Highlights

  • Copy number variations (CNV) are a class of structural genomic variants, in which the regional ploidy differs from the normal state of the corresponding chromosome

  • By using the data processing and modeling procedures described in the Method section, we generated a panel of feature genes from the collected copy number aberration (CNA) samples

  • By aggregating samples from individual cancer subtypes, we were able to deduce the set of subtype-related, significantly altered feature genes and to generate the CNA signatures of 31 cancer subtypes

Summary

Introduction

Copy number variations (CNV) are a class of structural genomic variants, in which the regional ploidy differs from the normal state of the corresponding chromosome. Next-Generation Sequencing (NGS) techniques are increasingly adopted to detect copy number variations (Zare et al, 2017; Li et al, 2018; Zhang et al, 2019). However, technologies with coverage below shallow whole genome sequencing (Macintyre et al, 2016) show reduced utility for the analysis of CNV events when compared to high-density arrays. Regardless of this technical heterogeneity, a large amount of CNV data has been generated in the past three decades, which represents an invaluable asset for genomics studies.
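
To make the definition above concrete, the following minimal sketch labels genomic segments as gains or losses by comparing their copy number with a baseline ploidy. It is an illustration only and not the pipeline used in this study: the `Segment` class, the `classify_segment` function, the baseline of 2 (a diploid autosome) and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A genomic segment with an integer copy number (hypothetical structure)."""
    chrom: str
    start: int
    end: int
    copy_number: int  # copies observed in the sample


def classify_segment(segment: Segment, baseline: int = 2) -> str:
    """Label a segment by comparing its copy number with the baseline ploidy.

    The default baseline of 2 assumes a diploid autosome; this is an
    assumption of the sketch, not a value taken from the study.
    """
    if segment.copy_number > baseline:
        return "gain"
    if segment.copy_number < baseline:
        return "loss"
    return "neutral"


# Illustrative segments; coordinates and copy numbers are made up.
segments = [
    Segment("chr8", 127_000_000, 128_000_000, 5),
    Segment("chr9", 21_000_000, 22_000_000, 0),
    Segment("chr1", 1_000_000, 2_000_000, 2),
]

labels = [classify_segment(s) for s in segments]
print(labels)  # ['gain', 'loss', 'neutral']
```

The baseline is a parameter because the normal state differs between chromosomes, for example a single X chromosome in a male sample would be neutral at `baseline=1`.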
