Abstract

BackgroundCopy Number Alternations (CNAs) is defined as somatic gain or loss of DNA regions. The profiles of CNAs may provide a fingerprint specific to a tumor type or tumor grade. Low-coverage sequencing for reporting CNAs has recently gained interest since successfully translated into clinical applications. Ovarian serous carcinomas can be classified into two largely mutually exclusive grades, low grade and high grade, based on their histologic features. The grade classification based on the genomics may provide valuable clue on how to best manage these patients in clinic. Based on the study of ovarian serous carcinomas, we explore the methodology of combining CNAs reporting from low-coverage sequencing with machine learning techniques to stratify tumor biospecimens of different grades.ResultsWe have developed a data-driven methodology for tumor classification using the profiles of CNAs reported by low-coverage sequencing. The proposed method called Bag-of-Segments is used to summarize fixed-length CNA features predictive of tumor grades. These features are further processed by machine learning techniques to obtain classification models.High accuracy is obtained for classifying ovarian serous carcinoma into high and low grades based on leave-one-out cross-validation experiments. The models that are weakly influenced by the sequence coverage and the purity of the sample can also be built, which would be of higher relevance for clinical applications. The patterns captured by Bag-of-Segments features correlate with current clinical knowledge: low grade ovarian tumors being related to aneuploidy events associated to mitotic errors while high grade ovarian tumors are induced by DNA repair gene malfunction.ConclusionsThe proposed data-driven method obtains high accuracy with various parametrizations for the ovarian serous carcinoma study, indicating that it has good generalization potential towards other CNA classification problems. This method could be applied to the more difficult task of classifying ovarian serous carcinomas with ambiguous histology or in those with low grade tumor co-existing with high grade tumor. The closer genomic relationship of these tumor samples to low or high grade may provide important clinical value.

Highlights

  • Copy Number Alternations (CNAs) is defined as somatic gain or loss of DNA regions

  • The two most important CNA segment classes identified by the Gini importance score are the Narrow Amplified (NA) and Wide Normal (WN) as shown in Fig. 6, as displayed in Fig. 7, a high correlation exists between the 9 features used to build the model

  • The proposed Bag-of-Segments method is used to summarize the CNA features from sequencing coverage data

Read more

Summary

Introduction

Copy Number Alternations (CNAs) is defined as somatic gain or loss of DNA regions. The profiles of CNAs may provide a fingerprint specific to a tumor type or tumor grade. Based on the study of ovarian serous carcinomas, we explore the methodology of combining CNAs reporting from low-coverage sequencing with machine learning techniques to stratify tumor biospecimens of different grades. Defined as somatic gain or loss of DNA regions, Copy Number Alterations (CNAs) are reflective of genomic instability, frequently affecting functionally important genes, such as tumor suppressors and oncogenes. CNAs are associated with the early onset of tumor They include both deletions and amplifications of large or small genomic regions. The profiles of CNAs may provide a fingerprint specific to a tumor type or tumor class [2]. The recurrent CNAs across tumor types have been studied in an attempt to gain a deeper understanding of the pan-cancer mechanisms driving tumorigenesis

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.