Abstract

In many cancers, incidence, treatment efficacy and overall prognosis vary between geographic populations. Studies disentangling the contributing factors may help in both understanding cancer biology and tailoring therapeutic interventions. Ancestry estimation in such studies should preferably be driven by genomic data, due to frequently missing or erroneous self-reported or inferred metadata. While respective algorithms have been demonstrated for baseline genomes, such a strategy has not been shown for cancer genomes carrying a substantial somatic mutation load. We have developed a bioinformatics tool for the assignment of population groups from genome profiling data for both unaltered and cancer genomes. Despite extensive somatic mutations in the cancer genomes, consistency between germline and cancer data reached of 97% and 92% for assignment into 5 and 26 ancestral groups, respectively. Comparison with self-reported meta-data estimated a matching rate between 88–92%, mostly limited by interpretation of self-reported ethnicity labels compared to the standardized mapping output. Our SNP2pop application allows to assess population information from SNP arrays as well as sequencing platforms and to estimate the population structure in cancer genomics projects, to facilitate research into the interplay between ethnicity-related genetic background, environmental factors and somatic mutation patterns in cancer biology.

Highlights

  • Cancer arises from the accumulation of genomic aberrations in dividing cells of virtually all types of proliferating tissues

  • We developed SNP2pop pipeline to assign genomic data to ancestral groups defined by 1000 Genomes Project[27], i.e. five continental groups and 26 population groups, respectively: 1. Admixed American (AMR) includes: Colombians from Medellin, Colombia (CLM), Mexican Ancestry from Los Angeles USA (MXL), Peruvians from Lima, Peru (PEL) and Puerto Ricans from Puerto Rico (PUR); 2

  • The presented method resolves two issues that have been missing in the earlier approaches to estimate population structure from genotyping data: 1. it allows the integration of platforms with heterogeneous, non-overlapping single nucleotide polymorphisms (SNPs) positions for cross-platform meta-analysis; 2. it confirms the possibility of deriving the population origin using data from aberrant cancer genomes in addition to those of unaltered genomes

Read more

Summary

Introduction

Cancer arises from the accumulation of genomic aberrations in dividing cells of virtually all types of proliferating tissues (somatic variations). Cancer studies have reported significant world-wide variation in incidence and prognosis between ethnicity groups[5,6,7,8]. While such differences have been attributed to unequal social and economical circumstances which influence risk factors and therapeutic interventions, several studies have shown impact of population specific genomic variants with predisposing effects on malignant transformation and phenotypic behaviour[9,10,11,12]. Lower panel indicates the allele specific copy number, represented by B-allele frequency (BAF, No. B allele copy ). Compared to the unaltered genome (A), the cancerous counterpart (lower) has copy number loss in chr8, 10p, 18, 19qter, 22q and copy-neutral loss of heterozygocity in chr9,12q, accounting for 18.2% of genome

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.