Abstract
Germline copy number variants (CNVs) are pervasive in the human genome, but potential disease associations with rare CNVs have not been comprehensively assessed in large datasets. We analysed rare CNVs in genes and non-coding regions for 86,788 breast cancer cases and 76,122 controls of European ancestry with genome-wide array data. Gene burden tests detected the strongest association for deletions in BRCA1 (P = 3.7E−18). Nine other genes were associated with a p-value < 0.01, including the known susceptibility genes CHEK2 (P = 0.0008), ATM (P = 0.002) and BRCA2 (P = 0.008). Outside the known genes, we detected associations with p-values < 0.001 for either overall or subtype-specific breast cancer at nine deletion regions and four duplication regions. Three of the deletion regions were in established common susceptibility loci. To the best of our knowledge, this is the first genome-wide analysis of rare CNVs in a large breast cancer case-control dataset. We detected associations with exonic deletions in established breast cancer susceptibility genes. We also detected suggestive associations with non-coding CNVs in known and novel loci with large effect sizes. Larger sample sizes will be required to reach robust levels of statistical significance.
Highlights
Germline copy number variants (CNVs) are pervasive in the human genome but potential disease associations with rare CNVs have not been comprehensively assessed in large datasets
We recently developed a new CNV calling method, CamCNV[16], which focuses on rare CNVs and identifies outlier samples that may have a CNV, based on the intensity distribution across all samples at each probe
Duplications tended to be longer than deletions: for example, deletions called on OncoArray covered a mean of 45 kilobases (Kb) (SD 106 Kb) over 9.8 probes (SD 17.2), while duplications covered a mean of 109 Kb (SD 202 Kb) over 18.9 probes (SD 36.5)
Summary
Germline copy number variants (CNVs) are pervasive in the human genome, but potential disease associations with rare CNVs have not been comprehensively assessed in large datasets. We analysed rare CNVs in genes and non-coding regions for 86,788 breast cancer cases and 76,122 controls of European ancestry with genome-wide array data. Rare loss-of-function variants in susceptibility genes such as BRCA1 and CHEK2 are associated with a large increase in risk[6]. Large-scale genome-wide association studies (GWAS) have established breast cancer associations with common variants at more than 150 loci, mostly in non-coding regions[8,9,10,11]. We recently developed a new CNV calling method, CamCNV[16], which focuses on rare CNVs and identifies outlier samples that may have a CNV, based on the intensity distribution across all samples at each probe. We showed that this approach is able to detect CNVs using as few as three probes[16].
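The idea of flagging outlier samples against the per-probe intensity distribution can be illustrated with a minimal sketch. This is not the CamCNV implementation: the function names, the z-score cutoff of 3 and the simple run-of-consecutive-probes rule are all assumptions chosen for illustration, and only the three-probe minimum is taken from the text above.

```python
from statistics import mean, stdev


def probe_z_scores(intensities):
    """Standardise each probe across samples.

    intensities[s][p] is the intensity of sample s at probe p
    (hypothetical input layout). Returns z[s][p].
    """
    n_probes = len(intensities[0])
    z = [[0.0] * n_probes for _ in intensities]
    for p in range(n_probes):
        column = [row[p] for row in intensities]
        mu, sd = mean(column), stdev(column)
        for s, value in enumerate(column):
            z[s][p] = (value - mu) / sd if sd > 0 else 0.0
    return z


def call_candidate_cnvs(intensities, z_cutoff=3.0, min_probes=3):
    """Return (sample, first_probe, last_probe, kind) for each run of at
    least min_probes consecutive outlier probes in the same direction.

    z_cutoff is an illustrative assumption, not a published threshold.
    """
    calls = []
    for s, row in enumerate(probe_z_scores(intensities)):
        start, kind = None, None
        # A trailing 0.0 acts as a sentinel that closes any run still open.
        for p, value in enumerate(row + [0.0]):
            if value <= -z_cutoff:
                state = "deletion"
            elif value >= z_cutoff:
                state = "duplication"
            else:
                state = None
            if state != kind:
                if kind is not None and p - start >= min_probes:
                    calls.append((s, start, p - 1, kind))
                start, kind = p, state
    return calls
```

Because each probe is standardised against all samples, this kind of rule only suits rare CNVs: a common variant would shift the probe's own mean and spread, so its carriers would no longer stand out as outliers.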