Abstract

Copy number variations (CNVs) constitute a major source of genetic variation in human populations and have been reported to be associated with complex diseases. Methods have been developed for detecting CNVs and testing CNV associations in genome-wide association studies (GWAS) based on SNP arrays. Commonly used two-step testing procedures work well only for long CNVs, while direct CNV association testing methods work only for recurrent CNVs. Assuming that short CNVs disrupting any part of a given genomic region increase disease risk, we developed a variable threshold exact test (VTET) for testing disease associations of CNVs randomly distributed in the genome using intensity data from SNP arrays. In extensive simulations, VTET outperformed two-step testing procedures based on existing CNV calling algorithms for short CNVs, and its performance was robust to the length of the genomic region. In addition, VTET performed comparably to CNVtools for testing the association of recurrent CNVs. Thus, we expect VTET to be useful for testing disease associations of both recurrent and randomly distributed CNVs using existing GWAS data. We applied VTET to a lung cancer GWAS and identified a genome-wide significant region on chromosome 18q22.3 for lung squamous cell carcinoma.
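The abstract states the idea behind VTET but not its test statistic. As a rough illustration of what a variable-threshold test over one genomic region can look like, the Python sketch below scans every possible cutoff on a per-subject intensity score, runs a one-sided Fisher exact test (computed from the hypergeometric upper tail) at each cutoff, and keeps the smallest p-value. Everything specific here is an assumption, not the authors' procedure: the hypothetical `region_score` summary (the lowest running mean of log R ratio over a few probes, aimed at deletions), the helper names, and the permutation step used to account for scanning many thresholds, which stands in for whatever exact calibration VTET itself uses.

```python
import math
import random


def region_score(lrr, window=3):
    """Hypothetical per-subject summary (an assumption, not VTET's statistic):
    the lowest running mean of log R ratio over `window` consecutive probes.
    A short deletion anywhere in the region pulls this score down."""
    if len(lrr) < window:
        raise ValueError("region must contain at least `window` probes")
    means = [sum(lrr[i:i + window]) / window for i in range(len(lrr) - window + 1)]
    return min(means)


def hypergeom_upper_tail(k, n_carriers, n_cases, n_total):
    """One-sided Fisher exact p-value: P(X >= k), where X is the number of
    cases among n_carriers subjects drawn without replacement from n_total
    subjects, n_cases of whom are cases."""
    upper = min(n_carriers, n_cases)
    tail = sum(
        math.comb(n_cases, x) * math.comb(n_total - n_cases, n_carriers - x)
        for x in range(k, upper + 1)
    )
    return tail / math.comb(n_total, n_carriers)


def min_p_over_thresholds(scores, is_case):
    """Try every observed score as a cutoff; subjects at or below it are
    treated as putative carriers. Return the smallest one-sided p-value."""
    n_total = len(scores)
    n_cases = sum(is_case)
    best = 1.0
    for t in sorted(set(scores)):
        carriers = [case for s, case in zip(scores, is_case) if s <= t]
        if len(carriers) == n_total:
            continue  # everyone flagged: the 2x2 table carries no information
        p = hypergeom_upper_tail(sum(carriers), len(carriers), n_cases, n_total)
        best = min(best, p)
    return best


def variable_threshold_test(scores, is_case, n_perm=999, seed=0):
    """Return (observed min-p, permutation p-value). The permutation step is
    an assumed stand-in that corrects for scanning many thresholds."""
    observed = min_p_over_thresholds(scores, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if min_p_over_thresholds(scores, labels) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because the score takes the minimum over every window in the region, a short deletion lowers it wherever in the region it falls, which mirrors the abstract's assumption that disrupting any part of the region raises risk. Scanning all thresholds avoids fixing a single calling cutoff in advance; the cost is a multiple-testing problem across thresholds, handled in this sketch by permuting case/control labels.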

Highlights

  • We developed a new method, the variable threshold exact test (VTET), for testing associations of copy number variations (CNVs) randomly distributed in a short genomic region, a problem not addressed by current methods

  • We tested this tool in a lung cancer genome-wide association study (GWAS) and identified a genome-wide significant region on chromosome 18q22.3 for lung squamous cell carcinoma

  • We show through simulations that VTET is as powerful as the ideal test for short CNVs covering five or more probes and is only slightly less powerful for shorter CNVs covering three or four probes


Summary

Introduction

Copy number variations (CNVs) are one of the major sources of genetic variation in the human genome (Redon et al, 2006) and have been reported to be associated with a variety of complex diseases (Sebat et al, 2007; Consortium, 2008; Stefansson et al, 2008; Bucan et al, 2009; Diskin et al, 2009; Glessner et al, 2009; McCarthy et al, 2009; Levinson et al, 2011). In the standard two-step strategy, CNVs are first called for each subject using a CNV detection algorithm (Olshen et al, 2004; Colella et al, 2007; Wang et al, 2007; Korn et al, 2008; Coin et al, 2010), and association is then tested by comparing each probe or genomic region against the disease phenotype of interest. This strategy is most useful for detecting associations of long CNVs, which can be called with excellent accuracy. More recently, algorithms with better sensitivity for shorter CNVs have been developed (Pique-Regi et al, 2008; Wang et al, 2009; Jeng et al, 2010; Jang et al, 2013), but their performance on large-scale GWAS data remains to be systematically evaluated.
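To make the two-step strategy concrete, here is a deliberately simple Python sketch: a hypothetical fixed-threshold run caller (`call_deletions`, with assumed `cutoff` and `min_probes` parameters) stands in for the cited calling algorithms, which are far more sophisticated, and is followed by a per-probe one-sided Fisher exact test. It illustrates the structure under these assumptions and does not reproduce any published pipeline.

```python
import math


def call_deletions(lrr, cutoff=-0.4, min_probes=3):
    """Toy fixed-threshold caller (hypothetical, not one of the cited
    algorithms): flag every probe lying in a run of at least `min_probes`
    consecutive probes whose log R ratio is below `cutoff`."""
    called = [False] * len(lrr)
    run_start = None
    # A +inf sentinel never falls below the cutoff, so it closes a trailing run.
    for j, value in enumerate(list(lrr) + [math.inf]):
        if value < cutoff:
            if run_start is None:
                run_start = j
        else:
            if run_start is not None and j - run_start >= min_probes:
                for k in range(run_start, j):
                    called[k] = True
            run_start = None
    return called


def fisher_upper_p(k, n_carriers, n_cases, n_total):
    """One-sided Fisher exact p-value via the hypergeometric upper tail."""
    upper = min(n_carriers, n_cases)
    tail = sum(
        math.comb(n_cases, x) * math.comb(n_total - n_cases, n_carriers - x)
        for x in range(k, upper + 1)
    )
    return tail / math.comb(n_total, n_carriers)


def per_probe_pvalues(lrr_matrix, is_case, **caller_kwargs):
    """Step 1: call deletions for each subject. Step 2: at each probe, test
    whether called carriers are enriched among cases."""
    calls = [call_deletions(row, **caller_kwargs) for row in lrr_matrix]
    n_total = len(is_case)
    n_cases = sum(is_case)
    pvalues = []
    for j in range(len(lrr_matrix[0])):
        carriers = [case for row, case in zip(calls, is_case) if row[j]]
        pvalues.append(fisher_upper_p(sum(carriers), len(carriers), n_cases, n_total))
    return pvalues
```

Two weaknesses show up even in this toy. If `min_probes` exceeds a CNV's length, the caller misses it entirely. And when short deletions of the same length start at different probes in different subjects, each probe sees only a fraction of the carriers, so the per-probe signal is diluted even when every carrier is a case.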
