Abstract

Copy number variation (CNV) has been shown to have important phenotypic consequences and occurs in organisms as diverse as yeast and humans. We aim to build a pipeline that compares genomic sequence reads with a reference genome sequence to predict regions of copy number variation, including deletions. We use three approaches. The first assumes that the number of sequencing reads from a genomic region follows a Poisson distribution; from the observed number of reads aligning to a region, we can then determine a 95% confidence interval for its actual copy number. The second identifies sequence reads that align partially at two locations in the reference genome. Such reads may simply be chimaeric artifacts, but they may also indicate sequence duplications or deletions. In the case of a duplication, we would also expect to observe reads containing that sequence at its normal location in the reference, at a frequency that depends on sequence coverage. The third approach, which is still in progress, examines the lengths of continuous stretches of the reference sequence that either are or are not represented in the sequence reads, and assigns to each the probability that a stretch of that length arises by chance. Stretches for which this probability is low indicate a probable duplication or deletion, respectively. To develop and test our methodology for predicting CNV regions, we are analyzing data from the baker's yeast Saccharomyces cerevisiae, using its high-quality reference genome sequence together with sequencing data from 37 additional strains. We will then examine the phenotypic variability of these strains under different growth conditions and correlate it with the predicted copy numbers to investigate possible functional consequences of the observed copy number variation.
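The first approach can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline described above. It uses the exact (Garwood) two-sided Poisson interval, computed by bisection with only the standard library. It also assumes a hypothetical `reads_per_copy` parameter: the number of reads a region is expected to receive when present in a single copy, for example derived from genome-wide mean coverage and the region's length. All function names are illustrative.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu); each term is computed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if mu == 0.0:
        return 1.0
    log_mu = math.log(mu)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(-mu + i * log_mu - math.lgamma(i + 1))
    return min(total, 1.0)


def _solve_decreasing(f, target: float, hi: float) -> float:
    """Find mu >= 0 with f(mu) == target, for f decreasing in mu, by bisection."""
    lo = 0.0
    # Grow the upper bracket until f(hi) drops to or below the target.
    while f(hi) > target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_mean_ci(k: int, level: float = 0.95) -> tuple[float, float]:
    """Exact (Garwood) two-sided confidence interval for a Poisson mean, given count k."""
    alpha = 1.0 - level
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= k | mu) = alpha/2, i.e. P(X <= k - 1 | mu) = 1 - alpha/2.
        lower = _solve_decreasing(lambda mu: poisson_cdf(k - 1, mu), 1.0 - alpha / 2, k + 1.0)
    # Upper limit: P(X <= k | mu) = alpha/2.
    upper = _solve_decreasing(lambda mu: poisson_cdf(k, mu), alpha / 2, k + 1.0)
    return lower, upper


def copy_number_ci(observed_reads: int, reads_per_copy: float,
                   level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for copy number.

    reads_per_copy is a hypothetical input (expected reads for one copy of the
    region); the Poisson-mean interval is simply rescaled by it.
    """
    lower, upper = poisson_mean_ci(observed_reads, level)
    return lower / reads_per_copy, upper / reads_per_copy
```

For example, `copy_number_ci(160, 50.0)` gives the copy-number interval for a region that received 160 reads where a single copy is expected to yield about 50. A region whose interval excludes the single-copy value of 1 would be a candidate CNV. The exact interval is used here instead of a normal approximation because it remains valid at the very low read counts expected in deleted regions.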
