Abstract

Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs were observed to be unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease. Despite mouse CNVs also exhibiting a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs either suggest that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or suggest that they were preferentially fixed as a consequence of the larger effective population size of wild mice. 
It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.
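The null expectation stated above, that CNV genes should resemble the genome as a whole, can be illustrated with a simple enrichment test. The sketch below is not the authors' method; it is a minimal one-sided hypergeometric test, and the function name, parameter names, and toy counts are all hypothetical. It asks how likely it is to see at least the observed number of genes from a category (for example olfactory genes) among CNV genes if CNV genes were a random draw from all genes.

```python
from math import comb


def hypergeom_enrichment_p(genome_total, genome_in_category,
                           cnv_total, cnv_in_category):
    """One-sided P(X >= cnv_in_category) under a hypergeometric null.

    Assumes cnv_total genes are drawn without replacement from
    genome_total genes, of which genome_in_category belong to the
    category of interest. All names here are illustrative.
    """
    # The count in the category cannot exceed either bound.
    upper = min(genome_in_category, cnv_total)
    denominator = comb(genome_total, cnv_total)
    # Sum the probability mass from the observed count upwards.
    tail = sum(
        comb(genome_in_category, k)
        * comb(genome_total - genome_in_category, cnv_total - k)
        for k in range(cnv_in_category, upper + 1)
    )
    return tail / denominator


# Toy counts only (not data from the paper): 10 genes in total,
# 4 in the category, 3 CNV genes, 2 of them in the category.
p = hypergeom_enrichment_p(10, 4, 3, 2)
print(p)  # 40/120 = 1/3 for these toy counts
```

A small p-value would indicate overrepresentation relative to the genomic background; testing for depletion would use the lower tail instead.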

Highlights

  • How much do different classes of sequence polymorphisms contribute to human phenotypic variation and disease susceptibility? Traditionally, because they are abundant and detectable, single nucleotide polymorphisms (SNPs) have been expected to contribute most.

  • Properties that are overrepresented within copy-number variants (CNVs) are indicated with an upwards arrow (↑), whereas those that are underrepresented are indicated with a downwards arrow (↓).

  • The 627 human CNV regions span 98,125,520 bp, whereas the 346 CNV bacterial artificial chromosomes (BACs) span 55,998,000 bp. (a) Proteins either partially or entirely encoded within CNVs, predicted by the SignalP algorithm to be secreted. (b) Mouse CNV genes were not overrepresented in regions approaching telomeres; no significance estimate was determined as the CNV genes’ distances to telomeric ends did not approximate to a Gaussian distribution. *p > 0.05, **p < 0.05.

Introduction

How much do different classes of sequence polymorphisms contribute to human phenotypic variation and disease susceptibility? Traditionally, because they are abundant and detectable, single nucleotide polymorphisms (SNPs) have been expected to contribute most. As techniques have improved for detecting polymorphisms at larger scales, evidence has accumulated that these occur far more frequently than hitherto suspected. Several hundred regions that are variable in copy number have been identified in both human populations [2,3,4,5] and mouse strains [6]. Although it is not yet known whether these large-scale copy-number variants (CNVs) are associated with disease, their abundance and size imply that they may yet be found to underlie functional variation. Relatively few of the human CNVs detected thus far in independent studies overlap [7], indicating that numerous individual CNVs may occur with low minor allele frequencies in the human population.
