Abstract

Background: We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate and its genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified.

Results: Consistent with our predictions we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have similar rates of non-synonymous mutation as other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. The effect is small nevertheless.

Conclusions: Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.

Highlights

  • Why do humans suffer from the diseases that we do? In part this is clearly due to our anatomy and physiology, and that of the organisms that infect us - we cannot have a disease of an organ that we do not possess

  • We predict that unless all possible diseases with a genetic basis, and all the genes that can cause them, have already been discovered, genes associated with diseases should have higher genic mutation rates than non-disease genes, where the genic mutation rate is determined by the product of gene length and the mutation rate per site

  • The presence of an association signal within the boundaries of the gene does not necessarily mean that the causative mutation is within the protein coding sequence or even within the boundaries of the gene, and many of these associations may be in regulatory sequences [21]


Summary

Results

We predict that, unless all possible diseases with a genetic basis, and all the genes that can cause them, have already been discovered, genes associated with disease should have higher genic mutation rates than non-disease genes, where the genic mutation rate is the product of gene length and the mutation rate per site. We find that intron SNP density is positively correlated with GC content, female and male rates of recombination, and distance to the centromere, and negatively correlated with the time of replication (late genes have higher diversity), nucleosome occupancy, germline expression and distance to the telomere (Table 1). If we control for these factors by taking the residuals from the multiple regression, we find that SNP density is still significantly greater in both Mendelian and GWAS genes than in non-disease genes (ANOVA p < 0.001; individual t-tests p < 0.001). Likewise, we find that the average genealogy length is positively correlated with all variables except GC content, nucleosome occupancy and male recombination rate (Table 1), and that after controlling for these associations, disease genes still have significantly greater average genealogy lengths than non-disease genes (ANOVA p = 0.019; individual t-tests: Mendelian versus non-disease p = 0.21, GWAS versus non-disease p = 0.001). Although disease genes have a greater number of SNPs per bp than non-disease genes, the distribution of the genetic variation varies in an inconsistent manner between categories of genes; the average minor allele frequency is ~10% greater in Mendelian, and ~10% lower in GWAS genes, than in non-disease genes (ANOVA p < 0.01) (Figure 3b).
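The residual-based control described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the two covariates (GC content and recombination rate) and all numerical values are hypothetical, chosen only to show the mechanics. SNP density is regressed on the covariates by ordinary least squares, and the mean residual is then compared between gene categories. The ANOVA and t-tests reported in the text are not reproduced here.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def regression_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    design = [[1.0] + list(row) for row in covariates]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(design, y)]


def mean_residual_by_category(residuals, categories):
    """Average the residuals within each gene category."""
    groups = {}
    for r, c in zip(residuals, categories):
        groups.setdefault(c, []).append(r)
    return {c: mean(v) for c, v in groups.items()}


# Hypothetical toy data: one entry per gene (not values from the study).
gene_categories = ["non-disease", "non-disease", "non-disease",
                   "Mendelian", "Mendelian", "GWAS", "GWAS", "GWAS"]
covariates = [(0.38, 0.9), (0.42, 1.4), (0.47, 1.1), (0.40, 1.8),
              (0.51, 0.7), (0.44, 2.0), (0.49, 1.3), (0.55, 1.6)]  # (GC, recombination)
snp_density = [0.90, 1.05, 1.02, 1.20, 1.08, 1.31, 1.18, 1.30]

residuals = regression_residuals(snp_density, covariates)
for category, value in sorted(mean_residual_by_category(residuals, gene_categories).items()):
    print(f"{category}: mean residual SNP density = {value:+.4f}")
```

A positive mean residual for a category indicates that its genes carry more diversity than the covariates alone would predict, which is the pattern the text reports for disease genes. In a real analysis the residuals would then be passed to an ANOVA or t-test, and the full set of covariates in Table 1 would be used.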

