Abstract

Tandem repeat polymorphisms in human proteins were characterized using the UniGene dataset. This analysis suggests that 1 in 20 proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions; these were prevalent among protein-binding proteins.

Highlights

  • Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts

  • A total of 89,243 tandem repeats were detected in protein-coding regions of the 13,783 UniGene representative sequences

  • We found 295 allelic variants that differed from the UniGene representative sequence (Additional data file 1) and 85.8% of these variants were a multiple of three nucleotides (253/295)
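The frame rule behind these highlights can be sketched in a few lines: a length difference that is a multiple of three nucleotides adds or removes whole codons, while any other difference shifts the reading frame. This is only an illustrative sketch. The function name `is_in_frame` and the example length differences are hypothetical and are not taken from the study's pipeline; the counts 253 and 295 are the ones reported above.

```python
def is_in_frame(length_diff_nt: int) -> bool:
    """Return True if a length difference (in nucleotides) preserves the reading frame."""
    if length_diff_nt <= 0:
        raise ValueError("length difference must be a positive number of nucleotides")
    # Whole codons are added or removed only when the difference is divisible by 3.
    return length_diff_nt % 3 == 0


# Hypothetical length differences, chosen for illustration only.
examples = [2, 3, 57, 144]
labels = {d: ("in-frame" if is_in_frame(d) else "frameshift") for d in examples}

# Share of the reported allelic variants that were a multiple of three (253 of 295).
in_frame_pct = round(100 * 253 / 295, 1)

if __name__ == "__main__":
    for diff, label in labels.items():
        print(f"{diff} nt: {label}")
    print(f"in-frame variants: {in_frame_pct}%")
```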


Summary

Results

Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.
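The percentages in this paragraph can be checked with simple arithmetic. The sketch below recomputes the detected frequency from the reported counts. It then assumes, as one plausible reading rather than the authors' stated derivation, that the "up to 1%" bound comes from multiplying the estimated 6% polymorphism frequency by the 13.9% of polymorphisms that are not a multiple of three. All variable names are illustrative.

```python
# Counts reported in the text.
polymorphic_clusters = 218
proteins_investigated = 13_749

# Detected frequency of proteins carrying a coding tandem repeat polymorphism.
detected_pct = 100 * polymorphic_clusters / proteins_investigated

# Estimated mean frequency of tandem repeat polymorphisms in human proteins (from the text).
estimated_polymorphic_fraction = 0.06
# Fraction of polymorphisms that were not a multiple of three nucleotides (from the text).
not_multiple_of_three = 0.139

# Assumption: the frameshift bound is the product of the two fractions above.
frameshift_pct = 100 * estimated_polymorphic_fraction * not_multiple_of_three

if __name__ == "__main__":
    print(f"detected: {detected_pct:.2f}% of proteins")
    print(f"possible frameshifting polymorphisms: {frameshift_pct:.2f}% of proteins")
```

Under that assumption the product comes to a little over 0.8%, which is consistent with the rounded "up to 1%" figure.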
