Abstract

BackgroundInsertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns focusing first on the major groups of multicellular eukaryotes.ResultsComparisons of complete proteomes from a taxonically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7aa) with a most frequent size class of 1aa. However, even these simple looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplante (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31 overall, with a maximum observed value of 7.5 fold.ConclusionsWe find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined maybe useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.

Highlights

  • Insertions/deletions in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers

  • Orthologous protein clusters from 35 proteomes We conducted a broad survey of eukaryotic genome sequence data in order to identify substantial (>250aa), universal or nearly universal, single copy proteins (Figure 1) that could potentially be mined for evolutionarily interpretable indels

  • We find a substantial number of Clade defining indel (CDI) among major groups of eukaryotes, these are unevenly distributed and mostly small (Figure 8)

Read more

Summary

Introduction

Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. Comparisons of protein sequences quickly established that indels in protein coding genes are mostly small, encoding 1–5 amino acids, and occur almost exclusively in loops linking structural elements at the solvent-exposed surfaces of protein structures [2,6,7]. This does not mean that indels are functionally unimportant. Strong positive selection for more and longer indels (5–8 times background) was demonstrated for an ion channel protein, resulting in changes in membrane depolarization rate and motility in sperm [9]

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.