Abstract

The increasing number of annotated genome sequences in public databases has made it possible to study the length distributions and domain composition of proteins at unprecedented resolution. To identify factors that influence protein length in metazoans, we performed an analysis of all domain-annotated proteins from a total of 49 animal species from Ensembl (v.56) or EnsemblMetazoa (v.3). Our results indicate that protein length constraints are not fixed as a linear function of domain count and can vary based on domain content. The presence of repeating domains was associated with relaxation of the constraints that govern protein length. Conversely, for proteins with unique domains, length constraints were generally maintained with increased domain counts. It is clear that mean (and median) protein length and domain composition vary significantly between metazoans and other kingdoms; however, the connections between function, domain content, and length are unclear. We incorporated Gene Ontology (GO) annotation to identify biological processes, cellular components, or molecular functions that favor the incorporation of multi-domain proteins. Using this approach, we identified multiple GO terms that favor the incorporation of multi-domain proteins; interestingly, several of the GO terms with elevated domain counts were not restricted to a single gene family. The findings presented here represent an important step in resolving the complex relationship between protein length, function, and domain content. The comparison of the data presented in this work to data from other kingdoms is likely to reveal additional differences in the regulation of protein length.

Highlights

  • The proteome of an organism is broadly defined as the sum total of all proteins expressed by its genome [1]

  • Length constraints are relaxed as domain count increases: With the availability of multiple animal proteomes in public databases, it is clear that protein length can vary over several orders of magnitude within a given proteome, yet the distribution of protein lengths across metazoans is very similar

  • Our results indicate that protein length constraints are not fixed as a linear function of domain count; rather, overall protein length constraints are relaxed with increasing domain count

Introduction

The proteome of an organism is broadly defined as the sum total of all proteins expressed by its genome [1]. Given the breadth of data available, comparative analysis across species is a powerful tool for identifying the underlying principles that govern the length or domain content of proteins. This type of analysis has been used to identify the functional and evolutionary constraints that control the domain content and length of proteins. Deletion, or fusion of two domains, can have a profound impact on the function of a protein; the evolutionary history of any given species is reflected in the unique combination of domains that make up its proteome.
