Abstract

Background: A growing body of evidence shows that gene products encoded by short open reading frames play key roles in numerous cellular processes. Yet, they are generally overlooked in genome assembly, escaping annotation because small protein-coding genes are difficult to predict computationally. Consequently, there are still a considerable number of small proteins whose functions are yet to be characterized.

Results: To address this issue, we apply a collection of structural bioinformatics algorithms to infer molecular function of putative small proteins from the mouse proteome. Specifically, we construct 1,743 confident structure models of small proteins, which reveal a significant structural diversity with a noticeably high helical content. A subsequent structure-based function annotation of small protein models exposes 178,745 putative protein-protein interactions with the remaining gene products in the mouse proteome, 1,100 potential binding sites for small organic molecules and 987 metal-binding signatures.

Conclusions: These results strongly indicate that many small proteins adopt three-dimensional structures and are fully functional, playing important roles in transcriptional regulation, cell signaling and metabolism. Data collected through this work is freely available to the academic community at http://www.brylinski.org/content/databases to support future studies oriented on elucidating the functions of hypothetical small proteins.

Highlights

  • A growing body of evidence shows that gene products encoded by short open reading frames play key roles in numerous cellular processes

  • The development of next-generation sequencing (NGS) gives researchers access to nearly complete genomes of numerous species [2,3], revealing more and more details on how individual organisms function as systems

  • In this study, we apply a collection of tools for evolution/structure-based function annotation of small proteins identified in the mouse proteome


Summary

Introduction

A growing body of evidence shows that gene products encoded by short open reading frames play key roles in numerous cellular processes. They are generally overlooked in genome assembly, escaping annotation because small protein-coding genes are difficult to predict computationally. Difficulties in de novo NGS assembly arise from, for example, contaminating sequences [4], low-quality reads [5], segmental duplications and large common repeats [6]. Another salient flaw is a short-length discontinuity, which has been noted for several assembled genomes [7,8]. Several highlighted biological functions include engaging in regulatory processes [14], interacting with a lipid membrane [15] or even modulating its features, acting as chaperones of nucleic acids and metals [16], and stabilizing the structures of larger protein assemblies [17].
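To make the notion of a short open reading frame concrete, the following minimal Python sketch scans the forward strand of a DNA string for ORFs whose length falls within a chosen codon range. It is only an illustration and not the prediction or annotation method used in the study: the function name `find_short_orfs` and the `min_codons`/`max_codons` cutoffs are hypothetical, and the scan assumes the standard start codon ATG and the three standard stop codons.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_short_orfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon. Its length in codons counts the ATG but not the stop.
    The length cutoffs are illustrative assumptions, not values taken
    from the study.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    # end is exclusive and includes the stop codon
                    hits.append((start, i + 3, frame))
                start = None
    return hits


if __name__ == "__main__":
    # Toy sequence: one ORF of 12 codons (ATG + 11 x GCT) followed by TAA.
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    for start, end, frame in find_short_orfs(toy):
        print(start, end, frame, toy[start:end])
```

A naive scan like this reports every short ATG-to-stop stretch, most of which will not be real genes in a genome-scale sequence. That is one way to see why small protein-coding genes are hard to predict from sequence alone and why additional evidence, such as the structure-based annotation described above, is useful.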
