Abstract

Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa: protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at length 100. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a "dark matter" subset of proteins that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.

Highlights

  • Large-scale cDNA annotation has often been performed under the assumption that protein-coding transcripts encode peptides of 100 aa or longer

  • The CRITICA pipeline was used to identify proteins encoded in the 102,801 FANTOM cDNA sequences

  • A further 3,344 predictions lacked stop codons, so that the coding region runs off the end of the cDNA (Table 1)

Introduction

Large-scale cDNA annotation has often been performed under the assumption that protein-coding transcripts encode peptides of 100 aa or longer. Short proteins are nonetheless important mediators of biological processes that include (1) regulation of innate immunity (via more than a dozen members of the small inducible cytokine families CCL and CXCL), (2) protection against pathogens (via more than two dozen xenobiotic defensin and defensin-related cryptdin factors), (3) cell communication and homeostasis as ligands and hormones (e.g., Apln, Gnrh, and Ppy), (4) signal transduction (e.g., the Pki protein kinase inhibitor and Gng guanine nucleotide binding protein–gamma families), and (5) metabolism (e.g., key roles in mitochondrial electron transport, cytochrome c subunits, and co-enzyme metabolism). Current open reading frame (ORF) predictions and annotation projects are missing them.
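The 100-aa assumption amounts to a hard length filter on candidate ORFs. As a minimal, hypothetical sketch (this is not CRITICA and not the pipeline used in this work), the Python below scans only the forward strand of a cDNA for ATG-initiated ORFs and discards any shorter than `min_aa`. The function name `find_orfs` and its parameters are illustrative assumptions. It also flags ORFs with no stop codon, whose coding region runs off the end of the sequence.

```python
from typing import List, Tuple

# Standard stop codons of the universal genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int, int, bool]]:
    """Naive forward-strand ORF scan with a hard length cutoff (illustrative only).

    Returns (start, end, length_aa, has_stop) tuples, where `end` is exclusive and
    includes the stop codon when one is present. Only the first in-frame ATG before
    each stop is used, so each reported ORF is the longest one ending at that stop.
    The reverse strand is ignored in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a candidate ORF at the first in-frame ATG
            elif codon in STOP_CODONS:
                length_aa = (i - start) // 3  # codons before the stop, including ATG
                if length_aa >= min_aa:  # the hard length cutoff discussed in the text
                    orfs.append((start, i + 3, length_aa, True))
                start = None
        if start is not None:
            # No stop codon: the coding region runs off the end of the cDNA.
            length_aa = (len(seq) - start) // 3
            if length_aa >= min_aa:
                orfs.append((start, start + 3 * length_aa, length_aa, False))
    return orfs


if __name__ == "__main__":
    short = "ATG" + "GCT" * 49 + "TAA"  # a 50-aa ORF
    print(find_orfs(short))             # dropped by the default 100-aa cutoff
    print(find_orfs(short, min_aa=30))  # kept with a lower cutoff
```

With the default cutoff, a genuine 50-codon ORF is silently discarded. Applied across a whole catalogue, a filter of this kind would produce the sort of artificial drop in abundance below 100 aa described above, whereas CRITICA draws on evidence from synonymous and non-synonymous substitution patterns rather than on length alone.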
