Abstract

The knowledgebase TopFIND is an analysis platform focussed on protein termini, their origin, modification and hence their role in protein structure and function. Here, we present a major update to TopFIND, version 3, which includes a 70% increase in the underlying data to now cover 90 696 proteins, 165 044 N-termini, 130 182 C-termini, 14 382 cleavage sites and 33 209 substrate cleavages in H. sapiens, M. musculus, A. thaliana, S. cerevisiae and E. coli. New features include the mapping of protein termini and cleavage entries across protein isoforms and, significantly, the mapping of protein termini originating from alternative transcription and alternative translation start sites. Furthermore, two analysis tools for complex data analysis based on the TopFIND resource are now available online. TopFINDer, the TopFIND ExploRer, characterizes and annotates proteomics-derived N- or C-termini sets for their origin, sequence context and implications for protein structure and function. Neo-termini are also linked to associated proteases. PathFINDer identifies indirect connections between a protease and a list of substrates or termini, thus supporting the evaluation of complex proteolytic processes in vivo. To demonstrate the utility of the tools, a recent N-terminomics data set of inflamed murine skin has been re-analyzed. The re-analysis recapitulated the major findings of the original manual analysis, validating the utility of these new resources. The point of entry for the resource is http://clipserve.clip.ubc.ca/topfind from where the graphical interface, all application programming interfaces (API) and the analysis tools are freely accessible.

Highlights

  • Genetic information typically results in many protein species differing in amino acid sequence or by modification of individual amino acids

  • In view of the added complexity arising from altered termini position and nature, we developed TopFIND [6,13] to comprehensively integrate data on protein termini and their formation by proteolytic processing, as well as to associate shortened protein chains with relevant information on protein function

  • To annotate termini inferred from alternative transcripts, human and mouse Ensembl [14] protein (ENSP) sequences were downloaded in FASTA format from http://uswest.Ensembl.org/info/data/ftp/index.html


Summary

INTRODUCTION

Genetic information typically results in many protein species differing in amino acid sequence or by modification of individual amino acids. With PathFINDer we developed the first publicly available tool to identify putative indirect proteolytic effects from in vivo proteomics data by placing proteins in the context of the proteolytic network (the extension of the protease web generated by adding known and MEROPS-annotated protease substrates) [12] and identifying indirect connections from a query protease to the protein using graph path finding. With these new tools, TopFIND 3.0 addresses and greatly facilitates solving the hardest problem in current protease research: the identification of the cognate protease responsible for a given cleavage event from a complex in vivo sample.
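The graph path finding idea can be illustrated with a minimal sketch. This is not PathFINDer's actual implementation: it assumes the proteolytic network is available as a directed adjacency mapping in which an edge means "acts on" (cleaves or inhibits), and the function name `find_paths`, the `max_len` cutoff and all protein names are hypothetical placeholders. A breadth-first search enumerates every simple path from a query protease to a target protein, shortest first.

```python
from collections import deque


def find_paths(network, protease, target, max_len=4):
    """Return all simple paths from `protease` to `target`.

    `network` maps each node to the nodes it acts on (hypothetical format).
    Paths are limited to `max_len` edges and are returned shortest first.
    """
    paths = []
    queue = deque([[protease]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target and len(path) > 1:
            paths.append(path)  # reached the target, do not extend further
            continue
        if len(path) - 1 >= max_len:
            continue  # edge budget exhausted
        for nxt in network.get(node, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [nxt])
    return paths


# Toy network with made-up names, for illustration only.
network = {
    "ProteaseA": ["ProteaseB", "Inhibitor1"],
    "ProteaseB": ["SubstrateX"],
    "Inhibitor1": ["ProteaseC"],
    "ProteaseC": ["SubstrateX"],
}

paths = find_paths(network, "ProteaseA", "SubstrateX")
for p in paths:
    print(" -> ".join(p))
```

In this toy network ProteaseA has no direct edge to SubstrateX, so every path returned is an indirect connection of the kind the text describes. The length cutoff keeps the search tractable on a dense network, at the cost of missing longer cascades.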

