Abstract

Existing protein-protein interaction databases cover only a portion of the interactome, and interaction information on protein isoforms is underrepresented. This leads to a lack of information on the functional similarity of protein isoforms and on the effects of transcript diversity on protein interaction networks. We present a comprehensive automated literature analysis that extracts interactions involving human protein isoforms linked to clusters of transcripts with high sequence similarity, and we deliver them in a database called TBIID for knowledge discovery. We measure the interaction variability of the isoforms from the clustered transcripts by analysing the distribution of their interaction partners in TBIID. Almost all clusters analysed (99%) contain isoforms with unique partners, indicating that isoforms are specialized towards forming unique interactions and thus achieving functional diversity; this is similar to the results from public resources. TBIID is available at http://tbiid.emu.edu.tr and contains the most relevant candidates for future experiments focusing on understanding isoform interaction networks and the resulting functional implications.
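The idea of isoforms with unique partners within a transcript cluster can be illustrated with a minimal sketch. It is not the TBIID implementation: the data layout (a dictionary mapping each cluster to its isoforms and their partner sets), the function names and the toy identifiers are all assumptions made for illustration. A partner is treated as unique to an isoform if no other isoform in the same cluster has it.

```python
from collections import Counter


def unique_partners(cluster):
    """Map each isoform to the partners that no other isoform in the cluster has.

    `cluster` is a hypothetical layout: {isoform_id: set of partner ids}.
    """
    # Count in how many isoforms of the cluster each partner appears.
    counts = Counter(p for partners in cluster.values() for p in set(partners))
    return {
        iso: {p for p in partners if counts[p] == 1}
        for iso, partners in cluster.items()
    }


def fraction_with_unique_partners(clusters):
    """Fraction of clusters in which at least one isoform has a unique partner."""
    if not clusters:
        return 0.0
    hits = sum(
        1 for cluster in clusters.values()
        if any(unique_partners(cluster).values())
    )
    return hits / len(clusters)


# Hypothetical toy data, not taken from TBIID.
clusters = {
    "cluster_A": {"iso1": {"P1", "P2"}, "iso2": {"P2", "P3"}},
    "cluster_B": {"iso1": {"P4"}, "iso2": {"P4"}},
}

print(unique_partners(clusters["cluster_A"]))
print(fraction_with_unique_partners(clusters))
```

In this toy input, the isoforms of cluster_A each have one partner the other lacks, while the isoforms of cluster_B share their only partner, so half of the clusters count as containing isoforms with unique partners.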

Highlights

  • Recent research in molecular biology has focussed on the identification of protein-protein interactions (PPIs) and the analysis of PPI networks to fully understand the organism’s functionality

  • 620 clusters (3.68%) overlap with other clusters, since at least one Defined Transcript (DT) from each of these clusters shares its description with a DT belonging to a different cluster

  • A total of 13,174 DTs are contained in all Clusters with a Single defined Transcript (CSTs) and Clusters with Multiple defined Transcripts (CMTs) of HumanSDB3 (12,638 clusters in total) and all were used for abstract retrieval leading to a corpus of 4,083,094 abstracts (Table 3)


Summary

Introduction

Recent research in molecular biology has focussed on the identification of protein-protein interactions (PPIs) and the analysis of PPI networks to fully understand the organism’s functionality. These efforts have produced collections of PPI data by using high-throughput methods such as yeast two-hybrid (Y2H) and affinity purification [1], as well as literature mining methods [2]. Comprehensive PPI databases include the Database of Interacting Proteins (DIP) [3], the Molecular INTeraction Database (MINT) [4] and IntAct [5]. These databases still cover only a portion of the interactome [6,7] and show limitations regarding PPIs involving protein isoforms. In the PINA database [8], only a small portion of the interaction pairs (772, i.e. 1.3% of all interactions in PINA) involve a protein that is a splicing variant according to the UniProt Knowledgebase [9].

