Abstract
Background
Proteins interact with other proteins or biomolecules in complexes to perform cellular functions. Existing protein-protein interaction (PPI) databases and protein complex databases for human proteins are not organized to provide protein complex information or facilitate the discovery of novel subunits. Data integration of PPIs focused specifically on protein complexes, subunits, and their functions is therefore needed. Predicted candidate complexes or subunits are also important for experimental biologists.
Description
Based on integrated PPI data and literature, we have developed a human protein complex database with a complex quality index (PCDq), which includes both known and predicted complexes and subunits. We integrated six PPI data sets (BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H) and predicted human protein complexes by finding densely connected regions in the PPI networks. The predicted complexes were curated against the literature so that missing proteins were added and some complexes were merged, resulting in 1,264 complexes comprising 9,268 proteins with 32,198 PPIs. The evidence level of each subunit was assigned as a categorical variable indicating whether it was a known subunit and whether a specific function was inferable from sequence or network analysis. To summarize the categories of all the subunits in a complex, we devised a complex quality index (CQI) and assigned it to each complex. We examined the proportion of consistency of Gene Ontology (GO) terms among the protein subunits of a complex. Next, we compared the expression profiles of the corresponding genes and found that many proteins in larger complexes tend to be expressed cooperatively at the transcript level. We also evaluated the proportion of duplicated genes in a complex. Finally, we identified 78 hypothetical proteins that were annotated as subunits of 82 complexes, which included known complexes. Of these hypothetical proteins, four were reported to be actual subunits of the assigned protein complexes after our prediction had been made.
Conclusions
We constructed a new protein complex database, PCDq, that includes both predicted and curated human protein complexes. CQI is a useful source of experimentally confirmed information about protein complexes and subunits. The predicted protein complexes can provide functional clues about hypothetical proteins. PCDq is freely available at http://h-invitational.jp/hinv/pcdq/.
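The abstract describes predicting complexes by finding densely connected regions in the integrated PPI network but does not name the algorithm here. As a rough illustration only, the sketch below uses k-core decomposition followed by connected components as a stand-in for dense-region detection; it should not be read as the method used to build PCDq. The function names, the threshold `k`, and the toy protein identifiers are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def k_core(adj, k):
    """Return the k-core: repeatedly drop nodes with fewer than k neighbours."""
    core = {node: set(neigh) for node, neigh in adj.items()}
    queue = [node for node, neigh in core.items() if len(neigh) < k]
    while queue:
        node = queue.pop()
        if node not in core:  # already removed via an earlier queue entry
            continue
        for other in core.pop(node):
            core[other].discard(node)
            if len(core[other]) < k:
                queue.append(other)
    return core


def components(adj):
    """Split a graph into connected components, each as a sorted list."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(sorted(comp))
    return sorted(result)


def candidate_complexes(edges, k=3):
    """Assumed pipeline: PPI edges -> k-core -> connected dense regions."""
    return components(k_core(build_graph(edges), k))


if __name__ == "__main__":
    # Toy network with made-up identifiers: two 4-protein cliques and a loose tail.
    toy_edges = [
        ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
        ("D", "E"), ("E", "F"),
        ("W", "X"), ("W", "Y"), ("W", "Z"), ("X", "Y"), ("X", "Z"), ("Y", "Z"),
    ]
    print(candidate_complexes(toy_edges, k=3))
```

In this toy network the loosely attached proteins E and F are pruned, leaving the two cliques as candidate complexes. In the workflow described above, such candidates would then go through literature curation, where missing subunits are added and overlapping complexes merged.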
Highlights
Proteins interact with other proteins or biomolecules in complexes to perform cellular functions
The complex quality index (CQI) is a useful source of experimentally confirmed information about protein complexes and subunits
To estimate the degree of Gene Ontology (GO) term consistency expected by chance, 100 sets of genes randomly selected from all representative transcripts in the H-Invitational Database (H-InvDB), with complex sizes matching our annotation of PCset1, were created and used as a control
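The size-matched random control described in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions: the consistency metric used here (the fraction of subunit pairs sharing at least one GO term) is an assumption, since the exact definition is not given in this text, and the function names, gene identifiers, and GO identifiers are hypothetical.

```python
import random
from itertools import combinations


def go_consistency(members, go_terms):
    """Fraction of member pairs sharing at least one GO term (assumed metric)."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0  # a single-member set has no pairs to compare
    shared = sum(
        1 for a, b in pairs
        if go_terms.get(a, set()) & go_terms.get(b, set())
    )
    return shared / len(pairs)


def random_control(complex_sizes, background, go_terms, n_sets=100, seed=0):
    """Mean consistency of size-matched random gene sets, one value per replicate.

    complex_sizes: sizes of the annotated complexes to match.
    background:    list of all genes eligible for random selection.
    """
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    means = []
    for _ in range(n_sets):
        scores = [
            go_consistency(rng.sample(background, size), go_terms)
            for size in complex_sizes
        ]
        means.append(sum(scores) / len(scores))
    return means


if __name__ == "__main__":
    # Made-up annotations for illustration only.
    toy_go = {
        "g1": {"GO:1"}, "g2": {"GO:1", "GO:2"}, "g3": {"GO:3"},
        "g4": set(), "g5": {"GO:2"}, "g6": {"GO:3"},
    }
    observed = go_consistency(["g1", "g2", "g5"], toy_go)
    control = random_control([3], list(toy_go), toy_go, n_sets=100)
    print(observed, sum(control) / len(control))
```

Comparing the observed consistency of the annotated complexes with the distribution of the 100 control values gives a sense of how much GO agreement would be expected by chance for complexes of the same sizes.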
Summary
We predicted and annotated 1,264 human protein complexes from integrated PPI data. GO analysis increased the reliability of both complex prediction and manual annotation. The analysis of expression profiles and duplicated genes showed that protein subunits within a complex tend to be expressed cooperatively and to be mutually paralogous. Comprehensive protein complex prediction and annotation will provide strong functional annotation clues about hypothetical proteins. We constructed a new human protein complex database with a quality index (PCDq) to provide this comprehensive annotation of human protein complexes.
Availability and requirements
PCDq is freely available at http://h-invitational.jp/hinv/pcdq/