Abstract

Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge “maps” of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin together with their targets, and it captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin-dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns at several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach, which captures rich information for molecular species ranging from genes and proteins to PTM proteoforms, can be extended to other proteins and their involvement in disease.
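The strategy of flagging proteins that participate in multiple relation types can be illustrated with a minimal Python sketch. It assumes a knowledge map stored as (source, relation, target) triples; the relation labels, the split into relations that act on beta-catenin versus relations through which beta-catenin controls expression, and all protein names other than CTNNB1 (the beta-catenin gene symbol) are hypothetical placeholders, not the authors' schema or data.

```python
from collections import defaultdict

HUB = "CTNNB1"  # gene symbol for beta-catenin

# Hypothetical relation labels: relations through which a partner acts on the hub,
# and relations through which the hub (via co-activated TFs) controls a partner's expression.
INPUT_RELATIONS = {"interacts_with", "modifies"}
OUTPUT_RELATIONS = {"regulates_expression_of"}

# Toy (source, relation, target) edges; placeholders, not curated data.
EDGES = [
    ("PROTEIN_A", "interacts_with", HUB),
    ("PROTEIN_B", "interacts_with", HUB),
    ("PROTEIN_C", "modifies", HUB),
    (HUB, "regulates_expression_of", "PROTEIN_A"),
    (HUB, "regulates_expression_of", "PROTEIN_C"),
    (HUB, "regulates_expression_of", "PROTEIN_D"),
]


def relations_by_partner(edges, hub):
    """Map each protein linked to `hub` to the set of relation types connecting them."""
    partners = defaultdict(set)
    for source, relation, target in edges:
        if source == hub and target != hub:
            partners[target].add(relation)
        elif target == hub and source != hub:
            partners[source].add(relation)
    return partners


def feedback_candidates(edges, hub):
    """Return proteins that both act on `hub` and whose expression `hub` regulates."""
    candidates = []
    for partner, relations in relations_by_partner(edges, hub).items():
        if relations & INPUT_RELATIONS and relations & OUTPUT_RELATIONS:
            candidates.append(partner)
    return sorted(candidates)


print(feedback_candidates(EDGES, HUB))
```

With these toy edges, PROTEIN_A and PROTEIN_C are reported because each both acts on beta-catenin and has its expression regulated through beta-catenin; such proteins are only candidate feedback-loop members, in keeping with the hypothesis-generating role of the map.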

Highlights

  • A wealth of knowledge relevant to the biological mechanisms of human disease, including information on protein-protein interactions (PPIs), protein post-translational modifications (PTMs), gene/protein expression, and disease-associated mutations, is contained in the scientific literature and bioinformatics databases

  • We characterized a group of beta-catenin interacting proteins whose expression is potentially controlled by beta-catenin; we proposed a mechanism for regulation of beta-catenin transcriptional activity by the cyclin-dependent kinase CDK5, which was identified as a beta-catenin regulator in a large-scale miRNA-based knockdown screen of the kinome [8]; and we examined beta-catenin cancer-associated mutations in conjunction with other sequence features to determine how beta-catenin activity may be altered in different cancer types

  • We used our recently developed iPTMnet database, which provides a unified presentation of PTM information text-mined from the scientific literature and drawn from multiple high-quality curated databases, including PhosphoSitePlus [2] and Phospho.ELM [11]
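Examining cancer-associated mutations in conjunction with sequence features, and comparing the result across tissue types, amounts to an interval lookup followed by a grouped count. The sketch below is a minimal illustration under stated assumptions: features are given as 1-based inclusive residue ranges and mutations as (tissue, position) pairs, and the feature names, positions, and tissue labels are invented placeholders, not beta-catenin annotations or mutation data.

```python
from collections import Counter

# Hypothetical sequence features: name -> (start, end), 1-based inclusive residue ranges.
FEATURES = {
    "ptm_site_X": (10, 10),
    "binding_region_Y": (30, 40),
}

# Toy mutation records: (tissue, residue position). Placeholders, not real data.
MUTATIONS = [
    ("tissue_1", 10),
    ("tissue_1", 10),
    ("tissue_1", 35),
    ("tissue_2", 35),
    ("tissue_2", 38),
    ("tissue_2", 50),
]


def features_at(position, features):
    """Return names of features whose residue range covers `position`."""
    return [name for name, (start, end) in features.items() if start <= position <= end]


def mutation_counts_by_tissue(mutations, features):
    """Count mutations falling in each feature, separately for each tissue type."""
    counts = Counter()
    for tissue, position in mutations:
        for name in features_at(position, features):
            counts[(tissue, name)] += 1
    return counts


for (tissue, feature), n in sorted(mutation_counts_by_tissue(MUTATIONS, FEATURES).items()):
    print(tissue, feature, n)
```

Mutations outside every feature (position 50 here) are simply not counted. Comparing tissues fairly would also require accounting for how many samples each tissue contributes.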

Introduction

A wealth of knowledge relevant to the biological mechanisms of human disease, including information on protein-protein interactions (PPIs), protein post-translational modifications (PTMs), gene/protein expression, and disease-associated mutations, is contained in the scientific literature and bioinformatics databases. Combining text mining tools that extract information from the scientific literature with curated databases and with ontologies, which enable structured representation of entities, relations, and concepts, is a powerful strategy for knowledge integration. In previous work [1], we developed a bioinformatics framework for the construction of phosphorylation-centric networks. It used the Rule-based Literature Mining System for Protein Phosphorylation (RLIMS-P) to extract phosphorylation events from the scientific literature, combined them with information from phosphorylation and PPI databases (e.g., PhosphoSitePlus [2] and IntAct [3]), and used the Protein Ontology (PRO) to represent phosphorylated protein forms (proteoforms; [4]) and the Gene Ontology (GO) [5] for functional annotation. Here we extend that framework to additional information types and apply it to beta-catenin, a highly studied protein with a role in disease, in order to broaden the applicability of the approach to disease-driver mechanisms.
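The integration step, in which phosphorylation events extracted from the literature are merged with database records and attached to phosphorylated proteoform nodes, can be sketched in Python. This is a minimal illustration under stated assumptions, not RLIMS-P, iPTMnet, or PRO code: the record fields, the source labels, the kinase names, and the "substrate/pSITE" node label are all hypothetical, whereas the framework itself represents proteoforms with PRO terms.

```python
from collections import defaultdict
from typing import NamedTuple


class PhosphoEvent(NamedTuple):
    kinase: str
    substrate: str
    site: str    # residue + position, e.g. "S10" (toy value)
    source: str  # provenance of the evidence (hypothetical label)


# Toy evidence records; all values are placeholders.
EVIDENCE = [
    PhosphoEvent("KINASE_1", "CTNNB1", "S10", "text_mining"),
    PhosphoEvent("KINASE_1", "CTNNB1", "S10", "curated_db_1"),
    PhosphoEvent("KINASE_2", "CTNNB1", "T20", "curated_db_2"),
]


def merge_evidence(events):
    """Collapse duplicate (kinase, substrate, site) events, keeping every source."""
    merged = defaultdict(set)
    for event in events:
        merged[(event.kinase, event.substrate, event.site)].add(event.source)
    return merged


def build_edges(merged):
    """Emit (kinase, proteoform, sources) edges.

    The "substrate/pSITE" proteoform label is for illustration only; an
    ontology such as PRO would supply stable identifiers instead.
    """
    edges = []
    for (kinase, substrate, site), sources in sorted(merged.items()):
        proteoform = f"{substrate}/p{site}"
        edges.append((kinase, proteoform, sorted(sources)))
    return edges


for edge in build_edges(merge_evidence(EVIDENCE)):
    print(edge)
```

Keeping the set of sources on each edge preserves provenance, so an edge supported by both text mining and a curated database can be distinguished from one supported by a single source.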
