Abstract
Background
Biological entities do not perform in isolation, and often it is the nature and degree of interactions among numerous biological entities that ultimately determines any final outcome. Hence, experimental data on any single biological entity can be of limited value when considered only in isolation. To address this, we propose that augmenting individual entity data with the literature will not only better define the entity's own significance but also uncover relationships with novel biological entities. To test this notion, we developed a comprehensive text mining and computational methodology that focused on discovering new targets of one class of molecular entities, transcription factors (TFs), within one particular disease, colorectal cancer (CRC).

Methods
We used 39 molecular entities known to be associated with CRC, along with six colorectal cancer terms, as the bait list (the list of search terms) for mining the biomedical literature to identify CRC-specific genes and proteins. Using the literature-mined data, we constructed a global TF interaction network for CRC. We then developed a multi-level, multi-parametric methodology to identify TFs relevant to CRC.

Results
The small bait list, when augmented with literature-mined data, identified a large number of biological entities associated with CRC. The relative importance of these TFs and their associated modules was determined using functional and topological features. Additional validation of these highly ranked TFs using the literature strengthened our findings. Some of the novel TFs that we identified were SLUG, RUNX1, IRF1, HIF1A, ATF-2, ABL1, ELK-1 and GATA-1. Some of these TFs are associated with functional modules in known pathways of CRC, including the Beta-catenin/development, immune response, transcription, and DNA damage pathways.

Conclusions
Our methodology of using text mining data and a multi-level, multi-parameter scoring technique was able to identify both known and novel TFs that have roles in CRC.
Starting with just one TF (SMAD3) in the bait list, the literature mining process identified an additional 116 CRC-associated TFs. Our network-based analysis showed that each of these TFs belonged to one of 13 major functional groups known to play important roles in CRC. Among these TFs, we identified a novel six-node module consisting of ATF2-P53-JNK1-ELK1-EPHB2-HIF1A, in which the novel JNK1-ELK1 association could potentially serve as a significant marker for CRC.
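The abstract ranks TFs with a multi-level, multi-parametric score built from functional and topological features, but it does not give the formula. The following is a minimal sketch of that general idea, assuming a weighted sum of normalized degree, component-local closeness centrality, and a per-TF functional score (for example, a count of CRC-relevant pathway annotations). The function names, weights, functional score, and toy network are all hypothetical and are not the authors' method or data.

```python
from collections import deque


def closeness(graph, node):
    """Closeness of `node` inside its connected component:
    (reachable nodes) / (sum of shortest-path hop distances), via BFS."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank_tfs(edges, tfs, functional, w_degree=0.4, w_close=0.3, w_func=0.3):
    """Rank TFs by a weighted sum of normalized degree, closeness and a
    functional score. Weights and the functional score are hypothetical."""
    graph = {}
    for a, b in edges:  # build an undirected adjacency map
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    n = len(graph)
    max_func = max(functional.values(), default=0) or 1  # avoid division by zero
    scores = {}
    for tf in tfs:
        if tf not in graph:  # TFs absent from the network are not ranked
            continue
        degree = len(graph[tf]) / (n - 1) if n > 1 else 0.0
        scores[tf] = (w_degree * degree
                      + w_close * closeness(graph, tf)
                      + w_func * functional.get(tf, 0) / max_func)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy network with placeholder names (not data from the study).
edges = [("TF_A", "G1"), ("TF_A", "G2"), ("TF_A", "TF_B"), ("TF_B", "G3")]
ranking = rank_tfs(edges, ["TF_A", "TF_B"], {"TF_A": 2, "TF_B": 1})
print(ranking)
```

A weighted sum keeps each feature's contribution explicit. A rank-based combination, such as averaging each TF's rank across features, is a common alternative when the features are on very different scales.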
Highlights
Biological entities do not perform in isolation, and often, it is the nature and degree of interactions among numerous biological entities which determines any final outcome
The focus on colorectal cancer (CRC) reflects the need for a global transcription factor (TF) interaction network analysis in CRC and for the identification of CRC-specific TFs as potential disease markers; here we demonstrate that a bioinformatics approach incorporating knowledge from the literature, topological network properties, and biological features can achieve this goal
Construction of the TF interaction network of CRC: for the 2,634 molecular entities, using the Gene Ontology Annotation Similarity Score, we identified 700 gene interactions that involved at least one TF
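The highlight above states that interactions were inferred from a Gene Ontology Annotation Similarity Score and kept only if they involved at least one TF, but it does not define the score. The following is a minimal sketch of that filtering step, assuming a plain Jaccard index over each gene's set of GO terms as a stand-in similarity and a hypothetical cutoff of 0.3. The function names, threshold, gene names, and GO-term labels are placeholders and are not the study's actual score or data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two GO-term sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tf_interactions(annotations, tfs, threshold=0.3):
    """Keep gene pairs whose GO-term overlap reaches `threshold` and
    that involve at least one transcription factor."""
    edges = []
    for g1, g2 in combinations(sorted(annotations), 2):
        if g1 not in tfs and g2 not in tfs:
            continue  # the network only keeps pairs with at least one TF
        score = jaccard(annotations[g1], annotations[g2])
        if score >= threshold:
            edges.append((g1, g2, round(score, 3)))
    return edges


# Placeholder genes and GO-term labels (hypothetical, not from the study).
annotations = {
    "TF_A": {"GO_1", "GO_2"},
    "TF_B": {"GO_1", "GO_4"},
    "GENE_X": {"GO_1", "GO_2", "GO_3"},
    "GENE_Y": {"GO_4"},
}
edges = tf_interactions(annotations, tfs={"TF_A", "TF_B"})
print(edges)
# → [('GENE_X', 'TF_A', 0.667), ('GENE_Y', 'TF_B', 0.5), ('TF_A', 'TF_B', 0.333)]
```

Jaccard overlap ignores the GO hierarchy. Ontology-aware semantic similarity measures, which credit shared ancestor terms, would likely be closer to a dedicated annotation similarity score.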
Summary
Biological entities do not perform in isolation, and often it is the nature and degree of interactions among numerous biological entities that determines any final outcome. We propose that augmenting individual entity data with the literature will better define the entity's own significance and uncover relationships with novel biological entities. To test this notion, we developed a comprehensive text mining and computational methodology that focused on discovering new targets of one class of molecular entities, transcription factors (TFs), within one particular disease, colorectal cancer (CRC). Gene expression datasets have been widely used to identify genes and pathways as markers for the specific disease or outcome to which they are linked [1,2,3,4]. Many computational approaches using functional annotation, gene expression data, sequence-based knowledge, and phenotype similarity have since been developed to prioritize genes, and recent studies have demonstrated the application of systems biology approaches to disease-relevant gene prioritization.