Automated Extraction and Visualization of Protein-Protein Interaction Networks and Beyond: A Text-Mining Protocol.

Kalpana Raja,Jeyakumar Natarajan,Ian Ross,James Thomson,Finn Kuusisto,John Steill,Ron Stewart

doi:10.1007/978-1-4939-9873-9_2

Abstract

Proteins perform their functions by interacting with other proteins. Protein-protein interaction (PPI) is critical for understanding the functions of individual proteins, the mechanisms of biological processes, and the disease mechanisms. High-throughput experiments accumulated a huge number of PPIs in PubMed articles, and their extraction is possible only through automated approaches. The standard text-mining protocol includes four major tasks, namely, recognizing protein mentions, normalizing protein names and aliases to unique identifiers such as gene symbol, extracting PPIs, and visualizing the PPI network using Cytoscape or other visualization tools. Each task is challenging and has been revised over several years to improve the performance. We present a protocol based on our hybrid approaches and show the possibility of presenting each task as an independent web-based tool, NAGGNER for protein name recognition, ProNormz for protein name normalization, PPInterFinder for PPI extraction, and HPIminer for PPI network visualization. The protocol is specific to human but can be generalized to other organisms. We include KinderMiner, our most recent text-mining tool that predicts PPIs by retrieving significant co-occurring protein pairs. The algorithm is simple, easy to implement, and generalizable to other biological challenges.

Full Text