Deep Residual Convolutional Neural Network for Protein-Protein Interaction Extraction

Hao Zhang, Lan Huang, Renchu Guan, Xiaoyue Feng, Fengfeng Zhou, Yanchun Liang, Zhi-Hui Zhan

doi:10.1109/access.2019.2927253

Abstract

Knowledge extracted from the protein-protein interaction (PPI) network can help researchers reveal the molecular mechanisms of biological processes. With the rapid growth in the volume of biomedical literature, manually detecting and annotating PPIs from raw literature has become increasingly difficult. Hence, automatically extracting PPIs from raw literature with machine learning methods has gained significance in biomedical research. In this paper, we propose a novel PPI extraction method based on the residual convolutional neural network (CNN). This is the first time that a residual CNN has been applied to the PPI extraction task. In addition, the previous state-of-the-art PPI extraction models rely heavily on parsing results from natural language processing tools, such as dependency parsers. Our model does not rely on any parsing tools. We evaluated our model on five benchmark PPI extraction corpora: AIMed, BioInfer, HPRD50, IEPA, and LLL. The experimental results showed that our model achieved the best results compared with the previous kernel-based and CNN-based PPI extraction models. Compared with the previous recurrent neural network-based PPI extraction models, our model achieved better or comparable performance.

Highlights

Protein-protein interaction (PPI) is a physical contact established between two or more protein molecules resulting from biochemical events, which provides a useful proxy for cellular communication lattices and can be discovered in almost all cellular processes, such as metabolism, signaling, regulation, and proliferation [1], [2].
We propose a deep residual convolutional neural network model for PPI extraction.
The shortest dependency path (SDP) can be regarded as a simplified sentence, and some deep learning-based biomedical relation extraction models rely on the SDP [26], [30], [40], [41].

Summary

INTRODUCTION

Protein-protein interaction (PPI) is a physical contact established between two or more protein molecules resulting from biochemical events, which provides a useful proxy for cellular communication lattices and can be discovered in almost all cellular processes, such as metabolism, signaling, regulation, and proliferation [1], [2]. Based on manually designed lexical, syntactic, and dependency features, Saetre et al. [11] trained a support vector machine model for PPI extraction. The combination of residual connections and a downsampling operation gives our model a large performance improvement over previous shallow CNN-based models. Zhao et al. [39] combined a deep feed-forward neural network with manually selected features to extract PPIs from sentences but, compared with other RNN-based and CNN-based models, their model did not demonstrate the advantages of deep learning. The shortest dependency path (SDP) can be regarded as a simplified sentence, and some deep learning-based biomedical relation extraction models rely on the SDP [26], [30], [40], [41]. The number of convolutional modules ‘Conv_Num’ in each residual convolutional block is a hyper-parameter.
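The interplay between the residual connection, the ‘Conv_Num’ hyper-parameter, and downsampling can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the kernel width, the placement of the ReLU activation before each convolution, the pooling window and stride, and all function names (`conv1d_same`, `residual_block`, `max_pool_downsample`) are assumptions chosen for illustration, since this summary does not specify them.

```python
def conv1d_same(x, weights, bias):
    """1-D convolution over a sequence of feature vectors with zero padding.

    x:       list of T rows, each a list of C floats
    weights: K x C x C nested list (kernel width K, assumed odd)
    bias:    list of C floats
    The output has the same length T and channel count C as the input,
    which is what allows it to be added back to the input.
    """
    k = len(weights)
    pad = k // 2
    channels = len(bias)
    zero = [0.0] * len(x[0])
    padded = [zero] * pad + x + [zero] * pad
    out = []
    for t in range(len(x)):
        row = list(bias)
        for j in range(k):
            for ci, value in enumerate(padded[t + j]):
                for co in range(channels):
                    row[co] += value * weights[j][ci][co]
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, params):
    """Apply len(params) convolutional modules, then add the block input.

    params holds one (weights, bias) pair per convolutional module, so
    len(params) plays the role of Conv_Num. Applying ReLU before each
    convolution is an assumption of this sketch.
    """
    h = x
    for weights, bias in params:
        h = conv1d_same(relu(h), weights, bias)
    # Residual (shortcut) connection: element-wise sum of input and output.
    return [[a + b for a, b in zip(rx, rh)] for rx, rh in zip(x, h)]


def max_pool_downsample(x, size=2, stride=2):
    """Max-pool along the sequence axis; size and stride are assumed values."""
    out = []
    for start in range(0, len(x) - size + 1, stride):
        window = x[start:start + size]
        out.append([max(column) for column in zip(*window)])
    return out


# Toy input: 8 positions with 2 channels each.
x = [[float(t), float(-t)] for t in range(8)]

# Conv_Num = 2 modules with all-zero weights, so the block reduces to the
# identity mapping carried by the shortcut connection.
zero_weights = [[[0.0, 0.0] for _ in range(2)] for _ in range(3)]
params = [(zero_weights, [0.0, 0.0]) for _ in range(2)]

block_out = residual_block(x, params)
pooled = max_pool_downsample(block_out)
```

With the default window and stride, each pooling step halves the sequence length, so stacking residual blocks with pooling in between lets deeper layers cover progressively longer spans of the sentence at a lower cost per layer.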

DOWNSAMPLING BY MAX-POOLING

DATASETS

Findings

CONCLUSION AND FUTURE WORK