Abstract
Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in them, these networks face important data quality challenges, namely noise and incompleteness, that adversely affect the results obtained from their analysis. Here, we apply a robust measure of local network structure called common neighborhood similarity (CNS) to address these challenges. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacies for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network for each of the CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its corresponding transformed network with those from the original network. Using a large set of human and fly protein interactions, and a set of over GO terms for both, we find that several of the transformed networks produce more accurate predictions than those obtained from the original network. In particular, the measure and other continuous CNS measures perform well on this task, especially for large networks. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures to prune out noisy edges and to enhance functional coherence in the transformed networks.
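To make the graph transformation idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses Jaccard similarity of neighbor sets as a stand-in CNS measure (the abstract does not say which measures were evaluated), and the function names, the toy edge list, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations

def jaccard_cns(adj, u, v):
    """Jaccard similarity of the neighbor sets of u and v.

    Assumption: u and v themselves are excluded from each other's
    neighbor sets, so only shared third-party neighbors count.
    """
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def transform(edges, threshold=0.5):
    """Build a transformed network: keep each protein pair whose CNS
    score reaches the (hypothetical) threshold, weighted by that score."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    transformed = {}
    for u, v in combinations(sorted(adj), 2):
        score = jaccard_cns(adj, u, v)
        if score >= threshold:
            transformed[(u, v)] = score
    return transformed

# Toy binary interaction network (illustrative only).
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E"), ("E", "F")]
result = transform(edges)
for pair, score in sorted(result.items()):
    print(pair, round(score, 3))
```

In this toy network, A and B are not directly connected but share two neighbors, so the transformed network links them, while the edge E-F, which has no common-neighbor support, is dropped. This mirrors the two effects named above: adding well-supported links and pruning poorly supported ones.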
Highlights
Protein interaction networks are one of the most promising types of data for studying complex biological problems, such as identifying disease-related proteins and networks [1,2,3] and finding functional modules and functions of individual proteins [4,5]
The transformed networks derived from the binary version of the original network had substantially fewer connected proteins than the original network (4768 and 3093 for human and fly respectively), while those derived from the continuous version had effectively the same number of connected proteins as the original network
We showed that while common neighborhood similarity (CNS)-based graph transformation is generally useful, transformations based on CNS measures that can use continuous edge weights or reliabilities in the original network are especially effective for tasks such as protein function prediction
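As a rough illustration of how a CNS measure might use continuous edge reliabilities rather than binary edges, the sketch below computes a weighted Jaccard score (sum of minimum weights over sum of maximum weights across neighbors). This particular formula, the function names, and the reliability values are assumptions made for illustration; the highlights do not specify the continuous measures used.

```python
def build_weights(weighted_edges):
    """Store an undirected weighted network as a dict of dicts."""
    w = {}
    for a, b, reliability in weighted_edges:
        w.setdefault(a, {})[b] = reliability
        w.setdefault(b, {})[a] = reliability
    return w

def weighted_cns(w, u, v):
    """Weighted Jaccard over the neighbors of u and v (assumed measure).

    A shared neighbor contributes the smaller of its two edge
    reliabilities, so weakly supported links add little to the score.
    """
    nbrs = (set(w[u]) | set(w[v])) - {u, v}
    num = sum(min(w[u].get(k, 0.0), w[v].get(k, 0.0)) for k in nbrs)
    den = sum(max(w[u].get(k, 0.0), w[v].get(k, 0.0)) for k in nbrs)
    return num / den if den else 0.0

# Hypothetical reliabilities in [0, 1] for a toy network.
weighted_edges = [("A", "C", 0.9), ("B", "C", 0.8), ("A", "D", 0.2), ("B", "D", 0.1)]
w = build_weights(weighted_edges)
score_ab = weighted_cns(w, "A", "B")  # shares C and D with similar weights
score_cd = weighted_cns(w, "C", "D")  # shares A and B with very different weights
print(round(score_ab, 3), round(score_cd, 3))
```

With binary edges, both pairs would score identically because each shares two of two neighbors. With reliabilities taken into account, A-B scores much higher than C-D, since D's links are weak.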
Summary
Protein interaction networks are one of the most promising types of data for studying complex biological problems, such as identifying disease-related proteins and networks [1,2,3] and finding functional modules and functions of individual proteins [4,5]. Despite the rich information embedded in protein interaction networks, they face several data quality challenges that adversely affect the results obtained from their analysis. Studies have shown that the presence of noise in these networks has significant adverse effects on the performance of several types of analyses, including protein function prediction algorithms [10]. Another important problem facing the use of these networks is their incompleteness, i.e., the absence of biologically valid interactions from the currently available data sets [8,9,11]. This lack of completeness is mainly caused by the specific targeting of bait and prey proteins by individual studies (based on criteria such as functional annotations), which can only generate relatively small samples of the entire interactome of an organism. Noise (false positives) and incompleteness (false negatives) are major challenges facing protein interaction data that need to be addressed in order to obtain richer information from them.