Abstract

Background: Most previous protein-protein interaction (PPI) studies evaluated their algorithms' performance based on "per-instance" precision and recall, in which the instances of an interaction relation were evaluated independently. However, we argue that this standard evaluation method should be revisited. In a large corpus, the same relation can be described in many different forms and, in practice, correctly identifying a small subset of them, rather than all of them, often suffices to detect the given interaction.

Methods: We propose a more pragmatic "per-relation" performance evaluation method in place of the conventional per-instance method. In the per-relation method, only a subset of a relation's instances needs to be correctly identified for the relation to count as positive. We also introduce a new high-precision rule-based PPI extraction algorithm. Virtually all current PPI extraction studies focus on improving F-score, aiming to balance precision and recall, but in many realistic scenarios involving large corpora, one can benefit more from a high-precision algorithm than from a high-recall counterpart.

Results: We show that our algorithm not only achieves better per-relation performance than previous solutions but also serves as a good complement to existing PPI extraction tools. Our algorithm improves the performance of the existing tools through simple pipelining.

Conclusion: This research brings a new perspective to the performance evaluation of PPI extraction studies, one that we believe is more important in practice than existing evaluation criteria. Given the new evaluation perspective, we also showed the importance of a high-precision extraction tool and validated the efficacy of our rule-based system as a candidate for that tool.

Highlights

  • Most previous protein-protein interaction (PPI) studies evaluated their algorithms’ performance based on “per-instance” precision and recall, in which the instances of an interaction relation were evaluated independently.

  • We evaluated several popular machine learning models for our two-tier system, including Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and k-Nearest Neighbor, in addition to the two baseline PPI extraction tools.

  • In this work, we argued that the current “per-instance” performance evaluation method is not pragmatic in many realistic PPI extraction scenarios.


Summary

Introduction

Most previous protein-protein interaction (PPI) studies evaluated their algorithms’ performance based on “per-instance” precision and recall, in which the instances of an interaction relation were evaluated independently. We argue that this standard evaluation method should be revisited. The PPI extraction problem has been extensively studied. PPI extraction research falls largely into two groups according to the type of classification model used. One group relies on machine learning methods to predict the interaction pairs [5,6,7,8]. New methods are continually introduced, improving extraction performance. However, we argue that there is an inherent and largely overlooked problem in the performance evaluation methods employed in this research.
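
The difference between the two evaluation bases can be illustrated with a small sketch. It is not the authors' evaluation code: the data layout, the function names, and the `min_hits` threshold are assumptions made for illustration. Here a relation counts as gold-positive if any of its instances is annotated positive, and as detected if at least `min_hits` of its instances are predicted positive.

```python
from collections import defaultdict


def precision_recall(tp, fp, fn):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def per_instance_scores(instances):
    """Score every sentence-level instance independently."""
    tp = sum(1 for _, gold, pred in instances if gold and pred)
    fp = sum(1 for _, gold, pred in instances if not gold and pred)
    fn = sum(1 for _, gold, pred in instances if gold and not pred)
    return precision_recall(tp, fp, fn)


def per_relation_scores(instances, min_hits=1):
    """Score each distinct protein pair once (min_hits is an assumed threshold)."""
    gold_count = defaultdict(int)
    pred_count = defaultdict(int)
    for pair, gold, pred in instances:
        key = frozenset(pair)  # treat (A, B) and (B, A) as the same relation
        gold_count[key] += gold
        pred_count[key] += pred
    gold_relations = {k for k, n in gold_count.items() if n > 0}
    pred_relations = {k for k, n in pred_count.items() if n >= min_hits}
    tp = len(gold_relations & pred_relations)
    fp = len(pred_relations - gold_relations)
    fn = len(gold_relations - pred_relations)
    return precision_recall(tp, fp, fn)


# Toy data with placeholder protein names: (pair, gold label, predicted label).
instances = [
    (("P1", "P2"), True, True),
    (("P1", "P2"), True, False),
    (("P2", "P1"), True, False),
    (("P3", "P4"), True, True),
    (("P5", "P6"), False, False),
]

print(per_instance_scores(instances))  # (1.0, 0.5)
print(per_relation_scores(instances))  # (1.0, 1.0)
```

In this toy corpus the extractor misses half of the positive instances, so per-instance recall is 0.5, yet it recovers both interacting pairs, so per-relation recall is 1.0. This is the situation in which a high-precision extractor can be sufficient in practice.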

