Abstract
Protein-protein interaction (PPI) plays a key role in understanding cellular mechanisms in different organisms. Many supervised classifiers like Random Forest (RF) and Support Vector Machine (SVM) have been used for intra or inter-species interaction prediction. For improving the prediction performance, in this paper we propose a novel set of features to represent a protein pair using their annotated Gene Ontology (GO) terms, including their ancestors. In our approach, a protein pair is treated as a document (bag of words), where the terms annotating the two proteins represent the words. Feature value of each word is calculated using information content of the corresponding term multiplied by a coefficient, which represents the weight of that term inside a document (i.e., a protein pair). We have tested the performance of the classifier using the proposed feature on different well known data sets of different species like S. cerevisiae, H. Sapiens, E. Coli, and D. melanogaster. We compare it with the other GO based feature representation technique, and demonstrate its competitive performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.