An Extended Feature Representation Technique for Predicting Sequenced-based Host-pathogen Protein-protein Interaction

Jerry Emmanuel, Jelili Oyelade, Grace Olasehinde, Itunuoluwa Isewon

doi:10.2174/0115748936286848240108074303

Abstract

Background: Using machine learning models for sequence-based protein-protein interaction (PPI) prediction typically requires converting amino acid sequences into feature vectors. Two approaches to this transformation have been used in the literature: the Independent Protein Feature (IPF) and the Merged Protein Feature (MPF) extraction methods. Most studies have adopted IPF, in which the host and pathogen sequences are encoded separately, while others prefer MPF, in which the host and pathogen sequences are concatenated before feature encoding.

Objective: This raises the question of which approach should be adopted for improved host-pathogen protein-protein interaction (HPPPI) prediction. To address it, this work introduces the Extended Protein Feature (EPF) method.

Methods: The proposed method combines the predictive capabilities of IPF and MPF by extracting essential features, handling multicollinearity, and removing features with zero importance. EPF, IPF, and MPF were tested on bacteria, parasite, virus, and plant HPPPI datasets and deployed to six machine learning models: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Naïve Bayes (NB), Logistic Regression (LR), and Deep Forest (DF).

Results: MPF exhibited the lowest performance overall, whereas IPF performed better with decision-tree-based models such as RF and DF. In contrast, EPF improved performance with SVM, LR, NB, and MLP and also yielded competitive results with DF and RF.

Conclusion: The EPF approach developed in this study showed substantial improvements in four of the six models evaluated. This suggests that EPF is competitive with IPF and is particularly well suited to traditional machine learning models.
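The abstract does not say which sequence descriptors the authors used. As a minimal sketch, assuming amino acid composition (AAC) as the descriptor (an illustrative choice, not necessarily the authors'), the IPF and MPF encodings of one host-pathogen pair differ as follows. The helper names are hypothetical, and `epf_candidates` only places both feature sets side by side; it is not the authors' EPF implementation, which additionally filters the features.

```python
from collections import Counter

# The 20 standard amino acids, used as a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Amino acid composition (assumed descriptor): fraction of each standard residue."""
    counts = Counter(sequence.upper())
    n = sum(counts[a] for a in AMINO_ACIDS)  # non-standard residues are ignored
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / n for a in AMINO_ACIDS]


def ipf(host: str, pathogen: str) -> list[float]:
    """IPF-style encoding: encode each protein separately, then join the two vectors."""
    return aac(host) + aac(pathogen)


def mpf(host: str, pathogen: str) -> list[float]:
    """MPF-style encoding: concatenate the two sequences first, then encode once."""
    return aac(host + pathogen)


def epf_candidates(host: str, pathogen: str) -> list[float]:
    """Hypothetical EPF candidate pool: IPF and MPF features side by side, before filtering."""
    return ipf(host, pathogen) + mpf(host, pathogen)
```

With this descriptor, `ipf` yields 40 values per pair, `mpf` yields 20, and the candidate pool yields 60. Note that `mpf("AA", "GG")` equals `mpf("GG", "AA")`: a composition of the merged sequence cannot tell which residues came from the host and which from the pathogen, whereas the IPF vector keeps the two halves apart.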
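The two EPF filtering steps named in the Methods (handling multicollinearity and removing zero-importance features) could look roughly like the following. This is a hypothetical sketch, not the authors' procedure: the greedy keep-first strategy, the 0.95 correlation threshold, and all helper names are assumptions, and the importance scores are assumed to come from an already fitted model (for example, a random forest's impurity-based importances) rather than being computed here.

```python
import math
from collections.abc import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two columns; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(rows: Sequence[Sequence[float]], threshold: float = 0.95) -> list[int]:
    """Hypothetical greedy multicollinearity filter.

    Scans columns left to right and keeps a column only if its absolute
    correlation with every column already kept is below `threshold`.
    Returns the indices of the kept columns.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept: list[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def drop_zero_importance(indices: Sequence[int], importances: Sequence[float]) -> list[int]:
    """Keep only features whose importance score (from some fitted model) is non-zero."""
    return [j for j in indices if importances[j] != 0.0]


# Toy data: 4 samples x 4 features. Column 1 is exactly 2 x column 0,
# and column 3 is constant.
rows = [[1, 2, 5, 0], [2, 4, 3, 0], [3, 6, 4, 0], [4, 8, 1, 0]]
kept = drop_correlated(rows)  # [0, 2, 3]
# Made-up importance scores standing in for a fitted model's output.
final = drop_zero_importance(kept, [0.6, 0.3, 0.1, 0.0])  # [0, 2]
```

The correlation filter removes column 1 as redundant with column 0 but keeps the constant column 3, because its correlation is treated as 0 here. The importance step then removes it: a tree-based model can never split on a constant feature, so its impurity-based importance is zero.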

Published in: Current Bioinformatics

Similar Papers

Ensemble-Based Deep Learning Model for Network Traffic Classification
Ons Aouedi ... Benoit Parrein
IEEE Transactions on Network and Service Management | VOL. 19 | 01 Dec 2022

A low-cost breast cancer prognosis tool for developing countries using AI.
Iván Romarico González Espinoza ... Sergio Jiménez-Monferrer
Journal of Clinical Oncology | VOL. 41 | 01 Jun 2023

Applying Machine Learning to Predict Esophageal Cancer Recurrence after Esophagectomy
Kevin C Kapcio ... Michal J Lada
Journal of the American College of Surgeons | VOL. 235 | 17 Oct 2022

1303 Prediction of best response for NSCLC patients receiving immunotherapy by machine learning models
Yili Zhang ... Adil Alaoui
Journal for ImmunoTherapy of Cancer | VOL. 10 | 01 Nov 2022
