Abstract

This thesis explores issues that arise when predicting protein-protein interactions (PPI) involving multiple species with the Protein-protein Interaction Prediction Engine (PIPE). When predicting one species' PPI from another's, we showed that prediction performance is inversely correlated with the evolutionary distance between the training and testing species. With a change in the score calculation, we improved the area under the precision-recall curve by 45% when using seven well-studied species to predict an eighth. We then showed that PIPE can predict PPI between species by predicting 229 novel PPI between HIV and human at an estimated precision of 82% (100:1 class imbalance). By modifying a main data structure, we also improved the speed of the PIPE algorithm by a factor of 53 when predicting H. sapiens PPI. Using these best practices, we predicted all possible PPI between soybean and its costly pest, the Soybean Cyst Nematode, for our collaborators at Agriculture Canada.
