Reciprocal Perspective: A Cascaded Semi-Supervised Machine Learning Framework to Improve Pairwise Classification & Regression

Kevin Dick

doi:10.22215/etd/2022-15185

Abstract

Many real-world problems can be represented as a network, with nodes representing elements and edges (i.e., links, or the lack thereof) capturing the relationships between elements. One domain that relies on link prediction algorithms to elucidate relationships between pairs of nodes is protein-protein interaction (PPI) prediction. With high-performance computing and optimized PPI predictors, it has recently become possible to evaluate every possible pair of nodes, enabling the generation of a comprehensive prediction matrix (CPM). We introduce a novel semi-supervised machine learning method, denoted Reciprocal Perspective (RP), which exploits this new wealth of information by extracting context-based features from the CPM. RP considers reciprocal views of pairwise elements and uses the resulting features in a cascaded classifier, which has demonstrated significant improvement in predictive performance. Historically, this attainable wealth of information has been ignored due to computational intractability. We demonstrate that expending compute resources to generate CPMs is a worthwhile investment given the improvement in predictive performance in both classification- and regression-type tasks. This thesis makes contributions at all stages of a prototypical prediction pipeline. We demonstrate that RP is applicable to a variety of application domains within bioinformatics (PPI, microRNA-target, and drug-target interaction prediction) as well as within Network Science, in Recommendation Systems. Furthermore, RP is demonstrated both to improve individual model performance and to function as an ensemble method that combines multiple experts. Taken together, these contributions demonstrate that RP can be broadly applied to pairwise prediction problems across different domains, problem formulations, and varying scales of data.
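To make the idea of "reciprocal views" concrete, the following is a minimal sketch, not the thesis implementation. It assumes the CPM is a dense NumPy array of prediction scores and that the context features are simple ranks of a pair's score within its row and its column. The function name `reciprocal_features`, the feature names, and the toy matrix are hypothetical choices made for illustration; the abstract does not specify which features RP extracts.

```python
import numpy as np

def reciprocal_features(cpm, i, j):
    """Illustrative context features for pair (i, j) from a score matrix.

    Assumption: features are ranks of the pair's score as seen from
    element i (its row) and from element j (its column).
    """
    score = cpm[i, j]
    row = cpm[i, :]  # view of i: its scores against every candidate partner
    col = cpm[:, j]  # view of j: its scores against every candidate partner
    # Rank 1 means the highest score; count strictly greater scores, plus one.
    rank_in_row = int(np.sum(row > score)) + 1
    rank_in_col = int(np.sum(col > score)) + 1
    return {
        "score": float(score),
        "rank_in_row": rank_in_row,
        "rank_in_col": rank_in_col,
        "frac_rank_row": rank_in_row / row.size,
        "frac_rank_col": rank_in_col / col.size,
    }

# Toy 3x3 matrix of made-up scores, purely for demonstration.
cpm = np.array([[0.1, 0.9, 0.2],
                [0.8, 0.3, 0.4],
                [0.5, 0.7, 0.6]])
feats = reciprocal_features(cpm, 2, 1)
```

In a cascaded setup of the kind the abstract describes, feature vectors like these would be computed for labelled pairs and passed to a second-stage classifier or regressor. The choice of that second-stage model is left open here.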
