Abstract

Identifying protein-protein interactions (PPIs) is a challenging research problem in computational biology. Although machine learning methods have significantly advanced research in this field, current approaches struggle to accurately predict interactions for previously unseen proteins. Supervised autoencoders, which are neural networks trained to simultaneously predict labels and reconstruct their inputs, have been proven to offer generalization guarantees in the linear case. Moreover, adding a reconstruction branch has been shown empirically to improve the performance of standard neural network classifiers on several tasks. In this paper, we introduce a two-stage sequence-based PPI prediction method based on supervised autoencoders. The proposed approach first trains a denoising autoencoder on protein sequences. A supervised training stage follows, in which the model learns both to predict whether two proteins interact and to reconstruct the two proteins in the pair. An experimental analysis was performed on two public PPI data sets whose test pairs are formed from both seen and unseen protein sequences. The results show that, on both data sets, our approach outperforms multiple machine learning classifiers proposed in the literature.
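
To make the two-stage objective concrete, the following is a minimal sketch of the quantities involved, not the authors' implementation. The abstract does not specify the noise model, the reconstruction loss, or how the two terms are weighted, so the masking noise, the per-residue cross-entropy, and the `recon_weight` parameter are all assumptions, and every function name is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "X"  # hypothetical placeholder symbol for a corrupted residue


def corrupt(sequence, noise_rate, rng):
    """Stage 1 input noise (assumed form): mask each residue with probability noise_rate."""
    return "".join(MASK if rng.random() < noise_rate else aa for aa in sequence)


def binary_cross_entropy(label, prob, eps=1e-12):
    """Interaction loss for one pair; label is 0 or 1, prob is the predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def reconstruction_loss(sequence, residue_probs, eps=1e-12):
    """Mean negative log-likelihood of the true residues under the decoder output.

    residue_probs[i] maps each amino acid to its predicted probability at position i.
    Assumes a non-empty sequence.
    """
    total = 0.0
    for aa, probs in zip(sequence, residue_probs):
        total -= math.log(max(probs[aa], eps))
    return total / len(sequence)


def supervised_autoencoder_loss(label, prob, seq_a, probs_a, seq_b, probs_b,
                                recon_weight=1.0):
    """Stage 2 objective (assumed form): interaction loss plus weighted
    reconstruction loss over both proteins in the pair."""
    recon = reconstruction_loss(seq_a, probs_a) + reconstruction_loss(seq_b, probs_b)
    return binary_cross_entropy(label, prob) + recon_weight * recon


# Usage with a decoder that predicts a uniform distribution at every position.
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
seq_a, seq_b = "MKV", "ACDE"
noisy_a = corrupt(seq_a, noise_rate=0.3, rng=random.Random(0))
loss = supervised_autoencoder_loss(
    label=1, prob=0.5,
    seq_a=seq_a, probs_a=[uniform] * len(seq_a),
    seq_b=seq_b, probs_b=[uniform] * len(seq_b),
)
```

In this sketch the first stage would minimize only `reconstruction_loss` on corrupted inputs, while the second stage minimizes the combined objective, so the reconstruction terms act as a regularizer on the classifier.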
