A Semi-Supervised Autoencoder-Based Approach for Protein Function Prediction

Richa Dhanuka, Jyoti P Singh, Anushree Tripathi

doi:10.1109/jbhi.2022.3163150

Abstract

Since the advent of next-generation sequencing techniques, protein sequences have become abundantly available. Determining the functional characteristics of these proteins, however, is costly and time-consuming, so the gap between the number of known protein sequences and the number with known functions continues to grow. Advanced machine-learning methods have been developed to help fill this gap. In this work, a deep-learning-based approach is proposed for predicting protein function from protein sequences. A set of autoencoders is trained on protein sequences in a semi-supervised manner, with each autoencoder corresponding to exactly one protein function. In particular, 932 autoencoders corresponding to 932 biological processes and 585 autoencoders corresponding to 585 molecular functions are trained separately. The reconstruction losses of each protein sample across all autoencoders are then used as features to classify the sequences into their corresponding functions. The proposed model is evaluated on test protein samples and achieves promising results. The method can easily be extended to predict any number of functions, provided each function has an ample number of supporting protein sequences. All relevant code, data, and trained models are available at https://github.com/richadhanuka/PFP-Autoencoders.
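The core idea (one autoencoder per function, with the vector of per-autoencoder reconstruction losses serving as the feature representation of a protein) can be illustrated with a small sketch. This is not the authors' implementation. Several assumptions are made for illustration only: sequences are encoded by a hypothetical `encode` function as 20-dimensional amino-acid composition vectors; each deep autoencoder is replaced by a linear autoencoder fitted in closed form with truncated SVD (the optimal linear autoencoder under squared error spans the top principal components); and the downstream classifier, which the abstract does not specify, is replaced by a simple "lowest loss wins" rule.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq: str) -> np.ndarray:
    """Hypothetical encoding: amino-acid composition (20 relative frequencies)."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


class LinearAutoencoder:
    """Linear autoencoder fitted via truncated SVD.

    Stand-in (assumption) for one of the paper's per-function deep autoencoders.
    """

    def __init__(self, k: int):
        self.k = k  # size of the bottleneck (latent) layer

    def fit(self, X: np.ndarray) -> "LinearAutoencoder":
        # Train only on samples annotated with this autoencoder's function.
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.W = vt[: self.k].T  # encoder weights (d x k); decoder uses W.T
        return self

    def reconstruction_loss(self, X: np.ndarray) -> np.ndarray:
        Z = (X - self.mean) @ self.W          # encode into the bottleneck
        X_hat = Z @ self.W.T + self.mean      # decode back to input space
        return ((X - X_hat) ** 2).mean(axis=1)  # per-sample mean squared error


def loss_features(autoencoders, X: np.ndarray) -> np.ndarray:
    """Feature matrix: one column per function, holding each sample's loss."""
    return np.column_stack([ae.reconstruction_loss(X) for ae in autoencoders])


if __name__ == "__main__":
    # Synthetic toy data: each "function" occupies its own 2-D subspace.
    rng = np.random.default_rng(0)
    X_a = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
    X_b = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
    aes = [LinearAutoencoder(k=2).fit(X_a), LinearAutoencoder(k=2).fit(X_b)]
    F = loss_features(aes, np.vstack([X_a, X_b]))
    # Hypothetical decision rule: assign the function with the lowest loss.
    print(F.shape, F.argmin(axis=1))
```

In the paper the loss vector would have 932 (biological process) or 585 (molecular function) entries and would be passed to a trained classifier; the argmin rule here only shows why the features are informative, since a sample resembling a function's training data should be reconstructed well by that function's autoencoder.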

Full Text