Abstract

Species living in extremely cold environments resist freezing through antifreeze proteins (AFPs). Besides being essential to organisms that live at sub-zero temperatures, AFPs have numerous industrial applications. AFPs bear very little resemblance to one another and cannot be easily identified with simple search algorithms such as BLAST and PSI-BLAST. The sub-types found in fishes (Type I, II, III, IV and antifreeze glycoproteins (AFGPs)) show low sequence and structural similarity, which makes their accurate prediction challenging. Although several machine-learning methods have been proposed for the classification of AFPs, more reliable prediction methods are required. In this paper, we propose a novel machine-learning-based approach for the prediction of AFP sequences using latent space learning through a deep auto-encoder. For latent space pruning, we use the output of the auto-encoder with a deep neural network classifier to learn the non-linear mapping between the protein sequence descriptor and the class label. A comprehensive ablation study is performed, and the proposed method is evaluated in terms of widely used performance measures. In particular, the proposed method achieved a Matthews correlation coefficient of 0.52, an F-score of 0.49, and a Youden’s index of 0.81 on an independent test dataset, thereby outperforming the existing methods for AFP prediction.
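The three measures reported in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return MCC, F-score and Youden's index from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives. Undefined ratios (zero denominators) are reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F-score (F1): harmonic mean of precision and sensitivity (recall).
    pr_sum = precision + sensitivity
    f_score = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Youden's index: sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1.0

    return {"mcc": mcc, "f_score": f_score, "youden": youden}


# Hypothetical, imbalanced example: 100 positives and 1000 negatives.
metrics = binary_metrics(tp=90, fp=100, tn=900, fn=10)
print(metrics)
```

Because Youden's index depends only on sensitivity and specificity, while MCC and F-score are also affected by the number of false positives relative to true positives, an imbalanced test set can produce a high Youden's index alongside more moderate MCC and F-score values.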

Highlights

  • In Antarctic fish, a survival mechanism that prevented them from freezing in seawater at sub-zero temperatures was observed, which led to the discovery of antifreeze proteins (AFPs)[1]

  • We used early stopping with a patience of 50 epochs to avoid overfitting: training was halted once the model stopped improving

  • Considering the significance of the latent variables, in this study, we evaluated the models with varying numbers of latent space variables
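The early-stopping rule mentioned in the highlights above can be sketched as follows. This is only an illustration under stated assumptions: the source says training stopped when the model stopped improving, but does not say which quantity was monitored, so a per-epoch validation loss is assumed here, and the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=50):
    """Simulate early stopping over a sequence of per-epoch validation losses.

    Returns (best_epoch, stop_epoch): the epoch with the lowest loss seen so
    far and the epoch at which training would halt. Epochs are 0-indexed.
    Assumption: "improving" means a strictly lower validation loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best model: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` consecutive epochs: stop.
                return best_epoch, epoch

    # The loss sequence ended before patience ran out.
    return best_epoch, len(val_losses) - 1


# Hypothetical loss curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.9] * 100
print(early_stopping_epoch(losses, patience=50))
```

In practice the weights from `best_epoch` would typically be restored after stopping, so that the evaluated model is the one with the best monitored value rather than the last one trained.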

Summary

Introduction

In Antarctic fish, a survival mechanism that prevented them from freezing in seawater at sub-zero temperatures was observed, which led to the discovery of antifreeze proteins (AFPs)[1]. For instance, the sub-types of AFPs found in fishes, namely Type I, II, III, IV and AFGP[15], have no significant similarities in structure or sequence; rather, they show some homology to the different protein families from which they are assumed to have evolved[18,19]. This inconsistency makes their in-silico identification with conventional search tools such as BLAST[20] and PSI-BLAST[21] unfavorable, and the lack of common features complicates the development of a reliable prediction model. Kandaswamy et al. proposed a framework named AFP-Pred, considered a pioneering work in applying machine learning to this problem[22]. In this method, each sequence was encoded as a feature vector of 119 attributes, from which dominant features were selected using the ReliefF approach to train a random forest (RF) classifier.

