Abstract

Identifying secretory proteins in blood, saliva or other body fluids has become an effective method of diagnosing diseases. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and depend heavily on features extracted from the protein. In this article, we propose SecProCT, a deep learning model based on the capsule network and transformer architecture, to predict secretory proteins using only amino acid sequences. In cross-validation, the proposed model achieved accuracies of 0.921 and 0.892 for predicting blood-secretory proteins and saliva-secretory proteins, respectively. On an independent test set, it achieved accuracies of 0.917 and 0.905 for blood-secretory proteins and saliva-secretory proteins, respectively, outperforming conventional machine learning methods and other deep learning methods for biological sequence analysis. The main contributions of this article are as follows: (1) a deep learning model based on a capsule network and transformer architecture is proposed for predicting secretory proteins, and its results are better than those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; (2) only amino acid sequences are used in the proposed model, which overcomes the high dependence of existing methods on annotated protein features; (3) the proposed model can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.

Highlights

  • Human secretory proteins can enter the blood, saliva or other body fluids through various complex secretory pathways and can serve as protein markers detectable in blood, saliva or other body fluids [1]

  • We propose an end-to-end prediction model based on a deep learning framework, which is mainly comprised of a capsule network and transformer architecture, to predict secretory proteins using only amino acid sequences

  • The results of the model are better than those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; the model uses only amino acid sequences, which overcomes the high dependence of existing methods on annotated protein features; and it can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva
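
To illustrate what "using only amino acid sequences" can mean in practice, the following is a minimal sketch of turning a raw sequence into a fixed-length list of integer tokens that a sequence model could consume. It is not the preprocessing used by SecProCT: the vocabulary order, the padding and unknown-residue ids, and the names `encode_sequence` and `max_len` are all assumptions made for this example.

```python
from typing import List

# Assumed vocabulary: the 20 standard amino acids in alphabetical one-letter order.
# This ordering is illustrative, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_ID = 0                        # assumed id for padding positions
UNK_ID = len(AMINO_ACIDS) + 1     # assumed id (21) for non-standard residues

# Map each residue letter to an integer id starting at 1.
TOKEN_IDS = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence: str, max_len: int) -> List[int]:
    """Encode a protein sequence as exactly max_len integer ids.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with PAD_ID. Unknown letters map to UNK_ID.
    """
    ids = [TOKEN_IDS.get(aa, UNK_ID) for aa in sequence.upper()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))
```

For example, `encode_sequence("ACD", 5)` pads the three residue ids with two zeros, so no annotated protein features are needed to build the model input.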


Introduction

Human secretory proteins can enter the blood, saliva or other body fluids through various complex secretory pathways and can serve as protein markers detectable in blood, saliva or other body fluids [1]. The complex blood circulation system of the human body carries many biomarkers that can indicate physiological and disease conditions. Most current studies on biomarkers in body fluids use blood as the main research object [2]. Like other human body fluids, saliva is rich in biomolecules secreted from the salivary glands or leaked from nearby tissues [3]. Biomolecules released into the blood circulation by organs far from the salivary glands can also be secreted into saliva [4]. The biomolecules in saliva can therefore reflect, to a certain extent, the health of specific organs, including organs both near and far away from the salivary glands.
