Abstract

Proteins can move from the blood circulation into the salivary glands through active transport, passive diffusion or ultrafiltration. Some of these proteins are then released into saliva and can therefore potentially serve as disease biomarkers if accurately identified. We present a novel computational method for predicting salivary proteins that come from circulation. The prediction is based on a set of physicochemical and sequence features that we found to discriminate between human proteins known to move from circulation to saliva and proteins deemed not to be in saliva. A classifier was trained on these features using a support vector machine (SVM) to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Because our negative training data (i.e., proteins predicted not to be in saliva) may not be highly reliable, we also trained a ranking method on the same features, aiming to rank the known salivary proteins from circulation highest among the proteins in the general background. This prediction capability can be used to predict potential biomarker proteins for specific human diseases when coupled with information on proteins differentially expressed in diseased versus healthy control tissues and with a prediction capability for blood-secretory proteins. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer.
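The integration step described at the end of the abstract can be pictured as a filter over three protein lists. The following is only a minimal sketch: it assumes the three information sources are combined by simple set intersection, which the abstract does not state, and the function name and the protein identifiers are hypothetical placeholders, not results from the study.

```python
def candidate_saliva_biomarkers(differentially_expressed, blood_secretory, saliva_predicted):
    """Return the proteins that pass all three filters, sorted for stable output.

    Assumption: the three evidence sources are combined by intersection.
    """
    return sorted(
        set(differentially_expressed) & set(blood_secretory) & set(saliva_predicted)
    )


# Hypothetical placeholder identifiers, not real proteins or real predictions.
de = {"PROT_A", "PROT_B", "PROT_C", "PROT_D"}   # differentially expressed in diseased tissue
blood = {"PROT_B", "PROT_C", "PROT_E"}          # predicted to be blood-secretory
saliva = {"PROT_B", "PROT_C", "PROT_F"}         # predicted to enter saliva from circulation

print(candidate_saliva_biomarkers(de, blood, saliva))  # ['PROT_B', 'PROT_C']
```

A protein is kept only if all three lines of evidence agree, so a stricter or looser combination rule would change the number of candidates.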

Highlights

  • Human blood has long been used as an information source for detecting human diseases; examples include liver enzymes for detecting hepatitis, white blood cell counts for detecting infection and prostate-specific antigen (PSA) for diagnosing prostate cancer

  • This study develops a computational method for identifying the distinct features of salivary proteins that come from circulation, and applies the identified features to predict proteins that can enter saliva from circulation

  • We assessed the contributions of the 55 feature elements to the classification accuracy using a statistical significance q-value [9], and found that the q-values for the 55 feature elements are all less than 4.0E-5. Based on the 55 selected feature elements, we trained a support vector machine (SVM) classifier and evaluated its performance using 10-fold cross-validation, repeating the prediction 100 times to derive a performance distribution of the classifier
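The evaluation protocol in the last highlight (10-fold cross-validation, repeated to obtain a distribution of precision and recall) can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the folds are stratified by class (the source does not say whether they were), and the two-feature toy data are synthetic rather than the 55 feature elements.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with roughly equal class ratios."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT the paper's SVM): nearest class centroid."""
    centroids = {}
    for cls in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def repeated_cv(X, y, fit_predict, k=10, repeats=100, seed=0):
    """Return one (mean precision, mean recall) pair per repetition of k-fold CV."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        fold_scores = []
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            y_pred = fit_predict(
                [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
            )
            fold_scores.append(precision_recall([y[i] for i in fold], y_pred))
        results.append(
            (mean(p for p, _ in fold_scores), mean(r for _, r in fold_scores))
        )
    return results


# Synthetic toy data: two well-separated 2-D clusters (hypothetical, for illustration only).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
X += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

# Five repetitions keep the demo fast; the source reports 100.
scores = repeated_cv(X, y, centroid_fit_predict, k=10, repeats=5)
print("mean precision:", mean(p for p, _ in scores))
print("mean recall:", mean(r for _, r in scores))
```

Reshuffling the folds on every repetition is what turns a single cross-validation estimate into a distribution; swapping `centroid_fit_predict` for an SVM-based function with the same signature would leave the rest of the protocol unchanged.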


Summary

Introduction

Human blood has long been used as an information source for detecting human diseases; examples include liver enzymes for detecting hepatitis, white blood cell counts for detecting infection and prostate-specific antigen (PSA) for diagnosing prostate cancer. Recent large-scale proteomic analyses have revealed that human saliva is rich in proteins [1], some of which come from the blood circulation and can potentially serve as a general information pool for disease biomarker identification. This study develops a computational method for identifying the distinct features of salivary proteins that come from circulation, and applies the identified features to predict proteins that can enter saliva from circulation. While a few salivary proteins have been found to be relevant to specific diseases, to the best of our knowledge there has been no general and effective approach for identifying disease markers in saliva.

