Abstract

Gram-negative bacteria use various secretion systems to deliver their secreted effectors. Among them, the type IV secretion system (T4SS) is widespread across many bacterial species and secretes type IV secreted effectors (T4SEs), which play vital roles in host-pathogen interactions. However, experimental approaches to identifying T4SEs are time- and resource-consuming. In the present study, we aim to develop an in silico stacked ensemble method that predicts, from sequence information, whether a protein is an effector of the type IV secretion system. Protein sequences were encoded with position-specific scoring matrix (PSSM)-composition features, obtained by summing the rows of each PSSM profile that correspond to the same amino acid residue. Based on these PSSM-composition features, we developed a stacked ensemble model, PredT4SE-Stack, to predict T4SEs. It uses an ensemble of base classifiers implemented with various machine learning algorithms, such as support vector machine, gradient boosting machine, and extremely randomized trees, whose outputs serve as inputs to the meta-classifier of the classification system. Our results demonstrated that the PredT4SE-Stack framework is a feasible and effective way to accurately identify T4SEs from protein sequence information. The datasets and source code of PredT4SE-Stack are freely available at http://xbioinfo.sjtu.edu.cn/PredT4SE_Stack/index.php.
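The PSSM-composition encoding described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code. It assumes that the PSSM is already available as an L×20 array (for example, parsed from PSI-BLAST output) whose columns follow the PSI-BLAST residue order ARNDCQEGHILKMFPSTWYV. The function name `pssm_composition` is hypothetical. Rows belonging to the same residue type are summed, which yields a 20×20 matrix that is flattened into a 400-dimensional feature vector.

```python
import numpy as np

# Assumed column order of the PSSM (the standard PSI-BLAST ASCII PSSM order).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pssm_composition(sequence: str, pssm: np.ndarray) -> np.ndarray:
    """Hypothetical helper: sum PSSM rows that share a residue type.

    Returns a flat 400-dim vector in which block i (20 values) is the sum of
    all PSSM rows whose query residue is AMINO_ACIDS[i].
    """
    if pssm.shape != (len(sequence), 20):
        raise ValueError("PSSM must have one row of 20 scores per residue")
    composition = np.zeros((20, 20))
    for residue, row in zip(sequence.upper(), pssm):
        if residue in _INDEX:  # skip non-standard residues such as X or U
            composition[_INDEX[residue]] += row
    return composition.ravel()


# Toy example: random integers stand in for a real PSSM.
rng = np.random.default_rng(0)
features = pssm_composition("MKVLA", rng.integers(-5, 6, size=(5, 20)))
print(features.shape)  # (400,)
```

The abstract does not say whether the summed matrix is normalized (for example, by sequence length), so no normalization is applied in this sketch.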

Highlights

  • Gram-negative bacteria use various secretion systems to deliver their secreted substrates from the bacterial cytosol into host cells, which can promote virulence and cause diseases

  • Our results on both cross-validation and independent tests demonstrated that the PredT4SE-Stack framework is a feasible and effective way to accurately identify T4SEs from protein sequence information

  • The higher AUC-ROC of SVM compared with XGBoost (XGB) and gradient boosting machine (GBM) indicates that, with position-specific scoring matrix (PSSM)-composition features as input, SVM achieves more stable performance than XGB and GBM in this task, regardless of the classification threshold
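A stacked ensemble of the kind summarized above, together with a per-model AUC-ROC comparison, can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch under stated assumptions, not the PredT4SE-Stack implementation. Synthetic data from `make_classification` stands in for 400-dimensional PSSM-composition vectors, and the logistic-regression meta-classifier and all hyperparameters are assumptions. XGB is omitted to keep the example dependent on scikit-learn only. AUC-ROC is used because it summarizes performance over all decision thresholds rather than at a single cut-off.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 400-dim PSSM-composition features (not real T4SE data).
X, y = make_classification(n_samples=400, n_features=400, n_informative=30,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base classifiers named in the abstract: SVM, GBM, extremely randomized trees.
base_learners = [
    ("svm", make_pipeline(StandardScaler(),
                          SVC(probability=True, random_state=0))),
    ("gbm", GradientBoostingClassifier(random_state=0)),
    ("ert", ExtraTreesClassifier(n_estimators=200, random_state=0)),
]

# Out-of-fold (cv=5) base-classifier probabilities become the meta-classifier's
# input; the base classifiers are then refit on the full training set.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-classifier
    cv=5,
    stack_method="predict_proba",
)


def auc_of(model):
    """Fit a model and return its threshold-independent test AUC-ROC."""
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


scores = {name: auc_of(model) for name, model in base_learners}
scores["stack"] = auc_of(stack)
for name, auc in scores.items():
    print(f"{name}: AUC-ROC = {auc:.3f}")
```

The printed values depend on the synthetic data and say nothing about the performance reported for PredT4SE-Stack.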

Summary

Introduction

Gram-negative bacteria use various secretion systems to deliver their secreted substrates (called effectors) from the bacterial cytosol into host cells, which can promote virulence and cause diseases. Among these, the type IV secretion system (T4SS) secretes type IV secreted effectors (T4SEs), which vary widely across bacterial species. Accurate and reliable identification of T4SEs is a crucial step toward understanding the pathogenic mechanism of T4SS. Because of the biological significance of T4SEs, a number of experimental approaches, such as fusion protein reporter assays and assays based on the secretion apparatus, have been developed to identify novel T4SEs. However, these approaches are time- and resource-consuming, so it is highly desirable to develop in silico classification models that accurately predict T4SEs from protein sequence information.
