Abstract

Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in bioinformatics because it is crucial for understanding host-pathogen interactions and for developing better therapeutic targets against pathogens. However, identifying all effector proteins with traditional experimental approaches is often time-consuming and laborious. Computational methods that accurately predict putative novel effectors are therefore important for reducing the number of biological experiments needed for validation. In this study, we propose a method, called iT3SE-PX, to identify T3SEs solely from protein sequences. First, three kinds of features are extracted from position-specific scoring matrix (PSSM) profiles to train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm is applied to rank these features by their classification ability. Finally, the optimal features are selected as inputs to a support vector machine (SVM) classifier that predicts T3SEs. On two benchmark datasets, we conducted 5-fold cross-validation (CV) repeated 100 times with random partitioning and an independent test, respectively. The experimental results demonstrate that the proposed method outperforms most existing methods and could serve as a useful tool for identifying putative T3SEs given only sequence information.
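
The evaluation protocol mentioned above, 5-fold cross-validation repeated 100 times with random partitions, can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the stratified splitting, the function names (`stratified_kfold`, `repeated_cv`, `make_majority_baseline`), the toy labels, and the majority-class baseline standing in for the SVM classifier are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, rng):
    """Split sample indices into k disjoint folds, spreading each class evenly."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)  # a fresh random partition on every call
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def repeated_cv(labels, evaluate, k=5, repeats=100, seed=0):
    """Run k-fold CV `repeats` times; return (overall mean, per-repeat means)."""
    rng = random.Random(seed)
    all_idx = set(range(len(labels)))
    per_repeat = []
    for _ in range(repeats):
        fold_scores = []
        for test_idx in stratified_kfold(labels, k, rng):
            train_idx = sorted(all_idx - set(test_idx))
            fold_scores.append(evaluate(train_idx, sorted(test_idx)))
        per_repeat.append(sum(fold_scores) / k)
    return sum(per_repeat) / repeats, per_repeat


def make_majority_baseline(labels):
    """Hypothetical stand-in for a trained classifier: predict the majority class."""
    def evaluate(train_idx, test_idx):
        train = [labels[i] for i in train_idx]
        majority = max(set(train), key=train.count)
        return sum(labels[i] == majority for i in test_idx) / len(test_idx)
    return evaluate


# Toy labels (assumed): 1 = T3SE, 0 = non-T3SE.
labels = [1] * 20 + [0] * 40
mean_acc, per_repeat = repeated_cv(labels, make_majority_baseline(labels))
print(f"mean accuracy over {len(per_repeat)} repeats: {mean_acc:.3f}")
```

In a real experiment, the `evaluate` callback would train the classifier on the training indices and score it on the held-out fold; averaging over many random partitions reduces the dependence of the reported score on any single split.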

Highlights

  • The type III secretion systems (T3SSs) are sophisticated protein transport nanomachines that are widely distributed among diverse Gram-negative pathogenic bacteria, including the causative agents of devastating human diseases, such as plague, typhoid fever, and dysentery [1]

  • We present a novel predictor, iT3SE-PX, which extracts more informative features solely from the position-specific scoring matrix (PSSM) profile and applies a powerful feature selection technique to improve the prediction of T3SEs

  • Despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains a challenging problem in bioinformatics

Introduction

Type III secretion systems (T3SSs) are sophisticated protein transport nanomachines that are widely distributed among diverse Gram-negative pathogenic bacteria, including the causative agents of devastating human diseases such as plague, typhoid fever, and dysentery [1]. Using T3SSs, bacteria secrete a variety of virulence proteins and translocate them into host cells, where they exert a number of effects that help the pathogen survive and escape the immune response. These virulence proteins, called type III secreted effectors (T3SEs), cause a series of changes in host cells, including the subversion of host defences and the modulation of signal transduction pathways [2]. With the development of high-throughput sequencing technology and the rapid increase in protein sequence data, there is a growing demand for cost-effective computational methods that predict putative T3SEs solely from their primary sequences.
