Abstract

Various bacterial pathogens can deliver their secreted substrates also called effectors through Type III secretion systems (T3SSs) into host cells and cause diseases. Since T3SS secreted effectors (T3SEs) play important roles in pathogen-host interactions, identifying them is crucial to our understanding of the pathogenic mechanisms of T3SSs. However, the effectors display high level of sequence diversity, therefore making the identification a difficult process. There is a need to develop a novel and effective method to screen and select putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments. We develop a deep convolution neural network to directly classify any protein sequence into T3SEs or non-T3SEs, which is useful for both effector prediction and the study of sequence-function relationship. Different from traditional machine learning-based methods, our method automatically extracts T3SE-related features from a protein N-terminal sequence of 100 residues and maps it to the T3SEs space. We train and test our method on the datasets curated from 16 species, yielding an average classification accuracy of 83.7% in the 5-fold cross-validation and an accuracy of 92.6% for the test set. Moreover, when comparing with known state-of-the-art prediction methods, the accuracy of our method is 6.31-20.73% higher than previous methods on a common independent dataset. Besides, we visualize the convolutional kernels and successfully identify the key features of T3SEs, which contain important signal information for secretion. Finally, some effectors reported in the literature are used to further demonstrate the application of DeepT3. DeepT3 is freely available at: https://github.com/lje00006/DeepT3. Supplementary data are available at Bioinformatics online.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call