Abstract

Type 1 secretion systems play important roles in the pathogenicity of Gram-negative bacteria. However, their substrate secretion mechanism remains largely unknown. In this research, we analyzed the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We found striking non-RTX-motif amino acid composition patterns at the C termini, most typically exemplified by the enrichment of “[FLI][VAI]” at the two most C-terminal positions. Machine-learning models, including deep-learning ones, were trained on these sequence-based non-RTX-motif features and further combined into a tri-layer stacking model, T1SEstacker, which predicted the RTX proteins accurately, with a fivefold cross-validated sensitivity of ∼0.89 at a specificity of ∼0.94. Besides substrates with RTX motifs, T1SEstacker can also distinguish non-RTX-motif T1SEs well, further suggesting that common secretion signals may exist. T1SEstacker was applied to predict T1SEs from the genomes of representative Salmonella strains, and we found that both the number and the composition of T1SEs varied among strains. The number of T1SEs is estimated to reach 100 or more in each strain, much larger than expected. In summary, we performed a comprehensive sequence analysis of the type 1 secreted RTX proteins, identified common sequence-based features at the C termini, and developed a stacking model that can predict type 1 secreted proteins accurately.
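The abstract describes features computed from the C terminus, such as whether the last two residues match “[FLI][VAI]”. A minimal sketch of how such features could be extracted is shown below, assuming plain one-letter protein sequences; the window length (50 residues) and the function names are hypothetical choices for illustration, not the features actually used by T1SEstacker.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches when the two most C-terminal residues follow the [FLI][VAI] pattern.
TERMINAL_PAIR = re.compile(r"[FLI][VAI]$")


def has_terminal_pair(seq: str) -> bool:
    """True if the last two residues match [FLI][VAI]."""
    return TERMINAL_PAIR.search(seq.strip().upper()) is not None


def c_terminal_composition(seq: str, window: int = 50) -> dict:
    """Fraction of each standard amino acid in the last `window` residues.

    The default window length is a hypothetical choice for illustration.
    """
    tail = seq.strip().upper()[-window:]
    if not tail:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    counts = Counter(tail)
    return {aa: counts[aa] / len(tail) for aa in AMINO_ACIDS}


def feature_vector(seq: str, window: int = 50) -> list:
    """20 composition features plus a binary flag for the terminal pair."""
    comp = c_terminal_composition(seq, window)
    return [comp[aa] for aa in AMINO_ACIDS] + [1.0 if has_terminal_pair(seq) else 0.0]


print(has_terminal_pair("MKKLLAFV"))    # True ("FV")
print(has_terminal_pair("MKKLLAGG"))    # False ("GG")
print(len(feature_vector("MKKLLAFV")))  # 21
```

Vectors like these are the kind of sequence-based input a machine-learning classifier can consume; the published models may use different windows, encodings, or additional properties.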

Highlights

  • Type 1 secretion systems (T1SSs) are uniquely distributed in Gram-negative bacteria and can secrete various substrate proteins across the two bacterial cell membranes into the extracellular milieu in one or two steps (Smith et al, 2018b; Spitz et al, 2019)

  • The results suggested that the composition of type 1 secreted effectors (T1SEs) varies widely among bacterial strains, and that a T1SE homolog does not necessarily remain a T1SE, since mutations in the C terminus could frequently prevent recognition by the T1SS

  • Around 100 T1SEs have been verified experimentally, and many of them contain RTX motifs near the C termini of their protein sequences
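RTX motifs are glycine- and aspartate-rich nonapeptide repeats. The sketch below scans a sequence for them, assuming the commonly cited consensus GGXGXDXUX with U (a large hydrophobic residue) taken as L, I, or F; the regular expression, the function names, and the toy sequence are hypothetical and serve only to illustrate locating repeats relative to the C terminus.

```python
import re

# Commonly cited RTX nonapeptide consensus GGXGXDXUX; U is assumed here to be
# L, I or F, and X (".") is any residue. Degenerate repeats are not captured.
RTX_NONAPEPTIDE = re.compile(r"GG.G.D.[LIF].")


def rtx_repeats(seq: str) -> list:
    """Return (start, repeat) pairs for non-overlapping consensus matches."""
    return [(m.start(), m.group()) for m in RTX_NONAPEPTIDE.finditer(seq.upper())]


def fraction_in_c_terminal_half(seq: str) -> float:
    """Share of detected repeats that start in the C-terminal half of the sequence."""
    hits = rtx_repeats(seq)
    if not hits:
        return 0.0
    midpoint = len(seq) / 2
    return sum(1 for start, _ in hits if start >= midpoint) / len(hits)


toy = "MSTAGGAGNDTLSGGSGKDILTQQQ"  # hypothetical sequence, not a real protein
print(rtx_repeats(toy))  # [(4, 'GGAGNDTLS'), (13, 'GGSGKDILT')]
print(fraction_in_c_terminal_half(toy))  # 0.5
```

A strict consensus regex misses degenerate repeats, which is one reason motif matching alone is a weak predictor and why the study looks at non-RTX-motif features as well.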


Summary

INTRODUCTION

Type 1 secretion systems (T1SSs) are uniquely distributed in Gram-negative bacteria and can secrete various substrate proteins across the two bacterial cell membranes into the extracellular milieu in one step (classical) or two steps (non-classical) (Smith et al, 2018b; Spitz et al, 2019). The C termini of the leader peptides contain a canonical double-glycine (“GG”) motif, which is recognized and cleaved by the C39 domains of the corresponding ABC transporters before the mature proteins are secreted through T1SSs (Kanonenberg et al, 2013). Unlike class 1–3 T1SEs, the RTX adhesins are transported from the cytoplasm to the extracellular environment by a two-step secretion mechanism that involves periplasmic intermediates. This subgroup of T1SS machinery is linked with a bacterial transglutaminase-like cysteine proteinase (BTLCP) (Smith et al, 2018b). Given the evidence for potential C-terminal secretion signals in T1SEs (Koronakis et al, 1989; Masure et al, 1990; Zhang et al, 1995; Delepelaire, 2004; Holland et al, 2005; Thomas et al, 2014), in this research we comprehensively examined the amino acid sequence patterns, especially the non-RTX-motif features within the C termini of RTX proteins, as well as their Sse and Acc properties. We also tested deep neural network models and integrated them with other models in a stacking model to improve prediction performance.
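The general idea behind stacking can be illustrated with a small sketch. T1SEstacker itself is a tri-layer model whose architecture is not detailed in this excerpt; the code below shows only a generic two-level scheme under stated assumptions: every learner is a plain logistic regression, each base learner sees a hypothetical "view" (subset of feature columns), and the meta-learner is trained on the base learners' out-of-fold scores so it never sees scores from models fitted on the same samples. The data are synthetic toy values, not T1SE sequences.

```python
import math
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def sigmoid(z: float) -> float:
    # Split by sign to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 200) -> Model:
    """Batch gradient-descent logistic regression, used for every learner here."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_stack(X, y, views, k=5, seed=0) -> Model:
    """Level 1: one base learner per feature view (hypothetical column subsets).
    Level 2: a meta-learner trained on the base learners' out-of-fold scores."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    meta_X = [[0.0] * len(views) for _ in X]
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        for v, cols in enumerate(views):
            base = train_logreg([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
            for i in held:
                meta_X[i][v] = base([X[i][c] for c in cols])
    meta = train_logreg(meta_X, y)
    # Refit each base learner on all data for use at prediction time.
    bases = [train_logreg([[row[c] for c in cols] for row in X], y) for cols in views]
    return lambda x: meta([m([x[c] for c in cols]) for m, cols in zip(bases, views)])


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity = TP / positives; specificity = TN / negatives."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    pos = sum(y_true)
    return tp / pos, tn / (len(y_true) - pos)


# Hypothetical toy data: 4 features, positives centred at +1, negatives at -1.
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    label = int(rng.random() < 0.5)
    X.append([rng.gauss(1.0 if label else -1.0, 1.0) for _ in range(4)])
    y.append(label)

predict = fit_stack(X, y, views=[[0, 1], [2, 3]])
sens, spec = sensitivity_specificity(y, [predict(x) for x in X])
print(f"sensitivity={sens:.2f} specificity={spec:.2f} (training data, optimistic)")
```

The figures printed here are measured on the training data and are therefore optimistic; a cross-validated estimate like the one reported in the abstract would wrap the whole `fit_stack` procedure in an outer k-fold loop and score only held-out samples.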

MATERIALS AND METHODS
A Stacked Model Featured by the Prediction Results of Individual Models
DISCUSSION
Findings
DATA AVAILABILITY STATEMENT