NLRexpress-A bundle of machine learning motif predictors-Reveals motif stability underlying plant Nod-like receptors diversity.

Eliza C Martin, Laurentiu Spiridon, Andrei-José Petrescu, Aska Goverse

doi:10.3389/fpls.2022.975888

Abstract

Examination of a collection of over 80,000 plant Nod-like receptors (NLRs) revealed an overwhelming sequence diversity underlying the functional specificity of pathogen detection, signaling and cooperativity. The canonical NLR building blocks (CC/TIR/RPW8, NBS and LRR) nevertheless contain a number of conserved sequence motifs that show a significant degree of invariance among different NLR groups. To identify these motifs we developed NLRexpress, a bundle of 17 machine learning (ML)-based predictors able to detect CC, TIR, NBS, and LRR motifs swiftly and precisely, minimizing computing time without loss of accuracy. It is intended as a scalable instrument for screening entire proteomes, transcriptomes or genomes, identifying integral NLRs and discriminating them from incomplete sequences that lack key motifs. These predictors were further used to screen a subset of ∼34,000 regular plant NLR sequences. The detected motifs were analyzed using unsupervised ML techniques to assess the structural correlations hidden beneath pattern variability. Both the NB-ARC switch domain, which is admittedly the most conserved region of NLRs, and the highly diverse LRR domain, with its vastly variable lengths and repeat irregularities, show well-defined relations between motif subclasses, highlighting the importance of structural invariance in shaping NLR sequence diversity. The online NLRexpress webserver can be accessed at https://nlrexpress.biochim.ro.
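
As an illustration of the screening idea described in the abstract, the following minimal sketch separates integral NLR candidates from incomplete sequences based on which motif classes were detected. It is not the NLRexpress implementation: the `MotifHit` record, the `classify` function, the 0.5 probability threshold and the rule "an N-terminal (CC or TIR) motif plus an NBS motif plus an LRR motif" are all assumptions made for this example.

```python
from dataclasses import dataclass

# Domain labels follow the abstract; treating CC and TIR as interchangeable
# N-terminal evidence is an assumption of this sketch.
N_TERMINAL = {"CC", "TIR"}


@dataclass(frozen=True)
class MotifHit:
    """One predicted motif occurrence (hypothetical record layout)."""
    domain: str         # "CC", "TIR", "NBS" or "LRR"
    position: int       # 0-based residue index of the predicted motif
    probability: float  # predictor confidence in [0, 1]


def classify(hits, threshold=0.5):
    """Label a sequence "integral" or "incomplete" from its motif hits.

    Assumed rule: a sequence is integral when confident hits cover an
    N-terminal domain (CC or TIR), the NBS domain and the LRR domain.
    """
    domains = {h.domain for h in hits if h.probability >= threshold}
    has_n_terminal = bool(domains & N_TERMINAL)
    if has_n_terminal and "NBS" in domains and "LRR" in domains:
        return "integral"
    return "incomplete"


if __name__ == "__main__":
    example = [
        MotifHit("CC", 10, 0.9),
        MotifHit("NBS", 200, 0.8),
        MotifHit("LRR", 600, 0.7),
    ]
    print(classify(example))      # all three domain classes present
    print(classify(example[:2]))  # LRR motif missing
```

A stricter rule, for example requiring several distinct NBS motifs in the expected order, could replace the set test without changing the interface.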

Full Text