Abstract

Background

The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches on the genome content to predict this phenotype. Until now such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding pathogenicity of individual strains over many studies. Unification of this information into a single resource was therefore considered to be an essential step.

Results

Mining of 39 papers that considered both plant-associated phenotypes allowed for a phenotypic classification of 578 Xanthomonas strains. For 65 plant-pathogenic and 53 non-pathogenic strains the corresponding genomes were available and were de novo annotated for the presence of Pfam protein domains. These domains were used as features to train and compare three ML classification algorithms: CART, Lasso and Random Forest.

Conclusion

The literature resource, in combination with recursive feature extraction used in the ML classification algorithms, provided further insights into the virulence-enabling factors, but also highlighted domains linked to traits not present in pathogenic strains.
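As a rough illustration of how domain presence can serve as classifier input, the sketch below turns per-strain sets of domain identifiers into a binary presence/absence matrix. The function name, the strain labels and the domain labels are all placeholders invented for this example, and the encoding is an assumption about how such features might be laid out, not the authors' pipeline.

```python
def presence_absence_matrix(domains_by_strain):
    """Build a 0/1 matrix with one row per strain and one column per domain.

    `domains_by_strain` maps a strain label to the set of domain identifiers
    annotated in its genome. All labels here are hypothetical placeholders.
    """
    # Column order: every domain seen in any strain, sorted for reproducibility.
    features = sorted(set().union(*domains_by_strain.values()))
    # Row order: strain labels, sorted for reproducibility.
    strains = sorted(domains_by_strain)
    matrix = [
        [1 if domain in domains_by_strain[strain] else 0 for domain in features]
        for strain in strains
    ]
    return strains, features, matrix


# Toy input with invented labels, not data from the study.
toy_annotation = {
    "strain_1": {"domain_A", "domain_B"},
    "strain_2": {"domain_B", "domain_C"},
}
strains, features, matrix = presence_absence_matrix(toy_annotation)
```

A matrix of this shape, paired with one pathogenic or non-pathogenic label per row, is the kind of input that tree-based and penalised linear classifiers generally accept.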

Highlights

  • The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members

  • An assay was defined as the unique combination of a strain and host species as tested by a single source. This approach was favoured over tracking the pathogenicity of individual strains, as it enabled us to track the criteria used to determine the plant-associated phenotype of a strain

  • This yielded a total of 895 distinct pathogenicity assays, extracted from 39 studies, describing 578 unique strains that were tested on 77 different plant host species (Table 1)
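The assay definition in the highlights above can be sketched as a small data structure. This is a minimal sketch assuming each literature record carries a strain, a host species, a source and a reported outcome; the names `AssayKey`, `collect_assays` and `outcomes_per_strain`, and all record values, are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class AssayKey(NamedTuple):
    """Identity of one assay: a strain tested on a host by a single source."""
    strain: str
    host: str
    source: str


def collect_assays(records):
    """Keep one outcome per unique (strain, host, source) combination.

    `records` is an iterable of (strain, host, source, outcome) tuples.
    If a combination repeats, the first outcome seen is kept (an assumption).
    """
    assays = {}
    for strain, host, source, outcome in records:
        assays.setdefault(AssayKey(strain, host, source), outcome)
    return assays


def outcomes_per_strain(assays):
    """Group the reported outcomes by strain, across hosts and sources."""
    per_strain = defaultdict(set)
    for key, outcome in assays.items():
        per_strain[key.strain].add(outcome)
    return dict(per_strain)


# Invented records for illustration only.
records = [
    ("strain_1", "host_A", "study_X", "pathogenic"),
    ("strain_1", "host_A", "study_X", "pathogenic"),  # repeated entry
    ("strain_1", "host_B", "study_X", "non-pathogenic"),
    ("strain_2", "host_A", "study_Y", "non-pathogenic"),
]
assays = collect_assays(records)
by_strain = outcomes_per_strain(assays)
```

Keying on the full combination rather than on the strain alone keeps the host and source attached to every outcome, so a strain that is pathogenic on one host and not on another stays visible instead of being collapsed into a single label.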


Summary

Introduction

The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches on the genome content to predict this phenotype. Until now such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding pathogenicity of individual strains over many studies. Whilst non-pathogenic xanthomonads have been reported as early as 1985 [3], during the last decade many new non-pathogenic strains have been discovered [4,5,6,7,8,9] and it has become apparent that these non-…
