Legionella pneumophila is an environmental bacterium and clinical pathogen that causes many life-threatening outbreaks of an atypical pneumonia called Legionnaires' disease (LD). Studies of this pathogen have focused mainly on Europe and the United States, and data on L. pneumophila from developing countries are scarce. To reduce this knowledge gap, we studied L. pneumophila isolates from two widely different geographical areas, the West Bank and Germany. We sequenced and compared the whole genomes of 38 clinical and environmental isolates of L. pneumophila covering different MLVA-8(12) genotypes in the two areas. Sequencing was conducted on the Illumina HiSeq 2500 platform. In addition, two isolates (A194 and H3) were sequenced on a Pacific Biosciences (PacBio) RSII platform to generate a complete reference genome for each geographical area. Genome sequences from 55 L. pneumophila strains, including 17 reference strains, were aligned with the genome sequence of the closest strain (L. pneumophila strain Alcoy). A whole-genome phylogeny based on single nucleotide polymorphisms (SNPs) was created using the ParSNP software v1.0. The reference genomes obtained for isolates A194 and H3 consisted of circular chromosomes of 3,467,904 bp and 3,691,263 bp, respectively. Among the 55 strains, we identified an average of 36,418 SNPs (min. 8,569; max. 70,708) relative to the reference strain L. pneumophila str. Alcoy, and 2,367 core genes. An analysis of the genomic population structure by SNP comparison divided the 55 L. pneumophila strains into six branches. Individual isolates in sub-lineages of these branches differed by fewer than 120 SNPs if they had the same MLVA genotype and were isolated from the same location. A bioinformatics analysis identified genomic islands (GIs) associated with horizontal gene transfer, as well as mobile genetic elements, demonstrating that L. pneumophila has high genome plasticity. Four L. pneumophila isolates (H3, A29, A129 and L10-091) contained well-defined plasmids. On average, only about half of the plasmid genes could be matched to proteins in databases. In silico phage prediction suggested that 43 strains contained at least one phage, although none of the predicted phages was complete. BLASTp analysis of proteins from the type IV secretion Dot/Icm system showed that these proteins are highly conserved, with less than 25% structural difference in the new L. pneumophila isolates. Overall, we demonstrated that whole-genome sequencing provides a molecular surveillance tool for L. pneumophila at the highest conceivable discriminatory level: only two to eight SNPs were observed between isolates from the same location collected several years apart.
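
To make the SNP-based comparison of isolates concrete, the following is a minimal Python sketch of how pairwise SNP distances could be counted from a core-genome alignment. It is an illustration only and not the ParSNP workflow used in the study: the function names (`snp_distance`, `pairwise_snp_distances`), the isolate labels, and the toy sequences are all hypothetical, and the choice to skip gaps and ambiguous bases is an assumption.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned positions where two sequences differ.

    Positions holding a gap or an ambiguous base in either sequence are
    skipped (an assumption of this sketch, not a rule from the study).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(alignment):
    """Return {(name_x, name_y): SNP count} for every pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical isolate names; real input would be a
# core-genome alignment millions of bases long.
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACGAACNTAT",
}

distances = pairwise_snp_distances(toy_alignment)
for pair, count in sorted(distances.items()):
    print(pair, count)
```

Under this kind of comparison, isolate pairs with very small counts (such as the two to eight SNPs reported above) would be candidates for a shared source, whereas the threshold to apply in practice depends on the dataset and is not fixed by this sketch.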