Abstract
The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, was directly compared with the USA isolate J99, establishing two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using two benchtop sequencing machines: the Ion Torrent PGM and the Illumina MiSeq, the latter with the Nextera and Nextera XT library preparation methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. With the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Nextera XT library creation method than when using Nextera. The Nextera technology gave the most accurate de novo assemblies; however, extraction of an accurate multi-locus sequence type was inconsistent compared with the Ion Torrent PGM. The cagPAI failed to assemble onto a single contig with any of the technologies, but its assembly was more accurate using Nextera. Our results indicate that the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori.
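As an illustration of the non-covered nucleotide comparison described above, the following is a minimal sketch of how uncovered reference positions might be counted after read mapping. It assumes per-base depth is available as tab-separated lines of reference name, 1-based position and depth (the layout produced by, for example, `samtools depth -a`). The function name `count_uncovered`, the `min_depth` threshold and the example data are hypothetical, and the source does not state which tools the authors used for this step.

```python
from typing import Iterable, Tuple


def count_uncovered(depth_lines: Iterable[str], min_depth: int = 1) -> Tuple[int, int]:
    """Return (uncovered, total) reference positions.

    Each input line is assumed to hold three tab-separated fields:
    reference name, 1-based position, read depth.
    A position counts as uncovered when its depth is below min_depth.
    """
    uncovered = 0
    total = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        total += 1
        if int(depth) < min_depth:
            uncovered += 1
    return uncovered, total


# Made-up example data (not from the study): four positions, two with zero depth.
example = ["ref\t1\t12", "ref\t2\t0", "ref\t3\t7", "ref\t4\t0"]
uncovered, total = count_uncovered(example)
print(f"{uncovered} of {total} positions uncovered")
```

Running the same count on depth files from two library preparations mapped to the same reference would give directly comparable figures, provided the depth file lists every reference position, including those with zero depth.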
Highlights
Helicobacter pylori is an important human pathogen, infecting more than 50% of the world’s population [1]
Prior to analysis of data mapped to the reference genome or analysis of de novo assemblies, we analysed the raw sequence data to determine the overall quality of the Ion Torrent data
H. pylori is fascinating in its allelic diversity with 1456 unique multi-locus sequence typing (MLST) combinations in 1551 analysed isolates
Summary
Helicobacter pylori is an important human pathogen, infecting more than 50% of the world’s population [1]. It is microaerophilic, flagellated and Gram-negative, and is generally transmitted vertically from mother to child in the early stages of life, colonising and persisting in the gastric mucosa unless treated. Two unrelated genome sequences were published in 1997 (26695) and 1999 (J99), detailing two similar, compact, low-GC genomes [3,4]. These genomes have become standard laboratory reference genomes. In each strain, 6–7% of genes were unique (most of them encoded in a hypervariable region), but the overall genomic organisation and predicted proteomes were similar, despite the expectation of high allelic diversity [4].