Abstract
A precise understanding of the genomic organization into transcriptional units and their regulation is essential for our comprehension of opportunistic human pathogens and how they cause disease. Using single-molecule real-time (PacBio) sequencing we unambiguously determined the genome sequence of Streptococcus pneumoniae strain D39 and revealed several inversions previously undetected by short-read sequencing. Significantly, a chromosomal inversion results in antigenic variation of PhtD, an important surface-exposed virulence factor. We generated a new genome annotation using automated tools, followed by manual curation, reflecting the current knowledge in the field. By combining sequence-driven terminator prediction, deep paired-end transcriptome sequencing and enrichment of primary transcripts by Cappable-Seq, we mapped 1015 transcriptional start sites and 748 termination sites. We show that the pneumococcal transcriptional landscape is complex and includes many secondary, antisense and internal promoters. Using this new genomic map, we identified several new small RNAs (sRNAs), RNA switches (including sixteen previously misidentified as sRNAs), and antisense RNAs. In total, we annotated 89 new protein-encoding genes, 34 sRNAs and 165 pseudogenes, bringing the S. pneumoniae D39 repertoire to 2146 genetic elements. We report operon structures and observed that 9% of operons are leaderless. The genome data are accessible in an online resource called PneumoBrowse (https://veeninglab.com/pneumobrowse) providing one of the most complete inventories of a bacterial genome to date. PneumoBrowse will accelerate pneumococcal research and the development of new prevention and treatment strategies.
Highlights
Ceaseless technological advances have revolutionized our capability to determine genome sequences as well as our ability to identify and annotate functional elements, including transcriptional units on these genomes
We cross-checked both sequences with the genome sequence of the ancestral strain NCTC 7466 (ENA accession number ERS1022033), which was recently sequenced with SMRT technology, as part of the NCTC 3000 initiative
The ancestral strain harbors a mutation in rrlC (SPV 1814), one of four copies of the gene encoding 23S ribosomal RNA
Summary
Ceaseless technological advances have revolutionized our capability to determine genome sequences as well as our ability to identify and annotate functional elements, including transcriptional units on these genomes. Every bacterial genome present in the NCBI database was re-annotated using the so-called Prokaryotic Genome Annotation Pipeline (PGAP, [4]), with the goal of increasing the quality and consistency of the many available annotations. This Herculean effort created a more consistent set of annotations that facilitates the propagation and interpolation of scientific findings in individual bacteria to general phenomena, valid in larger groups of organisms. We generated a resource for the pneumococcal research community that contains the most up-to-date information on the D39 genome, including its DNA sequence, transcript boundaries, operon structures and functional annotation. We analyzed the genome in detail, using a combination of several different sequencing techniques and a novel, generally applicable analysis pipeline (Figure 1)
Published Version (
Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have