Abstract

Background: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq.

Results: Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs) of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset.

Conclusion: Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions and produces a graphical overview of the output. Panseq also includes a loci selector that calculates the most variable and discriminatory loci among sets of accessory loci or core gene SNPs.

Availability: Panseq is freely available online at http://76.70.11.198/panseq. Panseq is written in Perl.
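
The gap between a 1 s search and a 449 s exhaustive search for four loci out of 100 is easier to appreciate with a small sketch. The Python below is illustrative only: the abstract does not describe how the Loci Selector works internally, so the greedy heuristic, the function names and the toy data are all assumptions rather than Panseq's method (Panseq itself is written in Perl). It scores a set of loci by how many strain pairs it tells apart and compares a greedy pick with an exhaustive search.

```python
from itertools import combinations
from math import comb


def pairs_split_by(column):
    """Return the strain index pairs that one locus tells apart."""
    return {(i, j) for i, j in combinations(range(len(column)), 2)
            if column[i] != column[j]}


def greedy_select(loci, k):
    """Pick up to k loci; each step takes the locus that separates the most
    strain pairs not yet separated (an assumed heuristic, not Panseq's)."""
    split = {name: pairs_split_by(col) for name, col in loci.items()}
    chosen, covered = [], set()
    for _ in range(min(k, len(loci))):
        best = max((n for n in split if n not in chosen),
                   key=lambda n: len(split[n] - covered))
        chosen.append(best)
        covered |= split[best]
    return chosen, len(covered)


def exhaustive_select(loci, k):
    """Score every k-locus combination; cost grows as C(len(loci), k)."""
    split = {name: pairs_split_by(col) for name, col in loci.items()}

    def score(combo):
        return len(set().union(*(split[n] for n in combo)))

    best = max(combinations(sorted(loci), k), key=score)
    return list(best), score(best)


# Hypothetical presence/absence data: one column of four strains per locus.
toy_loci = {
    "locA": [0, 0, 1, 1],
    "locB": [0, 1, 0, 1],
    "locC": [0, 0, 0, 1],
    "locD": [1, 1, 1, 1],
}

greedy_result = greedy_select(toy_loci, 2)
exhaustive_result = exhaustive_select(toy_loci, 2)

# Number of four-locus combinations in a 100-locus set.
combinations_of_four = comb(100, 4)
```

On this toy data both searches choose `locA` and `locB`, which together separate all six strain pairs. A greedy pass is not guaranteed to match the exhaustive optimum in general, but it avoids scoring all 3,921,225 four-locus combinations of a 100-locus set.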

Highlights

  • The pan-genome of a bacterial species consists of a core and an accessory gene pool

  • We have used a number of examples to highlight the functionality of Panseq, many of which could be carried forward as complete studies of their own; our intention is to demonstrate that Panseq is capable of finding and extracting useful data from sequences, which can be used as the basis for hypothesis generation and future investigations

  • (1) Novel Region Finder (NRF) Module: Alignment programs are capable of finding regions of similarity between sequences, and regions of uniqueness can be inferred from the gaps between areas with high sequence similarity
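
The idea of inferring unique regions from the gaps between aligned areas can be sketched in a few lines of Python. This is a minimal illustration, not the NRF module itself: the function name, the 0-based half-open coordinates, the `min_len` threshold and the example hit list are all assumptions, and no particular alignment program's output format is implied.

```python
def novel_regions(seq_len, hits, min_len=1):
    """Return (start, end) intervals of a query sequence not covered by any
    alignment hit. Coordinates are 0-based and half-open (assumed here)."""
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(hits):
        # A gap exists when the next hit starts beyond the covered region.
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Anything after the last hit is also unaligned.
    if seq_len - cursor >= min_len:
        gaps.append((cursor, seq_len))
    return gaps


# Hypothetical hits on a 1000 bp query; the first two overlap.
example_hits = [(0, 200), (150, 400), (650, 900)]
unique = novel_regions(1000, example_hits, min_len=100)
```

With these made-up hits, `unique` holds the two uncovered stretches `(400, 650)` and `(900, 1000)`. Raising `min_len` to 200 keeps only the first, which shows how a minimum-size parameter filters out short gaps.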



Introduction

The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. The pace of genome data growth continues to increase as the cost of acquiring the data continues to decrease. This has been led in large part by massively parallel sequencing platforms such as the 454 Genome Sequencer FLX (Roche Applied Science), the Illumina (Solexa) Genome Analyzer and the ABI SOLiD System (Applied Biosystems), which generate tens of millions of base pairs of information in short reads of 30 to several hundred base pairs in length [1,2]. These reads must be combined into large contiguous DNA sequences by dedicated software such as Newbler (Roche) and MAQ [2]. Sequencing efforts are primarily driven by the discovery of novel genes and, as gap closure is time-consuming and expensive, many researchers use unfinished draft sequences of genomes in their analyses [4].
