Department Of Energy Joint Genome Institute Research Articles

Background: The emergence of next generation sequencing (NGS) has provided the means for rapid, high-throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the quality of the sequencing technology used but also that of the analysis software employed for assembly and annotation.

Methodology/Principal Findings: In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy Joint Genome Institute (JGI) and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and potential sources of error in them should be considered prior to analysis.

Conclusion: These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role, to assemblies with multiple small gaps (Illumina), and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

Background: The International Year of Forests 2011 [http://www.un.org/en/events/iyof2011/] will be a milestone for forest tree genomics. The draft genome sequence of Eucalyptus grandis was released in January 2011 in the USA (Phytozome [http://www.phytozome.net]) and in Belgium (BOGAS [http://bioinformatics.psb.ugent.be/webtools/bogas/]). The genome sequencing was funded by the US Department of Energy (DOE) and performed at the DOE Joint Genome Institute (JGI) in collaboration with members of the Eucalyptus Genome Network (EUCAGEN [http://www.eucagen.org]), who contributed genetic materials, linkage maps, EST resources and bioinformatics support. The E. grandis genome, together with that of Populus trichocarpa [1] and other recently completed woody plant genomes (e.g. Vitis, Cacao, Prunus, Citrus and Malus), will provide excellent opportunities for comparative studies of the unique biology of woody plants. Eucalypts are currently the most widely grown hardwood fibre crop in the world, and eucalypt breeding programs will benefit greatly from the new genomic resources. The reference genome sequence of Eucalyptus, a foundation tree genus in Australia comprising more than 70% of the native forest estate, will also offer important benefits for ecological and evolutionary biology studies. We report the sequencing, assembly and annotation of the E. grandis genome.

Genome sequencing and assembly: Whole-genome (8X) shotgun sequencing was performed for a partially inbred (S1), 17-year-old tree of E. grandis (est. genome size 640 Mbp, n = 11), BRASUZ1 (Suzano, Brazil). A total of 7.7 million Sanger reads (5.4 Gbp) were produced from plasmid, fosmid and BAC libraries. An inbred genotype was selected to circumvent perceived problems with the assembly of a highly heterozygous eucalypt genome. However, microsatellite genotyping showed that BRASUZ1 was much less homozygous than expected, with large parts of the genome remaining heterozygous, presumably due to viability selection. This finding was confirmed during the assembly of the S1 genome: approximately 25% of the assembly occurred in two haplotypes of 3-4X coverage, while the remainder of the genome assembled into a single haplotype of 6-7X coverage. Linkage maps with over 2400 DArT and microsatellite markers were subsequently used as a framework for the assembly of 11 large chromosome scaffolds. The chromosome scaffolds contained 88% (605 Mbp) of the draft assembly, with the remainder of the assembly sequence (85 Mbp) in 4941 smaller scaffolds. Based on similarity searches with 1.6 million ESTs from BRASUZ1, it was estimated that 96% of expressed gene loci were included in the 11 chromosome assemblies.

Articles published on Department Of Energy Joint Genome Institute

Twenty-five years of Genomes OnLine Database (GOLD): data updates and new features in v.9.

Genome Sequence of Bacillus subtilis natto VK161, a Novel Strain That Produces Vitamin K2.

The genome portal of the Department of Energy Joint Genome Institute: 2014 updates

The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation

The Genome Portal of the Department of Energy Joint Genome Institute

The Eucalyptus grandis Genome Project: Genome and transcriptome resources for comparative analysis of woody plant biology

Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

SoyDB: a knowledge database of soybean transcription factors.

Energy Department Expanding Microbial Genome Efforts

