Abstract

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this challenge is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced, at high quality and using advanced sequencing technology, the genomes of 25 strains that are commonly used in the yeast research community. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes that are not found in the reference genome and are associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.
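The presence/absence classification of non-reference genes mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the three group labels, and the gene and strain identifiers in the usage note are all hypothetical, and the sketch assumes that each non-reference gene has already been mapped to the set of strains in which it was detected.

```python
from collections import defaultdict


def classify_non_reference_genes(presence, all_strains):
    """Group genes by how many strains carry them.

    presence    -- dict mapping a gene id to the set of strains where it was found
    all_strains -- iterable of every strain in the study

    The group labels below are illustrative assumptions, not the paper's terms.
    """
    strain_set = set(all_strains)
    groups = defaultdict(list)
    for gene, strains in sorted(presence.items()):
        found = set(strains) & strain_set  # ignore strains outside the study
        if not found:
            continue  # gene not observed in any studied strain
        if len(found) == 1:
            groups["strain_specific"].append(gene)
        elif len(found) == len(strain_set):
            groups["shared_by_all"].append(gene)
        else:
            groups["shared_by_some"].append(gene)
    return dict(groups)
```

For example, with three hypothetical strains `A`, `B`, and `C`, a gene seen only in `A` falls in `strain_specific`, a gene seen in `A` and `B` falls in `shared_by_some`, and a gene seen in all three falls in `shared_by_all`. Each group could then be compared against known functional and phenotypic features, as the abstract describes.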

Highlights

  • The first completed eukaryotic genome sequence was that of the budding yeast Saccharomyces cerevisiae strain S288C, completed through the effort of a worldwide sequencing consortium [1]

  • Given the raw sequence reads of a genome, a reference genome sequence, and reference genome annotations, the pipeline generates de novo assembly scaffolds and contigs, open reading frame (ORF) annotations including non-reference ORFs, and sequence variation calls, such as newly inserted sequences in the genome and single-nucleotide polymorphisms (SNPs) relative to the reference

  • Some organisms may not have thoroughly annotated reference genomes available. AGAPE can still generate the assembly and annotation data as long as a protein database is provided for predicting gene structure. (Note: the NCBI Non-Redundant protein database can be attached to the AGAPE workflow, but the speed of this annotation step depends on the number of sequences; we recommend selecting a smaller protein database that includes only those proteins that are expected to be similar to the organism of interest)
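The recommendation to use a smaller protein database can be sketched as a simple FASTA filter. This is an illustration only, not part of AGAPE: the function names are hypothetical, and the sketch assumes that each FASTA header contains an organism name that can be matched by keyword, which should be checked against the database actually used.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_by_keywords(records, keywords):
    """Keep records whose header contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for header, seq in records:
        if any(k in header.lower() for k in lowered):
            yield header, seq


def write_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

In use, one would open the large database file, pass the file object to `read_fasta`, filter with keywords such as `["saccharomyces"]`, and write the result of `write_fasta` to a new file that is then supplied as the protein database. Keyword matching on headers is a coarse criterion; a taxonomy-based selection would be more precise.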


Summary

Introduction

The first completed eukaryotic genome sequence was that of the budding yeast Saccharomyces cerevisiae strain S288C, completed through the effort of a worldwide sequencing consortium [1]. With next-generation sequencing methods becoming ubiquitous, whole genomes are being analyzed en masse. This has led to interesting work on the relationship between genotype and phenotype. Novo et al. [7] studied a well-known commercial winemaking strain (EC1118) and found three unique regions on three different chromosomes containing 34 genes related to key fermentation characteristics, such as metabolism and transport of sugar or nitrogen. They noted that >100 genes in the reference strain S288C are absent from the EC1118 genome. Functional genomic analysis has also been undertaken in a saké yeast strain (K7), which has two large inversions and dozens of novel open reading frames (ORFs) compared to the reference strain S288C [9].

