Annotation of 2,507 Saccharomyces cerevisiae genomes.

Meng Wang, Xiaoping Hou, Shumin Hu, Xian Liu, Hua Yin, Yang He, Bin-Bin Xie, Xuan Li, Jun-Hong Yu

doi:10.1128/spectrum.03582-23

Open Access

https://doi.org/10.1128/spectrum.03582-23

Journal: Microbiology Spectrum	Publication Date: Apr 2, 2024
Citations: 1	License type: CC BY 4.0

Affiliation: Shandong University

Abstract

Saccharomyces cerevisiae (baker's yeast, budding yeast) is one of the most important model organisms for biological research and a crucial microorganism in industry. A huge number of S. cerevisiae genome sequences are currently available in the public domain. However, these genomes are scattered across different websites, and a large number of them were released without annotation. To provide one complete annotated genome data resource, we collected 2,507 S. cerevisiae genome assemblies and re-annotated 2,506 of them using a custom annotation pipeline, producing a total of 15,407,164 protein-coding gene models. All of these gene sequences were then clustered into families with a custom pipeline. A total of 1,506 single-copy genes were selected as marker genes and used to evaluate the genome completeness and base quality of all assemblies. Pangenomic analyses were performed on a selected subset of 847 medium-high-quality genomes. Statistical comparisons revealed a number of gene families showing copy number variation among different organism sources. To the authors' knowledge, this study represents the largest genome annotation project of S. cerevisiae so far, providing rich genomic resources for future studies of the model organism S. cerevisiae and its relatives.

IMPORTANCE

Saccharomyces cerevisiae (baker's yeast, budding yeast) is one of the most important model organisms for biological research and a crucial microorganism in industry. Although a huge number of S. cerevisiae genome sequences are available in the public domain, these genomes are scattered across different websites and most were released without annotation, which hinders the efficient reuse of these genome resources. Here, we collected 2,507 S. cerevisiae genomes, performed genome annotation, and evaluated genome quality. All the data obtained have been deposited in public repositories and are freely accessible to the community. This study represents the largest genome annotation project of S. cerevisiae so far, providing one complete annotated genome data set for S. cerevisiae, an important workhorse for fundamental biology, biotechnology, and industry.
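
The abstract does not say how the single-copy marker genes were chosen or how completeness was scored, so the following is only a minimal sketch of the general idea and not the authors' pipeline. It assumes each genome is given as a list of gene-family labels (one per annotated gene), treats a family as a marker if it occurs exactly once in at least a chosen fraction of genomes, and scores completeness as the fraction of markers found in a genome. The function names, the `min_fraction` threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def single_copy_markers(genomes, min_fraction=0.95):
    """Return families present in exactly one copy in >= min_fraction of genomes.

    genomes: dict mapping genome name -> list of family labels, one per gene.
    min_fraction: hypothetical threshold, not taken from the paper.
    """
    counts = {name: Counter(families) for name, families in genomes.items()}
    all_families = set().union(*(c.keys() for c in counts.values()))
    n_genomes = len(counts)
    markers = set()
    for family in all_families:
        # Count genomes in which this family has exactly one member.
        n_single = sum(1 for c in counts.values() if c[family] == 1)
        if n_single / n_genomes >= min_fraction:
            markers.add(family)
    return markers


def completeness(gene_families, markers):
    """Fraction of marker families that occur at least once in one genome."""
    if not markers:
        return 0.0
    present = Counter(gene_families)
    return sum(1 for m in markers if present[m] >= 1) / len(markers)


# Toy input (invented for illustration only).
toy_genomes = {
    "g1": ["famA", "famB", "famC", "famC"],
    "g2": ["famA", "famB", "famC"],
    "g3": ["famA", "famC", "famC"],
}

toy_markers = single_copy_markers(toy_genomes, min_fraction=0.6)
toy_scores = {name: completeness(fams, toy_markers)
              for name, fams in toy_genomes.items()}
```

In this toy case `famC` is excluded because it is duplicated in two of the three genomes, and `g3` scores lower than the others because it lacks `famB`. A real assessment would also need to handle fragmented or duplicated markers, which this sketch ignores.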
