Abstract

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Highlights

  • Advances in the annotation of the human and mouse genomes include: the completion of first-pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of long non-coding RNAs (lncRNAs). The annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org (Nucleic Acids Research, 2021, Vol 49, Database issue, D917)

  • GENCODE produces widely used reference genome annotation of protein-coding and non-coding loci, including alternatively spliced transcripts and pseudogenes, for the human and mouse genomes, and makes these annotations freely available for the benefit of biomedical research and genome interpretation

Summary

Introduction

GENCODE produces widely used reference genome annotation of protein-coding and non-coding loci, including alternatively spliced transcripts and pseudogenes, for the human and mouse genomes, and makes these annotations freely available for the benefit of biomedical research and genome interpretation. This update describes improvements to the annotation infrastructure, bioinformatics tools and analyses, and the advances they support in the annotation of the human and mouse genomes, including: the completion of first-pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. The annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
