Mammalian Annotation Database for improved annotation and functional classification of Omics datasets from less well-annotated organisms

Jochen T Bick, Shuqin Zeng, Susanne E Ulbrich, Mark D Robinson, Stefan Bauersachs

doi:10.1093/database/baz086

Abstract

Next-generation sequencing technologies and the availability of an increasing number of mammalian and other genomes allow gene expression studies, particularly RNA sequencing, in many non-model organisms. However, incomplete genome annotation and incomplete assignment of genes to functional annotation databases can lead to a substantial loss of information in downstream data analysis. To overcome this, we developed the Mammalian Annotation Database tool (MAdb, https://madb.ethz.ch) to conveniently provide homologous gene information for selected mammalian species. The assignment between species is performed in three steps: (i) matching official gene symbols, (ii) using ortholog information contained in Ensembl Compara and (iii) pairwise BLAST comparisons of all transcripts. In addition, we developed a new tool (AnnOverlappeR) for the reliable assignment of National Center for Biotechnology Information (NCBI) and Ensembl gene IDs to each other. Gene lists translated to the gene IDs of a well-annotated species such as human can be used for improved functional annotation with relevant tools based on Gene Ontology and molecular pathway information. We tested MAdb on a published RNA-seq data set for the pig and obtained clearly improved overrepresentation analysis results based on the assigned human homologous gene identifiers. MAdb yielded similar lists of human homologous genes and similar functional annotation results regardless of whether the analysis started with gene IDs from NCBI or Ensembl. MAdb is accessible via a web interface and a Galaxy application.

Highlights

In transcriptomics and proteomics studies, one important step of data analysis is the functional annotation of obtained lists of differentially expressed genes (DEGs) or proteins (DEPs)
In the first step of assigning orthologs, HUGO Gene Nomenclature Committee (HGNC) gene symbols were compared between species to assign the corresponding Entrez Gene IDs
The Ensembl Compara database and results from Basic Local Alignment Search Tool (BLAST) comparisons of the transcriptomes were used to increase the number of assigned orthologs
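The three-step assignment described in the abstract can be pictured as a tiered lookup. The following is a minimal sketch, not the MAdb implementation: it assumes the three evidence sources have already been loaded into plain dictionaries and that later steps are consulted only when earlier ones find nothing. The function name, the case-insensitive symbol match, the dictionary layout and all identifiers are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple


def assign_human_homolog(
    gene_id: str,
    symbol: Optional[str],
    human_by_symbol: Dict[str, str],
    compara_orthologs: Dict[str, str],
    blast_best_hits: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (human_gene_id, evidence) from the first source that yields a match."""
    # Step (i): match the official gene symbol (case-insensitive here, an assumption).
    if symbol:
        hit = human_by_symbol.get(symbol.upper())
        if hit is not None:
            return hit, "symbol"
    # Step (ii): fall back to an ortholog table, e.g. one derived from Ensembl Compara.
    hit = compara_orthologs.get(gene_id)
    if hit is not None:
        return hit, "compara"
    # Step (iii): fall back to the best hit from pairwise transcript BLAST comparisons.
    hit = blast_best_hits.get(gene_id)
    if hit is not None:
        return hit, "blast"
    return None, "unassigned"


# Toy lookup tables; every identifier below is a placeholder, not a real database ID.
human_by_symbol = {"TP53": "human_1", "GAPDH": "human_2"}
compara_orthologs = {"pig_3": "human_3"}
blast_best_hits = {"pig_4": "human_4"}

pig_genes = [
    ("pig_1", "TP53"),          # resolved by symbol
    ("pig_3", None),            # resolved by the ortholog table
    ("pig_4", "LOC100152218"),  # locus-only symbol, resolved by BLAST
    ("pig_5", None),            # no evidence in any source
]

assignments = {
    gene_id: assign_human_homolog(
        gene_id, symbol, human_by_symbol, compara_orthologs, blast_best_hits
    )
    for gene_id, symbol in pig_genes
}

for gene_id, (human_id, evidence) in assignments.items():
    print(gene_id, human_id, evidence)
```

Recording the evidence label alongside each assignment makes it possible to filter downstream analyses by how the homolog was found, for example keeping only symbol and ortholog-table matches.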

Summary

Introduction

In transcriptomics and proteomics studies, one important step of data analysis is the functional annotation of the resulting lists of differentially expressed genes (DEGs) or proteins (DEPs). Depending on the status of the gene annotation of a species, not all annotated genes have an official gene symbol (some carry only a locus number, e.g. LOC100152218, 60S ribosomal protein L23a-like), and not all are assigned to functional annotation databases as their orthologs in well-annotated model organisms are. This leads to a substantial loss of information if the gene identifiers (IDs) of the respective species are used for functional annotation. To avoid this data loss and improve the results of functional annotation, one strategy is to transfer information from homologous genes (orthologs and paralogs) of well-annotated species [6].
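To make the information loss concrete, the sketch below splits a gene list into entries with an official symbol and entries that carry only a locus placeholder such as LOC100152218. It is an illustration under stated assumptions: the `LOC` plus digits pattern is taken as the placeholder convention, the helper names are hypothetical, and the example list is toy data rather than the data set used in the paper.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed convention: placeholder symbols consist of "LOC" followed by digits only.
LOC_PATTERN = re.compile(r"^LOC\d+$")


def has_official_symbol(symbol: Optional[str]) -> bool:
    """True if the symbol is present and is not a bare locus placeholder."""
    if not symbol:
        return False
    return LOC_PATTERN.match(symbol) is None


def split_by_symbol_status(symbols: Iterable[Optional[str]]) -> Tuple[List[str], List[str]]:
    """Split symbols into (official, locus_only); missing symbols count as locus_only."""
    official: List[str] = []
    locus_only: List[str] = []
    for symbol in symbols:
        if has_official_symbol(symbol):
            official.append(symbol)
        else:
            locus_only.append(symbol or "<missing>")
    return official, locus_only


# Toy gene list for illustration only.
deg_symbols = ["TP53", "LOC100152218", "GAPDH", None]
official, locus_only = split_by_symbol_status(deg_symbols)
print(f"{len(locus_only)} of {len(deg_symbols)} genes lack an official symbol")
```

Genes that end up in the second list are the ones that symbol-based functional annotation tools would skip, and therefore the ones for which homolog transfer matters most.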

Journal: Database	Publication Date: Jan 1, 2019
Citations: 15	License type: CC BY 4.0