Abstract
Background: Genomic and genetic studies often require a target list of genes before conducting any hypothesis testing or experimental verification. The ever-growing number of sequenced genomes and the variety of annotation strategies bring the potential for ambiguous gene symbols, making it cumbersome to capture the “correct” set of genes. In this article, we present and describe the Avian Immunome DB (Avimm) for easy gene property extraction, as exemplified by avian immune genes. The avian immune system is characterised by a cascade of complex biological processes underpinned by more than 1000 different genes. It is a vital trait to study, particularly in birds, considering that they are a significant driver in spreading zoonotic diseases. With the completion of phase II of the B10K (“Bird 10,000 Genomes”) consortium’s whole-genome sequencing effort, we have included 363 annotated bird genomes in addition to other publicly available bird genome data, which serve as a valuable foundation for Avimm.
Construction and content: A relational database with avian immune gene evidence from Gene Ontology, Ensembl, UniProt and the B10K consortium has been designed and set up. The “seed” for the initial set of avian immune genes is based on the well-studied model organism chicken (Gallus gallus). Gene annotations, different transcript isoforms, nucleotide sequences and protein information, including amino acid sequences, are included. Ambiguous gene names (symbols) are resolved within the database and linked to their canonical gene symbol. Avimm is supplemented by a command-line interface and a web front-end to query the database.
Utility and discussion: The internal mapping of unique gene symbol identifiers to canonical gene symbols allows gene properties to be searched by ambiguous gene symbols. The database is organised into core and feature tables, which makes it straightforward to extend for future purposes. The database design is ready to be applied to other taxa or biological processes. Currently, the database contains 1170 distinct avian immune genes with canonical gene symbols and 612 synonyms across 363 bird species. While the command-line interface readily integrates into bioinformatics pipelines, the intuitive web front-end offers sophisticated search and download functionality and tracks the origin of each record. Avimm is publicly accessible at https://avimm.ab.mpg.de.
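The idea of resolving an ambiguous symbol to its canonical gene symbol can be illustrated with a minimal sketch. It is not Avimm's actual schema or code: the table names `gene` and `gene_symbol`, the function names, and the synonym `IFNL3A_ALIAS` are placeholders assumed here for illustration only.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the Avimm schema): a core table of
# canonical genes plus a table that maps every known symbol, canonical or
# synonym, to a gene.
SCHEMA = """
CREATE TABLE gene (
    gene_id          INTEGER PRIMARY KEY,
    canonical_symbol TEXT NOT NULL UNIQUE
);
CREATE TABLE gene_symbol (
    symbol  TEXT NOT NULL,
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    PRIMARY KEY (symbol, gene_id)
);
"""


def build_demo_db():
    """Create an in-memory database with one gene and a placeholder synonym."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (1, 'IFNL3A')")
    conn.executemany(
        "INSERT INTO gene_symbol VALUES (?, ?)",
        # 'IFNL3A_ALIAS' is an invented synonym, used only for this demo.
        [("IFNL3A", 1), ("IFNL3A_ALIAS", 1)],
    )
    conn.commit()
    return conn


def resolve(conn, symbol):
    """Return all canonical symbols that a (possibly ambiguous) symbol maps to."""
    rows = conn.execute(
        """
        SELECT g.canonical_symbol
        FROM gene_symbol AS s
        JOIN gene AS g ON g.gene_id = s.gene_id
        WHERE s.symbol = ? COLLATE NOCASE
        ORDER BY g.canonical_symbol
        """,
        (symbol,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(resolve(conn, "ifnl3a_alias"))
```

Returning a list rather than a single value keeps the sketch honest about ambiguity: one synonym may legitimately point to more than one canonical gene, and an unknown symbol yields an empty list.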
Highlights
Genomic and genetic studies often require a target list of genes before conducting any hypothesis testing or experimental verification
Utility and discussion: The internal mapping of unique gene symbol identifiers to canonical gene symbols allows for an ambiguous gene property search
Examples from the command-line interface (CLI): A quick overview of the evidence available in the database, using the gene IFNL3A (Interferon lambda-3 A) across all species, can be obtained on the command line. Fig. 5: Excerpt of evidence function page
Summary
Genomic and genetic studies often require a target list of genes before conducting any hypothesis testing or experimental verification. Since the advent of commercial next-generation sequencing platforms in the early 2000s, with the associated decrease in sequencing costs [1], the number of DNA sequences has increased considerably [2]. These data become publicly accessible in databases provided by projects focussing on different aspects of biological sequence information [3, 4]. Relying on accurate genome annotations and protein descriptions, Gene Ontology (GO) [8, 9] categorises gene products and fits them into a computational model of biological systems. This assignment uses a controlled vocabulary, the so-called GO terms, to link genes and gene products to biological processes, cellular components, or molecular functions.
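How GO terms link genes to the three GO aspects can be sketched as a small data structure. This is an illustrative sketch only: the class `Annotation`, the function `genes_for_terms`, and the gene names `GENE_A` and `GENE_B` are assumptions introduced here, and the toy records are not real annotations. The two GO identifiers are real terms (GO:0006955, immune response; GO:0005634, nucleus).

```python
from dataclasses import dataclass

# The three aspects of the Gene Ontology.
ASPECTS = {"biological_process", "cellular_component", "molecular_function"}


@dataclass(frozen=True)
class Annotation:
    """One link between a gene symbol and a GO term (hypothetical structure)."""
    gene_symbol: str
    go_id: str
    aspect: str

    def __post_init__(self):
        # Reject anything outside the three GO aspects.
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {self.aspect}")


def genes_for_terms(annotations, go_ids):
    """Return the sorted, de-duplicated gene symbols annotated with any given GO ID."""
    wanted = set(go_ids)
    return sorted({a.gene_symbol for a in annotations if a.go_id in wanted})


# Toy records with placeholder gene names; not taken from any real annotation set.
DEMO = [
    Annotation("GENE_A", "GO:0006955", "biological_process"),  # immune response
    Annotation("GENE_B", "GO:0005634", "cellular_component"),  # nucleus
]

if __name__ == "__main__":
    print(genes_for_terms(DEMO, ["GO:0006955"]))
```

Selecting genes by a set of GO identifiers is one simple way to assemble a target gene list for a biological process; a fuller version would also walk the GO hierarchy to include child terms, which this sketch does not do.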