DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

Brad T Sherman,H Clifford Lane,Da Wei Huang,Robert Stephens,Stephan Bour,Richard A Lempicki,Michael W Baseler,Qina Tan,David Liu,Yongjian Guo

doi:10.1186/1471-2105-8-426

Abstract

Background: Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases to integrate comprehensive annotation information for their genes, which becomes a bottleneck, particularly when analyzing a large gene list. Thus, a highly centralized, ready-to-use gene-annotation knowledgebase is needed for high-throughput gene functional analysis.

Description: The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method that agglomerates tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. Grouping these identifiers improves cross-reference capability, particularly across the NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text-format files that make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well-organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner.

Conclusion: The DAVID Knowledgebase is designed to facilitate high-throughput gene functional analysis. For a given gene list, it not only provides quick access to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for each individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at .
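Because the Knowledgebase is distributed as simple pair-wise text files, loading one into memory takes only a few lines. The sketch below assumes a hypothetical tab-delimited layout with one cluster identifier and one annotation value per line; the real download files may use different columns, headers, or names, so treat `load_pairwise` and the sample data as illustrative only.

```python
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def load_pairwise(handle: TextIO) -> Dict[str, Set[str]]:
    """Read a two-column, tab-delimited file into a mapping key -> set of values.

    Assumed (hypothetical) layout: "<cluster_id><TAB><annotation>" on each line.
    Blank lines are skipped; repeated pairs are stored once.
    """
    table: Dict[str, Set[str]] = defaultdict(set)
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")  # tolerate both Unix and Windows line endings
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        key, value = fields
        table[key].add(value)
    return dict(table)


# In-memory stand-in for a downloaded file; the cluster IDs and their annotations are invented.
sample = io.StringIO(
    "CLUSTER_1\tGO:0006915\n"
    "CLUSTER_1\tGO:0008283\n"
    "CLUSTER_2\tGO:0006915\n"
)
annotations = load_pairwise(sample)
print(sorted(annotations["CLUSTER_1"]))  # ['GO:0006915', 'GO:0008283']
```

Storing values in sets means a pair that appears twice in a file is counted once, which keeps later per-gene lookups free of duplicates.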

Highlights

Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups.
For the example Affymetrix list, 10–20% more GO terms are enriched in the DAVID (Database for Annotation, Visualization and Integrated Discovery) Knowledgebase than in each of the individual resources (e.g. Entrez Gene) (Figure 5).
The DAVID Gene Concept agglomerates diverse types of gene identifiers belonging to the same gene into one gene cluster.
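Single linkage over pairwise cross-references is equivalent to finding connected components: if identifier A is linked to B and B to C, all three end up in one cluster even though A and C are never paired directly. A minimal union-find sketch of that grouping step follows, assuming the cross-reference pairs have already been extracted from the source databases. The identifiers are made up, the function names are hypothetical, and the actual DAVID procedure may apply additional rules not shown here.

```python
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: str) -> str:
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def single_linkage_clusters(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers so that any chain of pairwise links shares one cluster."""
    uf = UnionFind()
    for a, b in pairs:
        uf.add(a)
        uf.add(b)
        uf.union(a, b)
    groups: Dict[str, Set[str]] = {}
    for x in uf.parent:
        groups.setdefault(uf.find(x), set()).add(x)
    return list(groups.values())


# Hypothetical cross-reference pairs, e.g. NCBI-to-UniProt and UniProt-to-RefSeq links.
links = [
    ("entrez:111", "uniprot:AAA111"),
    ("uniprot:AAA111", "refseq:NP_111"),
    ("entrez:222", "uniprot:BBB222"),
]
clusters = single_linkage_clusters(links)
print(sorted(sorted(c) for c in clusters))
# [['entrez:111', 'refseq:NP_111', 'uniprot:AAA111'], ['entrez:222', 'uniprot:BBB222']]
```

Union-find keeps each merge close to constant time, which matters at the scale of tens of millions of identifiers; a breadth-first search over an adjacency list would produce the same clusters.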

Summary

Introduction

Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method that agglomerates tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves cross-reference capability, particularly across the NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. An integrated gene-annotation database with comprehensive data coverage is an essential first step for any high-throughput gene functional analysis algorithm. Some integrated databases, such as NCBI Entrez Gene [1], UniProt [2], and PIR [3], have made great efforts to integrate annotation resources in one centralized location and are considered world-class foundations for general bioinformatics purposes. Each of the analysis tools requires a large amount of redundant effort to build its own backend database from public resources.
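How clustering lets annotation sources keyed by different identifier systems be centralized can be illustrated with a short sketch. It assumes a precomputed identifier-to-cluster map and models each annotation source as a list of (identifier, term) pairs; the function names, source labels, and identifiers are hypothetical and do not describe DAVID's actual schema. Querying by an NCBI-style identifier then also returns terms that a source recorded only against the UniProt-style identifier of the same gene.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model of one annotation source: (source_name, [(identifier, term), ...]).
Source = Tuple[str, List[Tuple[str, str]]]


def integrate(
    id_to_cluster: Dict[str, str], sources: Iterable[Source]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Attach every (source, term) pair to the cluster of the identifier it is keyed by.

    Identifiers missing from the cluster map are skipped rather than guessed.
    """
    merged: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for source_name, pairs in sources:
        for identifier, term in pairs:
            cluster = id_to_cluster.get(identifier)
            if cluster is not None:
                merged[cluster].add((source_name, term))
    return dict(merged)


def annotate(
    query_id: str,
    id_to_cluster: Dict[str, str],
    merged: Dict[str, Set[Tuple[str, str]]],
) -> Set[Tuple[str, str]]:
    """Return a copy of the annotations gathered from every identifier in the query's cluster."""
    cluster = id_to_cluster.get(query_id)
    if cluster is None:
        return set()
    return set(merged.get(cluster, ()))


# Invented example: one gene known under an NCBI-style ID and a UniProt-style ID.
id_to_cluster = {"entrez:111": "C1", "uniprot:AAA111": "C1"}
sources = [
    ("ncbi_source", [("entrez:111", "term_a")]),
    ("uniprot_source", [("uniprot:AAA111", "term_b")]),
]
merged = integrate(id_to_cluster, sources)
print(sorted(annotate("entrez:111", id_to_cluster, merged)))
# [('ncbi_source', 'term_a'), ('uniprot_source', 'term_b')]
```

Keeping the source name next to each term preserves provenance, so an analyst can still tell which of the integrated resources contributed a given annotation.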



Journal: BMC Bioinformatics
Publication Date: Nov 2, 2007
Citations: 518
License type: cc-by


Similar Papers

Extracting Biological Meaning from Large Gene Lists with DAVID
Da Wei Huang ... Tomozumi Imamichi
Current Protocols in Bioinformatics | VOL. 27
01 Sep 2009

Surface manipulation of biomolecules for cell microarray applications
Andrew L Hook ... Nicolas H Voelcker
Trends in Biotechnology | VOL. 24
17 Aug 2006

CRISPR/Cas9 somatic multiplex-mutagenesis for high-throughput functional cancer genomics in mice
Julia Weber ...
Proceedings of the National Academy of Sciences | VOL. 112
27 Oct 2015

Generation of Radiation-Induced Deletion Complexes in the Mouse Genome Using Embryonic Stem Cells
Yun You ... John C Schimenti
Methods | VOL. 13
01 Dec 1997
