Internet Comparative Mapping Resources.

M J Wakefield

doi:10.1093/ilar.39.2-3.66

Abstract

Databases accessible over the Internet have become an essential tool in the study of genome organization. Recent growth in genomics has brought an increasing number of organisms under study, in addition to producing a flood of data from the more intensively studied organisms. This rapid growth in the quantity of data has greatly increased the likelihood that data useful to an individual researcher will be available. However, finding those data through the traditional journal publication system has become significantly more difficult. To make these data more accessible, several databases have been established and made available via the Internet. These databases provide rapid access to current, in-depth data that would be impractical to publish in journals. They also let users extract data according to their own search criteria. Genome databases have been established on an organism-specific basis, usually with a proprietary database structure and delivery system. The largest of these are the Human Genome Organization's genome database at Oak Ridge National Laboratory (Oak Ridge, Tennessee) and the Mouse Genome Database at The Jackson Laboratory (Bar Harbor, Maine). Many species groups are now migrating to a common database platform. Several of them, including those working on pig, cattle, sheep, horse, chicken, and cat, now use the ARKdb platform developed and run at the Roslin Institute in Edinburgh, Scotland. This shared system holds great promise for integrating and cross-referencing data across these species. At the time of this writing, the status and direction of the largest genome database, the human genome database, are uncertain because funding for its development has been terminated. 
The database will be hosted in its current form at Oak Ridge National Laboratory with no further development. This decision reflects a shift in focus from gene- and clone-based information toward a model of annotated sequence, and the shift is likely to create some difficulties for comparative mapping in the short term. Comparative mapping data are integrated into many of the species-specific databases in several ways. The most basic form of comparative linkage relies on shared symbols and names, including the use of symbols from other species as aliases. This type of linkage has many flaws. It cannot easily accommodate our evolving understanding of gene homologies, and it is sensitive to nomenclature changes. More advanced comparative information comes from direct links to entries in other databases by unique identifiers and, in some cases, from dedicated, detailed comparative databases. The most comprehensive comparative database available as of October 1998 is the Mouse Genome Database (mammalian homology database). It lists homologies in a broad range of species as reported in published journal articles, with references. It also offers a powerful set of search criteria through a forms interface, with tabular or graphical output. All current comparative mapping resources share two main limitations. Homology links are established manually, and database entries typically do not record the evidence for each homology. The next wave of genome database innovation will be the development of automated software agents. These agents will explore databases and build their own database of links between the different species databases, based on a defined set of homology criteria. One such agent system is under development at the time of this writing at the Roslin Institute and the University of Edinburgh. It will, it is hoped, efficiently create current, robust, and well-defined homology links between species databases. 
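The contrast between symbol-based and identifier-based linkage can be made concrete with a small sketch. The record layout, accession format, evidence field, and the names `GeneRecord`, `HomologyLink`, `link_by_symbol`, and `link_by_accession` are all hypothetical illustrations. They are not the schema of the Mouse Genome Database, ARKdb, or any other resource. Matching symbols case-insensitively is also an assumption. The sketch shows how a symbol match is lost after a nomenclature change, while a stored identifier link, which can also carry a note of its supporting evidence, survives.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical, simplified entry in a species-specific genome database."""
    accession: str  # stable unique identifier assigned by the hosting database
    symbol: str     # current gene symbol; may change when nomenclature is revised
    species: str


@dataclass(frozen=True)
class HomologyLink:
    """Hypothetical stored link between two entries, with its supporting evidence."""
    source: str    # accession in the source database
    target: str    # accession in the target database
    evidence: str  # free-text note on why homology is asserted


def link_by_symbol(query, target_db):
    """Symbol-based linkage: entries whose symbol matches the query's, ignoring case."""
    return [r for r in target_db if r.symbol.upper() == query.symbol.upper()]


def link_by_accession(query, target_db, links):
    """Identifier-based linkage: follow stored links from the query's accession."""
    targets = {link.target for link in links if link.source == query.accession}
    return [r for r in target_db if r.accession in targets]


# Hypothetical data: one mouse entry and one pig entry that share a symbol.
mouse_gene = GeneRecord("MOUSE:0001", "Abc1", "mouse")
pig_gene = GeneRecord("PIG:0042", "ABC1", "pig")
links = [HomologyLink("MOUSE:0001", "PIG:0042", "hypothetical evidence note")]

pig_db = [pig_gene]
print(len(link_by_symbol(mouse_gene, pig_db)), len(link_by_accession(mouse_gene, pig_db, links)))
# prints: 1 1

# Simulate a nomenclature change in the pig database: the symbol is revised,
# but the accession stays the same.
pig_db = [replace(pig_gene, symbol="XYZ2")]
print(len(link_by_symbol(mouse_gene, pig_db)), len(link_by_accession(mouse_gene, pig_db, links)))
# prints: 0 1
```

The identifier link is stored explicitly, so it survives the renaming. It can also record the evidence for the homology, which current resources typically lack.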
Although comparative gene mapping databases are continuing to evolve, Internet-based resources have become an entrenched and vital tool for the comparative mapper. Use of this tool can only grow in importance as the data sets involved continue to expand and as more sophisticated tools are developed for their exploitation.
