Abstract

With a working copy of the human genome in hand, the hard task of making sense of the terabytes of data has begun. One way to do that is to compare gene sequences with other types of biological information, such as protein sequences and structures, or genes identified or expressed in other organisms. The trouble is that this other information is held in hundreds of databases at research institutions and companies around the world. These databases share no common format: they use different syntaxes and a variety of sometimes incompatible database technologies, ranging from ad hoc systems to relational databases to object-oriented systems. Because the data comes in different formats, queries must be targeted at each database in ways that are peculiar to each system, an irksome task for the busy biologist. But help is in the offing. Several groups have developed tools to fuse the sprawling horde into distributed, federated systems: collections of otherwise incompatible sets of data that can be searched as if they were one database. Linking the databases so that they appear to be a single unit is done with software that understands the format of each associated system and can translate a query into the syntax and schema used by the individual database. For the effort really to succeed, some bioinformatics experts say, standards must eventually be set for how data is stored and retrieved.
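The translation step described above can be pictured with a small sketch. It is not any group's actual tool: the class names, the `find` method, the `federated_find` helper, the table layout, the `organism|symbol` record format, and the toy records are all assumptions made for illustration. One adapter wraps a relational source and another wraps an ad hoc flat file, and a single query is fanned out to both.

```python
import sqlite3


class RelationalSource:
    """Hypothetical adapter for a relational database (in-memory SQLite here)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE genes (symbol TEXT, organism TEXT)")
        # Toy record, for illustration only.
        self.conn.execute("INSERT INTO genes VALUES ('BRCA1', 'human')")

    def find(self, symbol):
        # Translate the common query into SQL against this source's schema.
        rows = self.conn.execute(
            "SELECT symbol, organism FROM genes WHERE symbol = ?", (symbol,)
        ).fetchall()
        return [
            {"source": "relational", "symbol": s, "organism": o} for s, o in rows
        ]


class FlatFileSource:
    """Hypothetical adapter for an ad hoc flat file of 'organism|symbol' records."""

    def __init__(self):
        # Toy records, for illustration only.
        self.lines = ["mouse|Brca1", "human|TP53"]

    def find(self, symbol):
        # Translate the common query into a scan of this source's own format.
        results = []
        for line in self.lines:
            organism, sym = line.split("|")
            if sym.upper() == symbol.upper():
                results.append(
                    {"source": "flatfile", "symbol": sym, "organism": organism}
                )
        return results


def federated_find(symbol, sources):
    """Send one query to every source and merge the answers into one list."""
    results = []
    for src in sources:
        results.extend(src.find(symbol))
    return results


hits = federated_find("BRCA1", [RelationalSource(), FlatFileSource()])
for hit in hits:
    print(hit)
```

The caller writes one query and never sees SQL or the flat-file layout; each adapter carries the knowledge of its own source's syntax and schema. A shared storage and retrieval standard of the kind the experts call for would reduce the need for such per-source adapters.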
