Abstract

The adage “Think global, act local” exhorts individuals to change the world starting with actions within one’s local community. The research challenges of data management for molecular and cell biology can be addressed similarly. Local communities have their own cultures, needs, and priorities. This is as true for communities involved in the various knowledge domains of molecular and cell biology (e.g., genomics, structural biology, molecular phylogeny, pharmacology) as it is for geopolitical communities. In the global research community encompassing the many molecular and cell biology knowledge domains, we need to communicate information in order to integrate data on a gene’s chromosomal location, the structure of the protein that the gene encodes, and the small molecules that bind to that protein. Communication requires a common standard not only at the syntactic level but also at the semantic level. A top-down approach of imposing a single standard for all molecular and cell biology knowledge domains is impractical, if not impossible, to enforce. A bottom-up approach in which each knowledge domain generates its own standards would take advantage of the work already done in those local communities and make the best use of the expertise within them. Standards for protein structure have been generated through the efforts of the Research Collaboratory for Structural Bioinformatics (RCSB) and the Protein Data Bank (Westbrook, 2002). Standards for assigning and describing molecular function, biological process, and cellular component have been established through the Gene Ontology (GO) Consortium (Ashburner, 2000). Microarray standards have been established through the efforts of the Microarray Gene Expression Data (MGED) Society (Brazma, 2001; Spellman, 2002). While these and other standards exist, they are certainly not available for all molecular and cell biology knowledge domains.
The challenges for a bottom-up approach to generating a common standard for molecular and cell biology data management are, first, to establish standards in the various communities or knowledge domains comprising this broadly defined area of research and, second, to have these different standards work together. These challenges are related in that the standards developed at the community or local level must be compatible at the global level. To think global in this sense is to have local standards efforts (within knowledge domains) look to and learn from existing standards efforts. The existing standards efforts to consider should not be restricted to molecular and cell biology (e.g., GO) but should also include standards efforts in the media used by molecular and cell biologists, computational biologists, and bioinformaticists. Examples include the standards efforts in computer industry specifications by the Object Management Group (OMG, www.omg.com/), in bioinformatics programming by the Open Bioinformatics Foundation (http://open-bio.org/), and in web technologies by the World Wide Web Consortium (W3C, www.w3.org/). The challenge of building local data standards thus includes raising awareness of other efforts and their relevance. If the “think global, act local” approach is taken, then ultimately data managers for molecular and cell biology will share common objects. Objects are defined here as instances of some concept (class). Common objects are ones that can be shared and understood between data systems. Despite the popularity of relational data management systems and indexed text files, most computational biology and bioinformatics
