Commercial Relational Database Management System Research Articles

The recent publication of a draft of the entire human genome (McPherson et al., 2001; Venter et al., 2001) has served to fuel an already explosive area of research in bioinformatics that is involved in deriving meaningful knowledge from proteins and DNA sequences (Alberts et al., 2002). Even with the full human genome sequence now in hand, scientists still face the challenges of determining exact gene locations and functions, observing interactions between proteins in complex molecular machines, and learning the structure and function of proteins, just to name a few. The progress of this scientific research is closely connected to research in the database community, in that analyzing large volumes of biological data involves being able to maintain and query large databases (Moussouni et al., 1999; Davidson, 2002). Database management systems (DBMSs) could help support life sciences applications in a number of different ways. A partial list of tasks that such applications require is: querying large structured databases (such as sequence and graph databases), querying semi-structured data (such as published manuscripts), managing data replication, querying distributed data sources, and managing parallelism in high-throughput bioinformatics. Unfortunately, current DBMSs have largely ignored life sciences applications, and consequently, life sciences researchers have been forced to write their own tools and scripts to perform these tasks. An interesting parallel can be drawn between the state of data management tools in life sciences and the state of data management tools for business applications, such as a banking application, about three decades ago. Prior to the advent of the relational data model, business data was managed and queried using customized programs and scripts that were developed for each application.
Reusing programs, and the algorithms for querying the data, involved rewriting the application programs and logic, which was very time consuming and expensive. In addition, the querying programs were closely tied to the format that was used to represent the data. Any change in the format of the data representation would often break the querying programs. Furthermore, writing complex queries, such as querying over multiple data sets or posing complex analytical queries, was a daunting task. One of the critical contributions of the relational data model (Codd, 1970) was the introduction of a declarative querying paradigm for business data management, instead of the previously used procedural paradigm. In a declarative querying paradigm, the user expresses the query in a high-level language, like SQL, and the DBMS determines the best strategy for evaluating the query. In addition, the DBMS only presents to the user a logical view of the data against which queries are posed. The physical representation of the data, either on disk or in memory, can be very different from the logical view. For example, in a relational database management system (RDBMS), indices may be created, and the user doesn’t have to query against the index. The user still queries against logical relations, and the system automatically determines whether it is faster to use the indices to answer a query. The user is thus insulated from worrying about various details such as the physical organization of data on disk, the exact location of the data, tuning the representation for better performance, and choosing the best plan for evaluating a query. This declarative querying paradigm has been a huge success for relational DBMSs, and today commercial RDBMSs manage terabytes of data and allow very complex querying on these databases. Database management systems can provide similar benefits to the life sciences community, just as they did three decades ago for the business data management community.
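The separation between the logical view and the physical representation can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than a commercial RDBMS, and the gene table, its columns, the sample rows, and the index name are all hypothetical, chosen only for illustration. The same SQL text is run before and after an index is created; the query is never changed, yet the plan chosen by the system differs.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan SQLite chooses for a query, as one string.

    The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT)"
)
conn.executemany(
    "INSERT INTO gene (symbol, chromosome) VALUES (?, ?)",
    [("BRCA1", "17"), ("TP53", "17"), ("CFTR", "7")],
)

# The declarative query: it states what is wanted, not how to find it.
sql = "SELECT symbol FROM gene WHERE chromosome = ?"

plan_before = query_plan(conn, sql, ("17",))
result_before = sorted(row[0] for row in conn.execute(sql, ("17",)))

# Change the physical representation only; the query text is untouched.
conn.execute("CREATE INDEX idx_gene_chromosome ON gene (chromosome)")

plan_after = query_plan(conn, sql, ("17",))
result_after = sorted(row[0] for row in conn.execute(sql, ("17",)))

print("before index:", plan_before)
print("after index: ", plan_after)
print("same answer: ", result_before == result_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should describe a scan of the table and the second a search that uses the index, while the query results stay identical.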
Many of the data sets that are used in life sciences are growing at an astonishing rate (such as sequence data at NCBI’s GenBank (NCBI, 2002)), and the queries

Abstract The Intelligent Monitoring System (IMS) is a computer system for processing data from seismic arrays and simpler stations to detect, locate, and identify seismic events. The first operational version processes data from two high-frequency arrays (NORESS and ARCESS) in Norway. The IMS computers and functions are distributed between the NORSAR Data Analysis Center (NDAC) near Oslo and the Center for Seismic Studies (Center) in Arlington, Virginia. The IMS modules at NDAC automatically retrieve data from a disk buffer, detect signals, compute signal attributes (amplitude, slowness, azimuth, polarization, etc.), and store them in a commercial relational database management system (DBMS). IMS makes scheduled (e.g., hourly) transfers of the data to a separate DBMS at the Center. Arrival of new data automatically initiates a “knowledge-based system (KBS)” that interprets these data to locate and identify (earthquake, mine blast, etc.) seismic events. This KBS uses general and area-specific seismological knowledge represented in rules and procedures. For each event, unprocessed data segments (e.g., 7 min for regional events) are retrieved from NDAC for subsequent display and analyst review. The interactive analysis modules include integrated waveform and map display/manipulation tools for efficient analyst validation or correction of the solutions produced by the automated system. Another KBS compares the analyst and automatic solutions to mark overruled elements of the knowledge base. Performance analysis statistics guide subsequent changes to the knowledge base so it improves with experience. The IMS is implemented on networked Sun workstations, with a 56 kbps satellite link bridging the NDAC and Center computer networks. The software architecture is modular and distributed, with processes communicating by messages and sharing data via the DBMS. 
The IMS processing requirements are easily met with the major processes (i.e., signal processing, KBS, and DBMS) on separate Sun 4/2xx workstations. This architecture facilitates expansion in functionality and in the number of stations. The first version was operated continuously for 8 weeks in late 1989. The Center functions were then transferred to NDAC for subsequent operation. Later versions will be distributed among NDAC, Scripps/IGPP (San Diego), and the Center to process data from many stations and arrays. The IMS design is ambitious in its integration of many new computer technologies, but the operational performance of the first version demonstrates its validity. Thus, IMS provides a new generation of automated seismic event monitoring capability.

Articles published on Commercial Relational Database Management System

The Catalog Archive Server Database Management System

A Retrospection On Niche Database Technologies.

The role of declarative querying in bioinformatics.

Processing OLAP queries in hierarchically clustered databases

Semantic integrity support in SQL:1999 and commercial (object-)relational database management systems

A distributed, heterogeneous computing environment for multidisciplinary design and analysis of aerospace vehicles

An overview of the Object Protocol Model (OPM) and the OPM data management tools

Software architecture of the SSC accelerator systems string test

The Intelligent Monitoring System

A relational database system architecture based on a vector-processing method

LOCUS database system

ISIS: the interactive spatial information system
