GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database

Paolo Pannarale, Domenico Catalano, Francesco Rubino, Gaetano Scioscia, Pietro Leo, Giorgio De Caro, Giorgio Grillo, Graziano Pappadà, Flavio Licciulli

doi:10.1186/1471-2105-13-s4-s4

Open Access

https://doi.org/10.1186/1471-2105-13-s4-s4

Abstract

Background

There is a growing perceived need in the scientific biodiversity community to build a bridge between molecular and traditional biodiversity studies. We believe that information technology could play a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes the sequence and annotation data contained in the GenBank primary database in an ontological way and stores them locally.

Methods

The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema is designed to manage biodiversity information (the Molecular Biodiversity Database) and is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by the Chado relational schema, an established standard in Generic Model Organism Databases. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema that makes use of the Sequence Ontology.

The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, which parses GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database.

Semantic Web technologies have been adopted in GIDL because of their advantages in data representation, integration and processing.

Results and conclusions

Entries from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank release 180 have been loaded into the Molecular Biodiversity Database by GIDL. By combining the Sequence Ontology and the Chado schema, our system allows more expressive queries than the most commonly used sequence retrieval systems, such as Entrez or SRS.
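The three-module IDL pipeline described under Methods (Parser, Reasoner, DBFiller) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the real Reasoner builds CLIPS facts from rules, whereas here a plain dictionary lookup stands in for it. The sample record, the `FEATURE_KEY_TO_SO` mapping, the function names and the two-table schema (a heavily reduced, Chado-like `cvterm`/`feature` pair with simplified coordinate columns) are all assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical, heavily trimmed GenBank-style record (not a real entry).
SAMPLE = """\
LOCUS       DEMO0001     60 bp    DNA     linear   PLN 01-JAN-2000
FEATURES             Location/Qualifiers
     gene            1..60
     CDS             10..45
ORIGIN
//
"""

# Assumed mapping from GenBank feature keys to Sequence Ontology term names.
FEATURE_KEY_TO_SO = {"gene": "gene", "CDS": "CDS", "tRNA": "tRNA"}

# Simplified stand-in for a Chado-like schema (assumption, not real Chado).
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    seq_start INTEGER,
    seq_end INTEGER
);
"""


def parse(text):
    """Parser stand-in: return the locus name and (key, start, end) features."""
    locus = re.search(r"^LOCUS\s+(\S+)", text, re.M).group(1)
    feats = re.findall(r"^ {5}(\S+)\s+(\d+)\.\.(\d+)\s*$", text, re.M)
    return locus, [(key, int(start), int(end)) for key, start, end in feats]


def reason(locus, feats):
    """Reasoner stand-in: type each parsed feature with an SO term name."""
    facts = []
    for key, start, end in feats:
        so_term = FEATURE_KEY_TO_SO.get(key)
        if so_term is None:
            continue  # unmapped feature keys are skipped in this sketch
        facts.append({"locus": locus, "so_term": so_term,
                      "start": start, "end": end})
    return facts


def to_sql(facts):
    """DBFiller stand-in: ordered statements, ontology terms before features."""
    stmts = []
    for term in sorted({f["so_term"] for f in facts}):
        stmts.append(("INSERT INTO cvterm (name) VALUES (?)", (term,)))
    for f in facts:
        uniquename = f"{f['locus']}:{f['so_term']}:{f['start']}"
        stmts.append((
            "INSERT INTO feature (uniquename, type_id, seq_start, seq_end) "
            "SELECT ?, cvterm_id, ?, ? FROM cvterm WHERE name = ?",
            (uniquename, f["start"], f["end"], f["so_term"]),
        ))
    return stmts


def load(text):
    """Run parse -> reason -> to_sql against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    locus, feats = parse(text)
    for sql, params in to_sql(reason(locus, feats)):
        conn.execute(sql, params)
    conn.commit()
    return conn


conn = load(SAMPLE)
rows = conn.execute(
    "SELECT c.name, f.seq_start, f.seq_end FROM feature f "
    "JOIN cvterm c ON c.cvterm_id = f.type_id ORDER BY f.feature_id"
).fetchall()
print(rows)
```

The ordering in `to_sql` reflects why the abstract speaks of "ordered SQL statements": rows that other rows reference (here, the ontology terms) have to be inserted first so that foreign keys resolve.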

Highlights

There is a growing perceived need in the scientific biodiversity community to build a bridge between molecular and traditional biodiversity studies.
Several informatics efforts have been made to integrate molecular data stored at the National Center for Biotechnology Information (NCBI) [5] with other public molecular databases, or with public biodiversity data resources such as the Global Biodiversity Information Facility (GBIF) [6], where the primary biodiversity data are correlated with the metadata and other information.
We used GIDL to support a set of bioinformatics applications developed in the Molecular Biodiversity Laboratory (MBLab) [15].

Summary

Introduction

There is a growing perceived need in the scientific biodiversity community to build a bridge between molecular and traditional biodiversity studies. We believe that information technology could play a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. New challenges are arising in biology, and with them the need to transform large volumes of raw data into usable knowledge about our world and its inhabitants. This transformation is difficult enough to require automated methods. Several informatics efforts have been made to integrate molecular data stored at the National Center for Biotechnology Information (NCBI) [5] with other public molecular databases, or with public biodiversity data resources such as the Global Biodiversity Information Facility (GBIF) [6], where the primary biodiversity data are correlated with the metadata and other information.

Journal: BMC Bioinformatics
Publication Date: Mar 28, 2012
Citations: 25
License type: cc-by
