Abstract
This paper presents a data Grid system that ingests specific biological data sources, distributed in flat-file format, into a relational DBMS that integrates them. The prototype has been implemented for the UniProtKB (hosted at EBI, the European Bioinformatics Institute, UK) and UTRdb (hosted at ITB/CNR Bari, Italy) data banks, for two reasons. First, no publicly available relational schema exists for UniProtKB and UTRdb. Second, UniProtKB is the most complete repository of proteins, whereas UTRdb contains mRNA nucleotides; although the relation between nucleotides and proteins could be important for several studies, explicit management of this relationship (a cross-reference link) is not yet available. The system also allows transparent, periodic updates of both the DBMS and the data banks involved. Each component is a GSI (Grid Security Infrastructure) enabled Web Service built with the gSOAP Toolkit; the system uses several grid nodes to speed up data ingestion while reducing the redundancy of the data present in the flat files.
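To make the ingestion idea concrete, the following is a minimal single-node sketch, not the system described in this paper. It assumes a UniProtKB-style flat-file layout (two-letter line codes, `//` as record terminator) and loads records into an in-process SQLite database. The table names (`protein`, `nucleotide`, `protein_nucleotide`), the function names, and the sample records are all hypothetical, and the Grid, GSI, and Web Service layers are omitted entirely.

```python
import sqlite3

# Invented sample records in a UniProtKB-like flat-file layout.
# The accessions and identifiers below are illustrative only.
SAMPLE = """\
ID   PROT1_HUMAN
AC   P00001;
DR   EMBL; X00001; CAA00001.1; -; mRNA.
DR   EMBL; X00002; CAA00002.1; -; mRNA.
//
ID   PROT2_HUMAN
AC   P00002;
DR   EMBL; X00001; CAA00003.1; -; mRNA.
//
"""


def parse_records(text):
    """Yield (accession, entry_name, [nucleotide_ids]) for each record."""
    name, acc, xrefs = None, None, []
    for line in text.splitlines():
        if line.startswith("//"):          # end of the current record
            yield acc, name, xrefs
            name, acc, xrefs = None, None, []
            continue
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            name = value.split()[0]
        elif code == "AC" and acc is None:
            acc = value.split(";")[0].strip()   # keep the primary accession
        elif code == "DR":
            fields = [f.strip() for f in value.rstrip(".").split(";")]
            if fields[0] == "EMBL":
                xrefs.append(fields[1])         # nucleotide cross-reference


def ingest(text, conn):
    """Load parsed records into a small, hypothetical relational schema."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS protein (
            accession TEXT PRIMARY KEY, entry_name TEXT);
        CREATE TABLE IF NOT EXISTS nucleotide (
            embl_id TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS protein_nucleotide (
            accession TEXT REFERENCES protein(accession),
            embl_id   TEXT REFERENCES nucleotide(embl_id),
            PRIMARY KEY (accession, embl_id));
    """)
    for acc, name, xrefs in parse_records(text):
        conn.execute("INSERT OR REPLACE INTO protein VALUES (?, ?)", (acc, name))
        for embl_id in xrefs:
            # A nucleotide id repeated across flat-file records is stored once.
            conn.execute("INSERT OR IGNORE INTO nucleotide VALUES (?)", (embl_id,))
            conn.execute(
                "INSERT OR IGNORE INTO protein_nucleotide VALUES (?, ?)",
                (acc, embl_id),
            )
    conn.commit()
```

Calling `ingest(SAMPLE, sqlite3.connect(":memory:"))` populates the three tables. The link table makes the protein-to-nucleotide cross-reference explicit and queryable, while the `nucleotide` table holds each repeated identifier only once, which is the kind of redundancy reduction mentioned above.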