Abstract

Background: Bioinformatics is confronted with a new data explosion due to the availability of high throughput DNA sequencers. Data storage and analysis become a problem on local servers, making it necessary to switch to other IT infrastructures. Grid and workflow technology can help to handle the data more efficiently and facilitate collaboration. However, interfaces to grids are often unfriendly to novice users.

Results: In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available to members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures.

Conclusions: The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next generation sequencers. We currently use this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/

Highlights

  • Bioinformatics is confronted with a new data explosion due to the availability of high throughput DNA sequencers

  • The BioAssist program [3] of the Netherlands Bioinformatics Centre (NBIC) [4] is setting up a bioinformatics platform for next generation DNA sequencing based on the Dutch Life Science Grid (LSGrid) [5], which is part of the Dutch Grid [6]

  • In this paper we describe our initial steps to adopt grid and workflow technology for the analysis of data produced by high throughput DNA sequencers, evaluating if and how these technologies can be applied on a routine basis to improve our analysis capacity and facilitate collaboration with other users of the Dutch LSGrid



Introduction

Bioinformatics is confronted with a new data explosion due to the availability of high throughput DNA sequencers. The BioAssist program [3] of the Netherlands Bioinformatics Centre (NBIC) [4] is setting up a bioinformatics platform for next generation DNA sequencing based on the Dutch Life Science Grid (LSGrid) [5], which is part of the Dutch Grid [6]. This infrastructure consists of computing and storage resources distributed among high performance computing organizations and research institutes, some of which take part in the European Grid Initiative (EGI) [7]. For these reasons we set out to explore the possibilities of grids to support DNA sequencing.
