Abstract

Background: The Model Organism ENCyclopedia of DNA Elements (modENCODE) project, funded by the National Institutes of Health (NIH), aims to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for the model organisms C. elegans (worm) and D. melanogaster (fly). With just under 10 terabytes of data collected and released to the public, one of the challenges researchers face is extracting biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data have already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or to combine modENCODE data with other data sets. Unfortunately, this can be a time-consuming and logistically challenging proposition.

Results: In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software, and other dependencies.

Conclusions: These resources provide a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to spend more of their time using modENCODE data and less time moving it around.

Highlights

  • The Model Organism ENCyclopedia of DNA Elements (modENCODE) project, funded by the National Institutes of Health (NIH), aims to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for the model organisms C. elegans and D. melanogaster

  • In addition to D. melanogaster and C. elegans, the modENCODE data include information collected from seven other Drosophila and four other Caenorhabditis species

  • To simplify the process of creating and configuring one or more Galaxy instances, we provide the Perl script modENCODE_galaxy_create.pl and the EC2 API command-line tools it depends on, all available in the “bin” directory of our modENCODE Galaxy GitHub repository, https://github.com/modENCODE-DCC/Galaxy


Summary

Results

The uniform peak calling workflows take a FASTA reference genome and either 2 or 3 pairs of FASTQ files representing the ChIP and control (input) experiments. To run the same workflow on your own data, use the Galaxy upload features to load your own set of ChIP/control FASTQ files. This will provide you with peaks that have been called and QC-checked with methods identical to those adopted by the modENCODE and ENCODE projects.

In addition to using the public modENCODE GBrowse instance, advanced users can run a copy of the genome browser server within the Amazon cloud, allowing them to host a complete copy of the modENCODE data set and to add their own private data sets to the corpus. Instructions for doing this can be found at http://data.modencode.org/modencode-cloud.html
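The input requirements of the peak calling workflows (one FASTA reference plus 2 or 3 ChIP/control FASTQ pairs) can be checked before uploading anything to Galaxy. The following Python sketch is illustrative only and is not part of the modENCODE release: the names `ReplicatePair` and `check_workflow_inputs` are hypothetical, the accepted file suffixes are an assumption, and it reads "pair" as one ChIP file together with its matching control file.

```python
from dataclasses import dataclass

# Assumed file suffixes; adjust to match how your files are actually named.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")
FASTA_SUFFIXES = (".fasta", ".fa", ".fasta.gz", ".fa.gz")


@dataclass(frozen=True)
class ReplicatePair:
    """One ChIP FASTQ file and its matching control (input) FASTQ file."""
    chip: str
    control: str


def _has_suffix(path, suffixes):
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return path.lower().endswith(suffixes)


def check_workflow_inputs(reference, pairs):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if not _has_suffix(reference, FASTA_SUFFIXES):
        problems.append(f"reference is not a FASTA file: {reference}")
    if len(pairs) not in (2, 3):
        problems.append(f"expected 2 or 3 ChIP/control pairs, got {len(pairs)}")
    for i, pair in enumerate(pairs, start=1):
        for role, path in (("ChIP", pair.chip), ("control", pair.control)):
            if not _has_suffix(path, FASTQ_SUFFIXES):
                problems.append(f"pair {i}: {role} file is not FASTQ: {path}")
    return problems


if __name__ == "__main__":
    pairs = [
        ReplicatePair("rep1_chip.fastq", "rep1_input.fastq"),
        ReplicatePair("rep2_chip.fastq.gz", "rep2_input.fastq.gz"),
    ]
    print(check_workflow_inputs("genome.fa", pairs))  # prints []
```

The check only inspects file names and counts, so it catches a missing replicate or a mislabeled file early but says nothing about the contents of the files themselves.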
