Abstract

A set of new data types has emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo bases its searches on pattern matching of functional genomic regions, which distinguishes it from both text and DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from a hundred bases to a hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare their data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org.

Highlights

  • Functional genomic assays produced new data types

  • The data are typically stored as genome-wide intensities (e.g. bigWig files [1]) or functional regions (peak files [2])

  • Between the input data and every target dataset, the search is based on a maximization process, which maximizes a local similarity score ($R_t$) over the start location ($i_1$) and the end location ($i_2$) of a genomic region, namely $\arg\max_{t,\, i_1,\, i_2} R_t(i_1, i_2)$, where $t$ is the index of the target dataset, and $i_1$ and $i_2$ collectively represent a genomic region, which includes a chromosome number and the start and end positions on this chromosome
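The maximization over $t$, $i_1$ and $i_2$ can be illustrated with a small sketch. This is not GeNemo's implementation: the source does not define $R_t$ here and describes a Markov Chain Monte Carlo-based search, whereas the sketch below assumes a toy per-bin score and swaps in an exhaustive linear scan (Kadane's maximum-subarray algorithm) on a single chromosome. The names `bin_score`, `best_region` and `search`, and the binary signal vectors, are hypothetical.

```python
from typing import List, Sequence, Tuple


def bin_score(q: int, t: int) -> int:
    """Assumed toy score for one bin (not the paper's R_t):
    +1 if query and target both have signal, -1 if only one does, 0 if neither."""
    if q and t:
        return 1
    if q or t:
        return -1
    return 0


def best_region(query: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Return (score, i1, i2) maximizing the summed bin score over bins [i1, i2).

    Uses Kadane's maximum-subarray scan, not MCMC. An empty region scores 0.
    """
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i, (q, t) in enumerate(zip(query, target)):
        if run_score <= 0:
            # A non-positive running sum cannot help; start a new region here.
            run_score, run_start = 0, i
        run_score += bin_score(q, t)
        if run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best


def search(query: Sequence[int], targets: List[Sequence[int]]) -> Tuple[int, int, int, int]:
    """Return (t, score, i1, i2): the best target index and its best region."""
    results = [(best_region(query, tgt), t) for t, tgt in enumerate(targets)]
    (score, i1, i2), t = max(results, key=lambda r: r[0][0])
    return t, score, i1, i2
```

Calling `search(query, targets)` with a binned query vector and a list of binned target vectors returns the index of the best-matching target together with the bin range of the highest-scoring shared region. A real score would operate on intensities or peaks across all chromosomes, which is where a stochastic search becomes attractive.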

Summary

Introduction

Functional genomic assays have produced new data types. Leveraging DNA sequencing as a high-throughput readout, these assays can interrogate genome-wide distributions of transcription factor binding (ChIP-seq), epigenetic modifications (ChIP-seq), regulatory regions (DNase-seq, FAIRE-seq) and other functional outcomes. The data are typically stored as genome-wide intensities (e.g. bigWig files [1]) or functional regions (peak files [2]). These processed data provide functional information about the genome, and their formats are very different from those used to store DNA sequences [2,3]. Functional genomic data bring new computational challenges.
