Damidseq_pipeline: an automated pipeline for processing DamID sequencing datasets.

Owen J Marshall, Andrea H Brand

doi:10.1093/bioinformatics/btv386

Abstract

Summary: DamID is a powerful technique for identifying regions of the genome bound by a DNA-binding (or DNA-associated) protein. Currently, no method exists for automatically processing next-generation sequencing DamID (DamID-seq) data, and the use of DamID-seq datasets with normalization based on read-counts alone can lead to high background and the loss of bound signal. DamID-seq thus presents novel challenges in terms of normalization and background minimization. We describe here damidseq_pipeline, a software pipeline that performs automatic normalization and background reduction on multiple DamID-seq FASTQ datasets.

Availability and implementation: Open-source and freely available from http://owenjm.github.io/damidseq_pipeline. The damidseq_pipeline is implemented in Perl and is compatible with any Unix-based operating system (e.g. Linux, Mac OSX).

Contact: o.marshall@gurdon.cam.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

DamID is a well-established technique for discovering regions of DNA bound by or associated with proteins
DamID involves the fusion of a bacterial DNA adenine methylase (Dam) to any DNA-associated protein of interest
The bacterial Dam protein methylates adenine in the sequence GATC and, given that higher eukaryotes lack native adenine methylation, the DNA-binding footprint of the protein of interest is uniquely detectable through isolating sequences flanked by methylated GATC sites

Summary

Introduction

DamID is a well-established technique for discovering regions of DNA bound by or associated with proteins (van Steensel and Henikoff, 2000). The technique can be performed in cell culture, whole organisms (van Steensel and Henikoff, 2000) or with cell-type specificity (Southall et al., 2013), and requires no fixation or antibody purification. A major consideration with DamID is that any Dam protein within the nucleus will non-specifically methylate adenines in GATC sequences at accessible regions of the genome. For this reason, DamID is always performed concurrently with a Dam-only control, and the final DNA-binding profile is typically presented as a log2(Dam-fusion/Dam-only) ratio. We describe a software pipeline for the automated processing of DamID-sequencing (DamID-seq) data, including normalization and background reduction algorithms.
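
As an illustration of the log2(Dam-fusion/Dam-only) ratio mentioned above, the following minimal Python sketch computes the ratio for a handful of genomic bins. The function name `log2_ratio`, the per-bin count lists and the pseudocount of 1 are assumptions made for this example only. They are not taken from damidseq_pipeline, which is written in Perl and applies its own normalization and background reduction.

```python
import math


def log2_ratio(fusion_counts, dam_counts, pseudocount=1.0):
    """Return per-bin log2(Dam-fusion / Dam-only) ratios.

    The pseudocount is a hypothetical guard against division by zero
    and log of zero; it is an assumption of this sketch, not a
    parameter of the published pipeline.
    """
    if len(fusion_counts) != len(dam_counts):
        raise ValueError("count lists must cover the same bins")
    return [
        math.log2((f + pseudocount) / (d + pseudocount))
        for f, d in zip(fusion_counts, dam_counts)
    ]


# Hypothetical read counts for three genomic bins (illustrative data only)
fusion = [7, 3, 0]
dam_only = [3, 3, 1]
ratios = log2_ratio(fusion, dam_only)
```

Positive values mark bins where the Dam-fusion sample has more reads than the Dam-only control, and negative values mark the reverse. A ratio taken directly from raw counts like this is the kind of read-count-only comparison that, per the abstract, can lead to high background and loss of bound signal.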

Algorithms

Findings

Implementation