Abstract

GTRD, the Gene Transcription Regulation Database (http://gtrd.biouml.org), is a database of transcription factor binding sites (TFBSs) identified by ChIP-seq experiments for human and mouse. Raw ChIP-seq data were obtained from ENCODE and SRA and uniformly processed: (i) reads were aligned using Bowtie2; (ii) ChIP-seq peaks were called using the peak callers MACS, SISSRs, GEM and PICS; (iii) peaks for the same factor and the same peak caller, but different experimental conditions (cell line, treatment, etc.), were merged into clusters; (iv) the clusters obtained for different peak callers were merged into metaclusters, which were treated as non-redundant sets of TFBSs. In addition to genomic locations, the sets contain structured information about cell lines and experimental conditions extracted from the descriptions of the corresponding ChIP-seq experiments. A web interface to access GTRD was developed using the BioUML platform. It provides: (i) browsing and display of information; (ii) advanced search options, e.g. a search for TFBSs near a specified gene or a search for all genes potentially regulated by a specified transcription factor; (iii) an integrated genome browser that visualizes the GTRD data (read alignments, peaks, clusters and metaclusters) together with gene structures from the Ensembl database and binding sites predicted using position weight matrices from the HOCOMOCO database.
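
Steps (iii) and (iv) above can be illustrated with a minimal sketch. It is not the GTRD implementation: the exact merging rule is not given here, so the sketch assumes that two intervals are merged whenever they overlap on the same chromosome. The names Peak, merge_intervals, build_clusters and build_metaclusters are hypothetical, and the coordinates in the example are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    """One called peak (hypothetical record layout, half-open coordinates)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    factor: str  # transcription factor name
    caller: str  # peak caller, e.g. "MACS", "SISSRs", "GEM" or "PICS"


def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) tuples into a sorted,
    non-overlapping list. Assumption: plain overlap is the merge criterion."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def build_clusters(peaks):
    """Step (iii): merge peaks sharing the same factor and peak caller,
    regardless of the experiment they came from."""
    groups = defaultdict(list)
    for p in peaks:
        groups[(p.factor, p.caller)].append((p.chrom, p.start, p.end))
    return {key: merge_intervals(ivs) for key, ivs in groups.items()}


def build_metaclusters(clusters):
    """Step (iv): merge the clusters of all peak callers for each factor."""
    by_factor = defaultdict(list)
    for (factor, _caller), ivs in clusters.items():
        by_factor[factor].extend(ivs)
    return {factor: merge_intervals(ivs) for factor, ivs in by_factor.items()}


# Made-up example: three overlapping peaks and one isolated peak for one factor.
example_peaks = [
    Peak("chr1", 100, 200, "CTCF", "MACS"),
    Peak("chr1", 150, 260, "CTCF", "MACS"),
    Peak("chr1", 240, 300, "CTCF", "GEM"),
    Peak("chr1", 500, 560, "CTCF", "GEM"),
]
example_clusters = build_clusters(example_peaks)
example_metaclusters = build_metaclusters(example_clusters)
```

In this example the two MACS peaks collapse into one cluster, and that cluster then joins the overlapping GEM cluster in a single metacluster, while the isolated GEM peak remains a metacluster of its own. A production pipeline would likely also track which experiments and callers support each metacluster; that bookkeeping is omitted here.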

Highlights

  • Recognition of transcription factor (TF) binding sites (TFBSs) in genomes has been one of the most important tasks of modern biology since the introduction of the DNA footprint technique in 1978 [1]

  • The appearance of ChIP-seq technology, developed independently by three research groups in 2007 [2,3,4], allowed this hurdle to be overcome. Several years later, this achievement resulted in an explosion in the number of freely available ChIP-seq datasets generated for different species, tissues and cell lines

  • The well-known research project ENCODE selected ChIP-seq as one of the main assays to identify functional genomic elements starting from the phase II period [5]


Summary

INTRODUCTION

Recognition of transcription factor (TF) binding sites (TFBSs) in genomes has been one of the most important tasks of modern biology since the introduction of the DNA footprint technique in 1978 [1]. This aspect is quite important due to the differing quality of raw data obtained from various sources, conditions of experiments, abilities of applied algorithms, etc. None of these reported databases integrates data from different ChIP-seq experiments to provide non-redundant sets of TFBSs. Taking into account the shortcomings mentioned above and having a novel view of how such data and data processing should be organized, we have established the Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org). GTRD provides non-redundant sets of TFBSs produced by a new metacluster approach based on merging different ChIP-seq experiments and the results of different peak callers.

MATERIALS AND METHODS
DISCUSSION