Nencki Genomics Database—Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs

Izabella Krystkowiak, Piotr Kuterba, Michal Petas, Bozena Kaminska, Jakub Lenart, Michal Dabrowski, Konrad Debski

doi:10.1093/database/bat069

Open Access

Abstract

We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org.
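The public access described in the abstract can be scripted. The following is a minimal sketch, assuming the standard `mysql` command-line client is installed; the helper name `mysql_command` and the example query are hypothetical and not part of NGD, and the server may no longer be reachable at the address given in the paper.

```python
import shlex
from typing import List, Optional

# Host and public user name as given in the abstract.
NGD_HOST = "database.nencki-genomics.org"
NGD_PUBLIC_USER = "public"


def mysql_command(query: Optional[str] = None,
                  host: str = NGD_HOST,
                  user: str = NGD_PUBLIC_USER) -> List[str]:
    """Build the argument list for the mysql client (hypothetical helper).

    Without a query, the command opens an interactive session; with a
    query, the client's -e option executes the statement and exits.
    """
    cmd = ["mysql", "-h", host, "-u", user]
    if query is not None:
        cmd += ["-e", query]
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command; pass the list to subprocess.run
    # to actually execute it on a machine with the mysql client.
    print(shlex.join(mysql_command("SHOW DATABASES;")))
```

Building the command as a list rather than a single string avoids shell-quoting problems when the query contains spaces or quotes.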

Highlights

Analysis of gene co-regulation requires programmatic access to large amounts of regulatory genomics data, such as the coordinates of genes, chromatin modifications, transcription factor (TF) binding sites and/or motifs.
We developed a database system, named the Nencki Genomics Database (NGD), which for the three species currently represented in Ensembl funcgen extends the data and functionality of Ensembl funcgen.
The processing of the ENCODE data is described by the Ensembl team under this link: http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hg19/uniformTfbs.html

Summary

Introduction

Analysis of gene co-regulation requires programmatic access to large amounts of regulatory genomics data, such as the coordinates of genes, chromatin modifications, transcription factor (TF) binding sites and/or motifs. In the Ensembl database, users cannot upload, manage and share their own private data, nor can they compute genome-wide intersections between genomic features (overlaps on the genomic sequence). To the Ensembl-derived data, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrences (instances) of transcription factor binding site (TFBS) motifs from the current versions of two major motif libraries: the public Jaspar [8, 9] and (for the most recent NGD version, 71_1) the commercial Transfac Professional (Biobase). In addition to SQL queries, NGD provides procedures for (i) genomic data analysis: area–gene mapping, area–area intersections and area–motif intersections; and (ii) data management: addition/removal of data, managing access rights and making the data public (Figure 1). Similarly to Ensembl, the schema for each NGD version and species contains identical tables/views (Figure 2). The temporary tables are session-separated and private to each user.
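To make the notion of an intersection between genomic features concrete, the sketch below tests whether two features overlap on the genomic sequence. It is an illustration only, not the NGD procedure, which runs inside the database: the `Feature` class, the function names and the example coordinates are all hypothetical, and coordinates are assumed to be 1-based and inclusive, as in Ensembl.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """A genomic area (hypothetical type, for illustration only)."""
    name: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """Two features overlap if they share at least one base on a chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def intersect(left: List[Feature], right: List[Feature]) -> List[Tuple[str, str]]:
    """Naive all-pairs intersection; returns (left name, right name) pairs."""
    return [(a.name, b.name) for a in left for b in right if overlaps(a, b)]


if __name__ == "__main__":
    # Invented coordinates, not taken from NGD.
    peaks = [Feature("peak1", "chr1", 100, 200)]
    motifs = [Feature("motifA", "chr1", 190, 205),
              Feature("motifB", "chr1", 201, 210)]
    print(intersect(peaks, motifs))
```

The all-pairs loop is quadratic in the number of features, which is one reason genome-wide intersections are worth pre-computing and storing rather than recomputing for every query.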

Journal: Database	Publication Date: Jan 1, 2013
Citations: 20	License type: CC BY 3.0
