Abstract

CODEX (http://codex.stemcells.cam.ac.uk/) is a user-friendly database for the direct access and interrogation of publicly available next-generation sequencing (NGS) data, specifically aimed at experimental biologists. In an era of multi-centre genomic dataset generation, CODEX provides a single database where these samples are collected, uniformly processed and vetted. The main drive of CODEX is to provide the wider scientific community with instant access to high-quality NGS data, which, irrespective of the publishing laboratory, is directly comparable. CODEX allows users to immediately visualize or download processed datasets, or compare user-generated data against the database's cumulative knowledge-base. CODEX contains four types of NGS experiments: transcription factor chromatin immunoprecipitation coupled to high-throughput sequencing (ChIP-Seq), histone modification ChIP-Seq, DNase-Seq and RNA-Seq. These are largely encompassed within two specialized repositories, HAEMCODE and ESCODE, which are focused on haematopoiesis and embryonic stem cell samples, respectively. To date, CODEX contains over 1000 samples, including 221 unique TFs and 93 unique cell types. CODEX therefore provides one of the most complete resources of publicly available NGS data for the direct interrogation of transcriptional programmes that regulate cellular identity and fate in the context of mammalian development, homeostasis and disease.

Highlights

  • One of the fundamental questions in biology is how a single fertilized egg cell faithfully develops into a multicellular organism containing specialized organs capable of homeostasis and regeneration, while the genomic content within each cell remains essentially unchanged

  • In an effort to bridge this gap between the vast amounts of publicly available next-generation sequencing (NGS) raw data and end-user friendly information, we have developed CODEX, a database of NGS experiments including ChIP-Seq, RNA-Seq and DNase-Seq

  • To ensure CODEX remains up-to-date with newly published datasets we have developed a web crawler that searches the Gene Expression Omnibus (GEO) database and reports any new samples of relevance to CODEX

Read more

Summary

Introduction

One of the fundamental questions in biology is how a single fertilized egg cell faithfully develops into a multicellular organism containing specialized organs capable of homeostasis and regeneration, while the genomic content within each cell remains essentially unchanged. Cell-type specific transcriptional and chromatin landscapes are critical determinants of the global gene expression patterns that define cell identities and fate choices [1]. As key regulators of these processes, transcription factors (TFs) are thought to act combinatorially to confer context-specific activities responsible for orchestrating global gene expression patterns that drive stem cell self-renewal, proliferation, homeostasis, cell differentiation and specification [2]. Two of the most studied systems of mammalian development are the haematopoietic system and embryonic stem (ES) cells [3,4,5]. The haematopoietic system is of particular interest in the context of disease, where transcriptional dysregulation is known to drive numerous haematological malignancies [6,7]

Objectives
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call