Abstract

The Encyclopedia of DNA Elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, improve searching and make it easier to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.

Database URL: https://www.encodeproject.org/

Highlights

  • The Encyclopedia of DNA Elements (ENCODE) project is an international consortium with a goal of annotating regions of the genome

  • During the 6 years of the pilot and initial scale-up phases, the project surveyed the landscape of the H. sapiens and M. musculus genomes using over 20 high-throughput genomic assays in more than 350 different cell and tissue types, resulting in over 3000 datasets [3,4,5,6]

  • Using the expected relationship as a guide, we investigate the branches starting at either the Uber-anatomy ontology (Uberon) term or the selected term from the Cell Ontology (CL) or Experimental Factor Ontology (EFO), and proceed across them to determine the missing relationship that would integrate the branches across ontology boundaries

Introduction

The Encyclopedia of DNA Elements (ENCODE) project (https://www.encodeproject.org/) is an international consortium with a goal of annotating regions of the genome. The ENCODE DCC organizes metadata related to the experimental process into several major categories that include donors, biosamples, treatments, constructs, libraries, antibodies, replicates and data files. We currently annotate three of these categories, a small subset of the metadata collected for an assay, using the ontologies that provide the most additional information to the larger community (Figure 1).
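To illustrate why ontology annotation improves searching, the following is a minimal sketch of an ontology-driven search, not the ENCODE DCC's implementation. It assumes a toy hierarchy in which every relationship type is collapsed into a single child-to-parent link; the term labels, the `PARENTS` and `BIOSAMPLES` tables and the function names are all hypothetical and do not correspond to real Uberon, CL or EFO identifiers. A query for a term is expanded to every term beneath it, so samples annotated with a more specific term are still found.

```python
from collections import defaultdict, deque

# Hypothetical child -> parents links. Real ontologies distinguish relations
# such as is_a and part_of; this toy example collapses them into one.
PARENTS = {
    "hepatocyte": ["liver"],
    "HepG2": ["hepatocyte"],
    "liver": ["digestive system"],
    "stomach": ["digestive system"],
    "neuron": ["brain"],
}

# Hypothetical biosamples, each annotated with exactly one ontology term.
BIOSAMPLES = {
    "sample_A": "HepG2",
    "sample_B": "liver",
    "sample_C": "neuron",
    "sample_D": "stomach",
}


def build_children(parents):
    """Invert the child -> parents table into a parent -> children table."""
    children = defaultdict(set)
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            children[parent].add(child)
    return children


def expand_term(term, parents=PARENTS):
    """Return the query term together with every term below it."""
    children = build_children(parents)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(biosamples, term):
    """Return, sorted by name, the samples annotated with the term or a descendant."""
    wanted = expand_term(term)
    return sorted(name for name, annotation in biosamples.items() if annotation in wanted)


if __name__ == "__main__":
    print(search(BIOSAMPLES, "liver"))
    print(search(BIOSAMPLES, "digestive system"))
```

In this toy setup, a search for "liver" also returns the sample annotated only as "HepG2", which a plain string match on the annotation would miss.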

