Prioritizing transcriptomic and epigenomic experiments using an optimization strategy that leverages imputed data.

Jacob Schreiber, Jeffrey Bilmes, William Stafford Noble

doi:10.1093/bioinformatics/btaa830

Open Access

https://doi.org/10.1093/bioinformatics/btaa830

Journal: Bioinformatics (Oxford, England)
Publication Date: Sep 23, 2020
Citations: 2
License type: CC BY 4.0

Affiliation: University of Washington

Abstract

Motivation: Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types (‘biosamples’) and a list of possible high-throughput sequencing assays, where at least one experiment has been performed in each biosample and for each assay, we ask ‘Which experiments should ENCODE perform next?’

Results: We demonstrate how to represent this task as a submodular optimization problem, where the goal is to choose a panel of experiments that maximize the facility location function. A key aspect of our approach is that we use imputed data, rather than experimental data, to directly answer the posed question. We find that, across several evaluations, our method chooses a panel of experiments that span a diversity of biochemical activity. Finally, we propose two modifications of the facility location function, including a novel submodular–supermodular function, that allow incorporation of domain knowledge or constraints into the optimization procedure.

Availability and implementation: Our method is available as a Python package at https://github.com/jmschrei/kiwano and can be installed using the command pip install kiwano. The source code used here and the similarity matrix can be found at http://doi.org/10.5281/zenodo.3708538.

Supplementary information: Supplementary data are available at Bioinformatics online.
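To make the optimization described in the Results concrete, the following is a minimal sketch of greedy maximization of a facility location function over a pairwise similarity matrix. It is an illustration only, not the kiwano package's API: the function names, the toy similarity values and the use of plain Python lists are all assumptions made here, and the sketch assumes non-negative similarities. The facility location function scores a selected set S as the sum, over all candidate experiments, of each experiment's maximum similarity to any member of S.

```python
def facility_location(similarity, selected):
    """Score a set: sum over all items of their best similarity to the set."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_select(similarity, k):
    """Greedily pick k items, each time adding the one with the largest gain.

    Assumes `similarity` is a square list of lists with non-negative entries.
    """
    n = len(similarity)
    best = [0.0] * n  # best similarity of each item to the current selection
    selected = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(similarity[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], similarity[i][best_j]) for i in range(n)]
    return selected


# Hypothetical toy data: six "experiments" forming two groups of three
# (indices 0-2 and 3-5), with low similarity across groups.
toy_similarity = [
    [1.0, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.1, 0.1],
    [0.9, 0.7, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0, 0.6],
    [0.1, 0.1, 0.1, 0.8, 0.6, 1.0],
]

panel = greedy_select(toy_similarity, 2)
print(panel, facility_location(toy_similarity, panel))
```

On this toy matrix the greedy procedure picks one experiment from each group, which is the diversity-seeking behavior the abstract describes. In the setting of the paper, the similarity matrix would be computed between imputed experiments rather than written by hand.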

Highlights

Experimental characterization of the genomic and epigenomic landscape of a human cell line or tissue (“biosample”) is expensive but can potentially yield valuable insights into the molecular basis for development and disease.
Several approaches have been proposed to address this challenge. Some scientific consortia, such as GTEx and ENTEX, aim to completely fill in a submatrix of selected assays and selected biosamples. Other consortia, such as the Roadmap Epigenomics Mapping Consortium [1] and ENCODE [2], adopted a roughly “L”-shaped strategy, in which consortium members focused on carrying out many assays in a small set of high-priority biosamples, and some assays were carried out over a much larger set of biosamples.
We first generated imputations of epigenomic and transcriptomic experiments using a recently developed imputation approach based on deep tensor factorization, named Avocado.

Summary

Introduction

Experimental characterization of the genomic and epigenomic landscape of a human cell line or tissue (“biosample”) is expensive but can potentially yield valuable insights into the molecular basis for development and disease. We cannot afford to fill in an experimental data matrix in which rows correspond to types of assays and columns correspond to biosamples. Several approaches have been proposed to address this challenge. Some scientific consortia, such as GTEx and ENTEX, aim to completely fill in a submatrix of selected assays and selected biosamples. While the imputation strategy can complete nearly the entire matrix, a drawback is that the imputed data are potentially less trustworthy than actual experimental data.

Methods

Results

Conclusion
