Methods for constructing and evaluating consensus genomic interval sets.

Julia Rymuza, Yuchen Sun, Guangtao Zheng, Nathan J Leroy, Maria Murach, Neil Phan, Aidong Zhang, Nathan C Sheffield

doi:10.1093/nar/gkae685

Abstract

The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which make regions comparable across experiments but necessarily lose precision in region definitions. We therefore need methods to assess this loss of precision and to build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how they can be used to evaluate the fit of universes and to build optimal universes. We describe scenarios where the common approach of merging regions to create a consensus leads to undesirable outcomes, and we offer principled alternatives that preserve interoperability of interval data while minimizing loss of resolution.
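To make the coverage cutoff idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each region set is a list of half-open `(start, end)` intervals on a single chromosome, and that a universe keeps every base covered by at least `cutoff` region sets. The function names `merge_intervals` and `coverage_cutoff_universe` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals within one region set."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def coverage_cutoff_universe(region_sets, cutoff):
    """Return maximal intervals covered by at least `cutoff` region sets.

    Assumption for this sketch: all intervals lie on one chromosome.
    """
    # Record +1 where a region starts and -1 where it ends.
    events = defaultdict(int)
    for regions in region_sets:
        # Merge first so each region set contributes at most 1 per base.
        for start, end in merge_intervals(regions):
            events[start] += 1
            events[end] -= 1

    universe = []
    depth = 0          # number of region sets covering the current position
    open_start = None  # start of the universe region currently being built
    for pos in sorted(events):
        depth += events[pos]
        if depth >= cutoff and open_start is None:
            open_start = pos
        elif depth < cutoff and open_start is not None:
            universe.append((open_start, pos))
            open_start = None
    return universe


# Three toy region sets (illustrative data, not from the paper).
region_sets = [
    [(100, 200)],
    [(150, 250)],
    [(180, 300), (400, 500)],
]

print(coverage_cutoff_universe(region_sets, 1))  # [(100, 300), (400, 500)]
print(coverage_cutoff_universe(region_sets, 2))  # [(150, 250)]
print(coverage_cutoff_universe(region_sets, 3))  # [(180, 200)]
```

In this sketch a cutoff of 1 reproduces a plain merge of all regions, which yields one broad region spanning three differently placed inputs, while higher cutoffs keep only the bases that several region sets agree on. That trade-off between coverage and boundary precision is the kind of fit the evaluation scores described in the abstract are meant to quantify.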

Journal: Nucleic Acids Research
Publication Date: Aug 24, 2024
License type: CC BY 4.0
