Toward a statistically explicit understanding of de novo sequence assembly

Mark Howison, Casey W Dunn, Felipe Zapata

doi:10.1093/bioinformatics/btt525

Abstract

Draft de novo genome assemblies are now available for many organisms. These assemblies are point estimates of the true genome sequences. Each is a specific hypothesis, drawn from among many alternative hypotheses, of the sequence of a genome. Assembly uncertainty, the inability to distinguish between multiple alternative assembly hypotheses, can be due to real variation between copies of the genome in the sample, errors and ambiguities in the sequenced data, and assumptions and heuristics of the assemblers. Most assemblers select a single assembly according to ad hoc criteria, and do not yet report or quantify the uncertainty of their outputs. Those assemblers that do report uncertainty take different approaches to describing multiple assembly hypotheses and the support for each. Here we review and examine the problem of representing and measuring uncertainty in assemblies. A promising recent development is the implementation of assemblers that are built according to explicit statistical models. Some new assembly methods, for example, estimate and maximize assembly likelihood. These advances, combined with technical advances in the representation of alternative assembly hypotheses, will lead to a more complete and biologically relevant understanding of assembly uncertainty. This will in turn facilitate the interpretation of downstream analyses and tests of specific biological hypotheses.

Journal: Bioinformatics
Publication Date: Sep 10, 2013
Citations: 27
