Genome assembly quality: Assessment and improvement using the neutral indel model

Stephen Meader, Chris P Ponting, Ladeana W Hillier, Devin Locke, Gerton Lunter

doi:10.1101/gr.096966.109

Abstract

We describe a statistical and comparative-genomic approach for quantifying error rates of genome sequence assemblies. The method exploits not substitutions but the pattern of insertions and deletions (indels) in genome-scale alignments for closely related species. Using two- or three-way alignments, the approach estimates the amount of aligned sequence containing clusters of nucleotides that were wrongly inserted or deleted during sequencing or assembly. Thus, the method is well-suited to assessing fine-scale sequence quality within single assemblies, between different assemblies of a single set of reads, and between genome assemblies for different species. When applying this approach to four primate genome assemblies, we found that average gap error rates per base varied considerably, by up to sixfold. As expected, bacterial artificial chromosome (BAC) sequences contained lower, but still substantial, predicted numbers of errors, arguing for caution in regarding BACs as the epitome of genome fidelity. We then mapped short reads, at approximately 10-fold statistical coverage, from a Bornean orangutan onto the Sumatran orangutan genome assembly originally constructed from capillary reads. This resulted in a reduced gap error rate and a separation of error-prone from high-fidelity sequence. Over 5000 predicted indel errors in protein-coding sequence were corrected in a hybrid assembly. Our approach contributes a new fine-scale quality metric for assemblies that should facilitate development of improved genome sequencing and assembly strategies.

Journal: Genome Research	Publication Date: Mar 19, 2010
Citations: 64	License type: cc-by-nc

Similar Papers

Genome Sequencing and Assembly Strategies and a Comparative Analysis of the Genomic Characteristics in Penaeid Shrimp Species.
Jianbo Yuan ... Xiaojun Zhang
Frontiers in genetics | VOL. 12
03 May 2021

Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly.
Marios Gavrielatos ... Konstantinos Kyriakidis
Molecular medicine reports | VOL. 23
02 Feb 2021

A multiway analysis for identifying high integrity bovine BACs
Abhirami Ratnakumar ... Brian P Dalrymple
BMC Genomics | VOL. 10
23 Jan 2009

Comparative genomics of the fungal genus Verticillium
Xiaoqian Shi
01 Jan 2019
