GMASS: a novel measure for genome assembly structural similarity

Daehong Kwon, Jongin Lee, Jaebum Kim

doi:10.1186/s12859-019-2710-z

Abstract

Background: Thanks to recent advances in next-generation sequencing (NGS) technologies, a large amount of genomic data, in the form of short DNA sequences known as reads, has been accumulating. Diverse assemblers have been developed to generate high-quality de novo assemblies from NGS reads, but their outputs differ substantially because of algorithmic differences. However, there are no properly structured measures for showing the similarity or difference between assemblies.

Results: We developed a new measure, called the GMASS score, for comparing two genome assemblies in terms of their structure. The GMASS score is based on the distribution pattern of the number and coverage of similar regions between a pair of assemblies. When evaluated on simulated assembly datasets, the new measure was able to show structural similarity between assemblies. Applying the GMASS score to assemblies in recently published benchmark datasets showed the divergent performance of current assemblers as well as the score's ability to compare assemblies.

Conclusion: The GMASS score is a novel measure for representing structural similarity between two assemblies. It will contribute to the understanding of assembly output and to the development of de novo assemblers.

Highlights

Thanks to recent advances in next-generation sequencing (NGS) technologies, a large amount of genomic data, in the form of short DNA sequences known as reads, has been accumulating
Starting from an ancestral form of an assembly, descendant assemblies in the dataset were simulated with different evolutionary divergences, which determined the amount of perturbation in the assembly simulation process
The GMASS score is a novel measure for representing structural similarity between two assemblies

Summary

Introduction

Thanks to recent advances in next-generation sequencing (NGS) technologies, a large amount of genomic data, in the form of short DNA sequences known as reads, has been accumulating. Several algorithms have been developed to generate high-quality de novo assemblies. They are mainly classified into three categories: the greedy graph-based algorithm, the Overlap-Layout-Consensus-based algorithm, and the de Bruijn graph-based algorithm [4]. The overlap scores are calculated using the number of matching bases in the overlap. Both the Overlap-Layout-Consensus-based algorithm and the de Bruijn graph-based algorithm rely on a graph structure constructed from the NGS reads. The Overlap-Layout-Consensus-based algorithm, used by assemblers such as CABOG [8], Newbler [9] and Celera [10], constructs an overlap graph from the direct overlaps among the NGS reads, whereas the de Bruijn graph-based algorithm, used by assemblers such as ABySS [11], SOAPdenovo [12], ALLPATHS-LG [13] and Velvet [14], is based on the overlap of all possible subsequences of length k, known as k-mers, extracted from the NGS reads.
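The two graph-based ideas mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the cited assemblers: the function names (`suffix_prefix_overlap`, `kmers`, `de_bruijn_edges`) and the `min_len` parameter are hypothetical, and the overlap function assumes exact matches only, whereas real assemblers tolerate sequencing errors and handle reverse complements.

```python
from collections import defaultdict


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Simplifying assumption: exact matches only. Returns 0 when no overlap
    of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate overlap first and shrink until one matches.
    for length in range(max_len, max(min_len, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def kmers(read, k):
    """All subsequences of length k in a read, in order of position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every k-mer contributes one edge from its first k-1 bases to its
    last k-1 bases, so reads never need to be compared pairwise.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA"]
    # Overlap-graph view: compare the two reads directly.
    print(suffix_prefix_overlap(reads[0], reads[1]))
    # de Bruijn view: break both reads into 3-mers and link their halves.
    for node, successors in sorted(de_bruijn_edges(reads, 3).items()):
        print(node, "->", sorted(successors))
```

The contrast in the sketch is the design point: the overlap function has to be called on pairs of reads, while the de Bruijn construction makes a single pass over the k-mers of each read.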


Journal: BMC Bioinformatics
Publication Date: Mar 18, 2019
Citations: 6
License type: open-access
