Assessing graph-based read mappers against a baseline approach highlights strengths and weaknesses of current methods

Ivar Grytten, Alexander J Nederbragt, Geir K Sandve, Knut D Rand

doi:10.1186/s12864-020-6685-y

Abstract

Background: Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy as compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. However, the combinatorial explosion of possible paths through nearby variants also leads to a huge search space and an increased chance of false positive alignments to highly variable regions.

Results: We here assess three prominent graph-based read mappers against a hybrid baseline approach that combines an initial path determination with a tuned linear read mapping method. We show, using a previously proposed benchmark, that this simple approach is able to improve overall accuracy of read mapping to graph-based reference genomes.

Conclusions: Our method is implemented in a tool, Two-step Graph Mapper, which is available at https://github.com/uio-bmi/two_step_graph_mapper along with data and scripts for reproducing the experiments. Our method highlights characteristics of the current generation of graph-based read mappers and shows potential for improvement for future graph-based read mappers.

Highlights

Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known
Mapping accuracies are compared using receiver operating characteristic (ROC) curves parameterized by the mapping quality (MAPQ) of all the simulated reads, where each dot in the plot shows the recall and error rate for reads with at least the corresponding MAPQ
We suggest that the path prediction itself can be achieved by an initial rough graph mapping; as an example, we use a rough graph-mapping method in which all reads are first aligned to the linear reference genome and subsequently fitted locally to the graph
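The MAPQ-parameterized ROC curve mentioned in the highlights can be illustrated with a minimal sketch. The definitions used here are assumptions for illustration, not taken from the paper: recall is the fraction of all simulated reads that are correctly mapped with at least the given MAPQ, and error rate is the fraction of incorrectly mapped reads among those with at least that MAPQ. The function name `mapq_roc` and the toy data are hypothetical.

```python
from typing import List, Tuple


def mapq_roc(reads: List[Tuple[int, bool]]) -> List[Tuple[int, float, float]]:
    """Return (mapq_threshold, recall, error_rate) points, highest threshold first.

    Each read is given as (mapq, is_correctly_mapped). Both metric
    definitions are assumptions made for this sketch.
    """
    total = len(reads)
    points = []
    # One point per distinct MAPQ value observed in the data.
    for threshold in sorted({mapq for mapq, _ in reads}, reverse=True):
        # Correctness flags of all reads with MAPQ at least the threshold.
        kept = [correct for mapq, correct in reads if mapq >= threshold]
        n_correct = sum(kept)
        recall = n_correct / total
        error_rate = (len(kept) - n_correct) / len(kept)
        points.append((threshold, recall, error_rate))
    return points


# Toy data: (MAPQ, correctly mapped?) for eight simulated reads.
toy_reads = [
    (60, True), (60, True), (60, True),
    (30, True), (30, False),
    (0, False), (0, True), (0, False),
]
points = mapq_roc(toy_reads)
for threshold, recall, error_rate in points:
    print(f"MAPQ >= {threshold}: recall={recall:.3f}, error rate={error_rate:.3f}")
```

Lowering the MAPQ threshold admits more reads, so recall can only grow, while the error rate typically grows with it; a mapper whose curve lies further toward high recall at low error rate is preferable under these definitions.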

Summary

Introduction

Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy as compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. [...] more than a day for a human whole-genome graph – Seven Bridges uses a faster approach in which only short k-mers (21 base pair sequences at 7 base pair intervals) are indexed. This enables indexing of a human whole-genome graph in only minutes. As complex graphs containing many genetic variants can result in long indexing times as well as poor mapping accuracy [3], existing graph-based read mappers ignore the most complex regions in the graph when indexing it. Some have proposed not to use graphs, but instead to improve the current linear reference genome [13].
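To give a feel for why indexing only 21 base pair k-mers at 7 base pair intervals is cheap, the following is a minimal sketch of sampled k-mer indexing. It works on a plain linear sequence rather than a graph and is not the Seven Bridges implementation; the function name `sampled_kmer_index` and the toy sequence are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def sampled_kmer_index(sequence: str, k: int = 21, step: int = 7) -> Dict[str, List[int]]:
    """Map each sampled k-mer to the start positions where it was sampled.

    Only k-mers starting at positions 0, step, 2*step, ... are stored.
    This is a linear-sequence illustration, not a graph index.
    """
    index = defaultdict(list)
    for start in range(0, len(sequence) - k + 1, step):
        index[sequence[start:start + k]].append(start)
    return dict(index)


# Toy 40 bp sequence; a real reference would be far longer.
toy_sequence = "ACGT" * 10
toy_index = sampled_kmer_index(toy_sequence)

sampled_positions = sorted(pos for hits in toy_index.values() for pos in hits)
dense_count = len(toy_sequence) - 21 + 1  # k-mers if every position were indexed
print(f"sampled start positions: {sampled_positions}")
print(f"sampled k-mers: {len(sampled_positions)}, dense k-mers: {dense_count}")
```

With a step of 7, roughly one seventh as many k-mers are stored as when every position is indexed, which is consistent with the much shorter indexing time described above, at the cost of fewer seeds per read.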

Journal: BMC Genomics	Publication Date: Apr 6, 2020
Citations: 14	License type: open-access
