Reference flow: reducing reference bias using multiple population genomes

Nae-Chyun Chen, Sheila Iyer, Ben Langmead, Taher Mun, Brad Solomon

doi:10.1186/s13059-020-02229-3

Abstract

Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.

Highlights

Sequencing data analysis often begins with aligning reads to a reference genome, with the reference represented as a linear string of bases.
Simulations for major-allele reference flow: we studied the efficacy of a strategy we call “MajorFlow,” which starts by aligning all reads to the global major reference.
We first showed that a 2-pass method using superpopulation major-allele references (MajorFlow) outperformed both a standard linear reference and individual major-allele references.
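
The 2-pass idea in the highlights above can be illustrated with a small Python sketch. It is not the authors' implementation. The toy ungapped aligner, the names `Hit`, `toy_align`, and `two_pass_align`, the `max_mismatches` deferral rule, and the example sequences are all hypothetical stand-ins. In particular, deferring reads on mismatch count is an assumption made here for simplicity; a real pipeline would more plausibly use the aligner's mapping quality.

```python
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    ref_name: str    # which reference produced this placement
    pos: int         # 0-based offset of the read in that reference
    mismatches: int  # Hamming distance at that offset


def toy_align(read: str, ref_name: str, ref: str) -> Hit:
    """Toy ungapped aligner: best Hamming-distance placement of `read` in `ref`."""
    best = None
    for pos in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))
        if best is None or mm < best.mismatches:
            best = Hit(ref_name, pos, mm)
    if best is None:
        raise ValueError("read is longer than reference")
    return best


def two_pass_align(read: str, global_major: str,
                   population_majors: Dict[str, str],
                   max_mismatches: int = 0) -> Hit:
    """First pass against the global major reference; reads that align poorly
    (assumed criterion: more than `max_mismatches`) get a second pass against
    each population-level major reference, and the best placement wins."""
    first = toy_align(read, "global", global_major)
    if first.mismatches <= max_mismatches:
        return first  # confident first-pass alignment, no second pass needed
    candidates = [first] + [
        toy_align(read, name, seq) for name, seq in population_majors.items()
    ]
    # min() keeps the earliest candidate on ties, so the first pass is preferred.
    return min(candidates, key=lambda h: h.mismatches)


global_major = "ACGTACGTAC"
population_majors = {"AFR": "ACGTTCGTAC", "EUR": "ACGTACGTAC"}
print(two_pass_align("GTTCG", global_major, population_majors))
```

In this toy setup the read `GTTCG` carries an allele that only the hypothetical `AFR` reference contains, so it is deferred and then placed there without mismatches. A real workflow would also need to translate second-pass coordinates back to the standard reference when the references differ by indels.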

Summary

Introduction

Sequencing data analysis often begins with aligning reads to a reference genome, with the reference represented as a linear string of bases. Linearity leads to reference bias: a tendency to miss alignments or report incorrect alignments for reads containing non-reference alleles. This can lead to confounding of scientific results, especially for analyses concerned with hypervariable regions [2], allele-specific effects [3, 4, 5, 6], ancient DNA analysis [7, 8], or epigenomic signals [9]. Some studies suggest replacing the typical linear reference with a “major-allele” version, with each variant set to its most common allele. This can increase alignment [16, 17, 18] and genotyping accuracy [19]. The major-allele reference is largely compatible with the standard reference (though indels can shift coordinates) and imposes little or no additional computational overhead.
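
To make the major-allele idea concrete, here is a minimal Python sketch that sets each variant site to its most common allele. It is an illustration under stated assumptions, not the authors' tooling: the `Snv` record, the `major_allele_reference` function, and the example frequencies are hypothetical, and only single-nucleotide variants are handled, so coordinates never shift (indels, as noted above, would require coordinate translation).

```python
from typing import List, NamedTuple


class Snv(NamedTuple):
    pos: int         # 0-based position in the reference string
    ref: str         # base in the standard reference
    alt: str         # alternate base
    alt_freq: float  # population frequency of the alternate allele (assumed known)


def major_allele_reference(reference: str, snvs: List[Snv]) -> str:
    """Return a copy of `reference` with each SNV set to its most common allele."""
    bases = list(reference)
    for v in snvs:
        if bases[v.pos] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        # Swap in the alternate allele only when it is the majority allele.
        if v.alt_freq > 0.5:
            bases[v.pos] = v.alt
    return "".join(bases)


reference = "ACGTACGT"
variants = [Snv(1, "C", "T", 0.8), Snv(4, "A", "G", 0.3)]
print(major_allele_reference(reference, variants))  # prints ATGTACGT
```

Only the first variant is swapped in, because its alternate allele is the more common one; the second site keeps the reference base. Because the output has the same length as the input, alignments to it share coordinates with the standard reference in this SNV-only case.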

Methods

Results

Conclusion

Journal: Genome Biology
Publication Date: Jan 4, 2021
Citations: 63
License type: open-access

Similar Papers

Genomic variations and epigenomic landscape of the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel
... Cathrin Herder
Genome biology | VOL. 23
21 Feb 2022

Multi-CSAR: a web server for scaffolding contigs using multiple reference genomes.
Shu-Cheng Liu ... Yan-Ru Ju
Nucleic acids research | VOL. 50
07 May 2022

Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph
Rui Martiniano ... Richard Durbin
Genome Biology | VOL. 21
17 Sep 2020

A high performance multiple sequence alignment system for pyrosequencing reads from multiple reference genomes
Fahad Saeed ... Ashfaq Khokhar
Journal of Parallel and Distributed Computing | VOL. 72
16 Sep 2011
