Abstract

Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near or across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrates two strategies, called ‘Reverse-Alignment’ and ‘Deep-Scan’, to improve the accuracy of read alignment. In our experiments, BatAlign obtained the highest F-measures in read alignment on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign recovered 4.3% more PCR-validated SVs with 73.3% fewer calls. These results suggest that BatAlign is effective in detecting SVs and other polymorphic variants accurately using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.

Highlights

  • Aligning sequencing reads to a reference genome is usually the first step in most genomic analyses

  • Mapping biases, which occur in genomic regions with strong homology to other genomic locations [28], contribute to erroneous calls of single nucleotide variants (SNVs), indels and structural variations (SVs)

  • We present BatAlign, a method for the gapped alignment of short reads onto a reference genome with improved accuracy and sensitivity


Summary

Introduction

Aligning sequencing reads to a reference genome is usually the first step in most genomic analyses, so the sensitivity and accuracy of calling structural variations (SVs) can be affected by the quality of these alignments. This motivated us to study the alignment of short reads that are associated with SVs, single nucleotide variants (SNVs) and insertion-deletion (indel) variants. A number of mismatch-only aligners have been proposed, including SOAP [1], RMAP [2], Bowtie [3], PerM [4] and BatMis [5]. Although they are generally fast, they miss the wide spectrum of non-SNVs, which have been shown to represent 7–8% of human polymorphisms [6]. As increasing evidence shows that indels are involved in a wide range of diseases [7], mismatch aligners are unsuitable for studies of such biologically important events.
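The limitation of mismatch-only alignment can be illustrated with a toy example. The sketch below is not BatAlign's algorithm or that of any aligner cited above; the function names and sequences are hypothetical and chosen only for illustration. It compares a read carrying a single-base deletion against a reference, first by counting mismatches only (Hamming distance) and then with a gapped comparison (edit distance).

```python
def hamming_mismatches(read: str, window: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(read) != len(window):
        raise ValueError("mismatch-only comparison needs equal lengths")
    return sum(a != b for a, b in zip(read, window))


def edit_distance(read: str, window: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1 each."""
    prev = list(range(len(window) + 1))
    for i, a in enumerate(read, start=1):
        curr = [i]
        for j, b in enumerate(window, start=1):
            curr.append(min(prev[j] + 1,              # base of the read left unmatched
                            curr[j - 1] + 1,          # base of the window left unmatched
                            prev[j - 1] + (a != b)))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical toy data: the read is the reference with the 'A' at index 4 deleted.
reference = "ACGTACGTTAGC"
read = "ACGTCGTTAGC"

mismatches = hamming_mismatches(read, reference[:len(read)])
edits = edit_distance(read, reference)
print(mismatches, edits)
```

In this toy case the ungapped comparison reports 6 mismatches, because every base after the deletion is shifted by one position, while the gapped comparison explains the read with a single edit. An aligner that tolerates only a small number of mismatches would therefore reject or misplace such a read.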

