Abstract

Some variants detected by high-throughput sequencing (HTS) are not reproducible. To minimize technically induced artifacts, secondary experimental validation is required, but this step is slow and expensive. A rapid, easy-to-use visualization tool is therefore needed to systematically review the status of sequence read alignments. Here, we developed CaReAl, a high-performance alignment-capturing tool for visualizing the read-alignment status of nucleotide sequences and associated genome features. CaReAl is optimized for the systematic exploration of regions of interest: it renders full-depth read-alignment statuses as a set of PNG files. CaReAl was 7.5 times faster than IGV ‘snapshot’, the only stand-alone tool that provides automated snapshots of sequence reads. This rapid, user-programmable capturing tool is useful for obtaining read-level data to evaluate variant calls and detect technical biases. Its multithreading and sequential wide-genome-range-capturing functionalities support the efficient manual review and evaluation of genome sequence alignments and variant calls. CaReAl is a rapid and convenient tool for capturing aligned reads in BAM files, and it facilitates the acquisition of highly curated data for reliable analytic results.
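The abstract describes capturing a batch of regions of interest with multithreading and capturing wide genomic ranges sequentially. The sketch below illustrates only the planning side of such a workflow, under stated assumptions: the names `Region`, `plan_capture`, `capture_all`, and `tile_range`, the ±50 bp flank, and the PNG naming scheme are hypothetical and are not CaReAl's actual interface, and no reads are drawn.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical illustration only; this is not CaReAl's actual API.


@dataclass(frozen=True)
class Region:
    chrom: str
    pos: int   # 1-based position of interest
    gene: str  # gene symbol used in the output file name


def plan_capture(region: Region, flank: int = 50) -> Tuple[str, int, int, str]:
    """Return (chrom, start, end, png_name) for one region of interest.

    A real capturing tool would render the reads in this window; this sketch
    only computes the window and a hypothetical output file name."""
    start = max(1, region.pos - flank)  # clamp to the first base of the chromosome
    end = region.pos + flank
    png_name = f"{region.gene}_{region.chrom}_{region.pos}.png"
    return region.chrom, start, end, png_name


def capture_all(regions: List[Region], flank: int = 50, workers: int = 4):
    """Plan captures for many regions concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: plan_capture(r, flank), regions))


def tile_range(chrom: str, start: int, end: int, width: int) -> List[Tuple[str, int, int]]:
    """Split a wide 1-based inclusive range into consecutive windows of `width` bp."""
    return [(chrom, s, min(s + width - 1, end)) for s in range(start, end + 1, width)]


if __name__ == "__main__":
    regions = [Region("chr1", 1000, "GENE_A"), Region("chr2", 30, "GENE_B")]
    for plan in capture_all(regions, flank=50, workers=2):
        print(plan)
    print(tile_range("chr1", 1, 250, 100))
```

`pool.map` returns results in input order, so planned output files line up with the region list. In CPython, threads help most when the per-region work is I/O-bound; CPU-heavy image rendering would likely benefit more from a process pool.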

Highlights

  • The recent rapid evolution of high-throughput sequencing technology has resulted in the generation of huge volumes of data [1, 2]

  • From nucleotide sequences on the order of 3 billion base pairs revealed by this technology, variant calls are obtained through variant calling, the process of identifying genomic variants from sequence data [3]

  • An identifier that includes the Binary Alignment/Map (BAM) file name, the chromosomal position of interest, the gene symbol, and the depth of coverage is displayed at the top of each captured figure
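The identifier described in the last highlight combines the BAM file name, the position of interest, the gene symbol, and the depth of coverage. As a rough sketch, assuming each read's aligned span is already known as a 1-based inclusive (start, end) pair, depth at a position is simply the number of spans that cover it. The helper names `depth_at` and `capture_label` and the label layout are hypothetical, and the count ignores deletions, splicing gaps, and read filtering that a real BAM-based tool would need to handle.

```python
from typing import Iterable, Tuple

# Hypothetical helpers; the label format is illustrative, not CaReAl's.


def depth_at(read_spans: Iterable[Tuple[int, int]], pos: int) -> int:
    """Count reads whose aligned span (1-based, inclusive) covers `pos`."""
    return sum(1 for start, end in read_spans if start <= pos <= end)


def capture_label(bam_name: str, chrom: str, pos: int, gene: str,
                  read_spans: Iterable[Tuple[int, int]]) -> str:
    """Build a one-line identifier to print above a captured image."""
    depth = depth_at(read_spans, pos)
    return f"{bam_name} | {chrom}:{pos} | {gene} | depth={depth}"


if __name__ == "__main__":
    spans = [(100, 200), (150, 250), (201, 300)]
    print(capture_label("sample1.bam", "chr7", 200, "GENE_A", spans))
```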


Summary

Introduction

The recent rapid evolution of high-throughput sequencing technology has generated huge volumes of data [1, 2]. From nucleotide sequences on the order of 3 billion base pairs revealed by this technology, variant calls are obtained through variant calling, the process of identifying genomic variants from sequence data [3]. A ‘read’ is a short sequence of base pairs; in genome assembly, these small read fragments are merged into a longer DNA sequence. A variant call is the inference of a nucleotide difference from a reference sequence at a given position, typically categorized as a substitution, insertion, or deletion (insertions and deletions together are called indels). Various alignment and variant calling methods have been developed to obtain high-accuracy variant calls.
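To make the call categories concrete, the sketch below classifies a single REF/ALT allele pair by allele length, assuming VCF-style alleles in which an indel keeps one shared anchor base (for example, REF `A` and ALT `AT` for an insertion of `T`). The function name `classify_variant` and its category strings are hypothetical, and complex events that change both sequence and length would be mislabeled by this length-only rule.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a REF/ALT allele pair by comparing allele lengths.

    Simplified rule for illustration: equal lengths -> substitution,
    longer ALT -> insertion, shorter ALT -> deletion."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt:
        raise ValueError("alleles must be non-empty")
    if ref == alt:
        return "no difference"
    if len(ref) == len(alt):
        return "substitution" if len(ref) == 1 else "multi-nucleotide substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "AT"), ("ATG", "A"), ("AC", "GT")]:
        print(ref, alt, classify_variant(ref, alt))
```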

