HIA: a genome mapper using hybrid index-based sequence alignment.

Jongpill Choi, Kiejung Park, Myungguen Chung, Seong Beom Cho

doi:10.1186/s13015-015-0062-4

Open Access

https://doi.org/10.1186/s13015-015-0062-4

Abstract

Background: A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or whole-genome sequences from several hundred to several thousand samples. To accommodate the growing need to analyze very large NGS data sets, faster, more sensitive, and more accurate mapping tools are needed.

Results: HIA uses two indices, a hash table index and a suffix array index. The hash table performs direct lookup of a q-gram, and the suffix array performs very fast lookup of variable-length strings by exploiting binary search. We observed that combining the hash table and the suffix array (a hybrid index) is much faster than the suffix array alone for finding a substring in the reference sequence. We define a matching region (MR) as a longest common substring between a reference and a read, and a candidate alignment region (CAR) as a list of MRs that are close to each other. The hybrid index is used to find CARs between a reference and a read. We found that aligning only the unmatched regions in a CAR is much faster than aligning the whole CAR. In benchmark analysis, HIA outperformed the other aligners in mapping speed, without significant loss of mapping accuracy.

Conclusions: Our experiments show that the hybrid of a hash table and a suffix array is useful in terms of speed for mapping NGS reads to the human reference genome sequence. In conclusion, our tool is appropriate for aligning the massive data sets generated by NGS.

Electronic supplementary material: The online version of this article (doi:10.1186/s13015-015-0062-4) contains supplementary material, which is available to authorized users.
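The hybrid lookup described in the Results can be illustrated with a small sketch. It is not HIA's implementation: the abstract does not give the index layout, so the sketch assumes one plausible design in which the hash table maps each q-gram to the contiguous suffix-array interval of suffixes starting with that q-gram, and binary search then runs only inside that interval. The function names (`build_suffix_array`, `build_qgram_table`, `find_all`), the toy reference, and the choice q = 3 are all hypothetical.

```python
def build_suffix_array(ref):
    """Return the start positions of all suffixes of ref in lexicographic order."""
    # Naive construction; adequate for a toy reference only.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def build_qgram_table(ref, sa, q):
    """Map each q-gram to the half-open suffix-array interval [lo, hi)
    of suffixes that start with it (assumed layout, not taken from the paper)."""
    table = {}
    for rank, pos in enumerate(sa):
        gram = ref[pos:pos + q]
        if len(gram) < q:  # suffix shorter than q carries no q-gram
            continue
        lo, _ = table.get(gram, (rank, rank))
        table[gram] = (lo, rank + 1)
    return table


def _bound(ref, sa, pattern, lo, hi, upper):
    """Binary search in sa[lo:hi] for the first suffix whose length-m prefix
    is >= pattern (upper=False) or > pattern (upper=True)."""
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = ref[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_all(ref, sa, table, q, pattern):
    """Return sorted positions where pattern occurs in ref (len(pattern) >= q)."""
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    interval = table.get(pattern[:q])  # direct hash lookup of the leading q-gram
    if interval is None:
        return []
    lo, hi = interval
    left = _bound(ref, sa, pattern, lo, hi, upper=False)
    right = _bound(ref, sa, pattern, lo, hi, upper=True)
    return sorted(sa[left:right])


ref = "ACGTACGTTACG"
sa = build_suffix_array(ref)
table = build_qgram_table(ref, sa, q=3)
print(find_all(ref, sa, table, 3, "ACGT"))  # [0, 4]
```

Under this assumed layout, the hash lookup replaces the first steps of a binary search over the whole suffix array with a single table access, which is one way the speedup reported above could arise.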

Highlights

A number of alignment tools have been developed to align sequencing reads to the human reference genome
Our experiment showed that the hash table index can considerably decrease the search time
Evaluation data sets and evaluation measurements: We made six datasets from the GRCh37 build of the human genome using Mason [19]. Two of these are unpaired Illumina-like datasets, consisting of one million 100 bp reads and one million 150 bp reads, respectively, which Mason simulated with the parameters ‘illumina -hn 2 -sq -n 100 -N 1000000’ and ‘illumina -hn 2 -sq -n 150 -N 1000000’

Summary

Introduction

A number of alignment tools have been developed to align sequencing reads to the human reference genome. Recent studies based on next-generation sequencing (NGS) technology have produced hundreds or thousands of exome or whole-genome sequences as the cost of NGS experiments has decreased [1]. To keep pace with developing NGS technologies, many alignment tools have been developed for both short and long reads. These tools include SSAHA2 [3], BWA [4, 5], AGILE [6], SOAP2 [7], Bowtie2 [8], SeqAlto [9] and others. Most BWT-based alignment tools use the full-text minute-space (FM) index [13], which is memory-efficient and similar to the suffix tree. With respect to matching time, the suffix tree is efficient for

Methods

Results

Conclusion

Journal: Algorithms for Molecular Biology	Publication Date: Dec 1, 2015
Citations: 3	License type: cc-by
