Vi-HMM: a novel HMM-based method for sequence variant identification in short-read data

Man Tang, Mohammad Shabbir Hasan, Xiaowei Wu, Hongxiao Zhu, Liqing Zhang

doi:10.1186/s40246-019-0194-6

Journal: Human Genomics (2019) 13:9	Publication Date: Feb 13, 2019
Citations: 2	License type: open-access

Affiliation: Virginia Tech

Abstract

Background
Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD).

Results and conclusion
We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to real data, vi-HMM demonstrates higher accuracy than these methods in calling SNPs and INDELs.
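The decoding step described above can be sketched as a standard Viterbi algorithm over the four hidden states, with one state per genomic base. This is a minimal illustration under stated assumptions, not the authors' implementation: the `viterbi` and `to_log` functions, the state ordering, and all initial, transition, and emission probabilities are hypothetical, and how vi-HMM derives per-base emission probabilities from the mapped reads is not modeled here. The sketch works in log space to avoid numerical underflow over long regions.

```python
import math

# Hidden states named in the vi-HMM abstract; their order here is arbitrary.
STATES = ["Match", "SNP", "Ins", "Del"]


def viterbi(log_emit, log_trans, log_init):
    """Return the most probable hidden-state path (one state per genomic base).

    log_emit[t][s]:  log P(evidence at base t | state s)
    log_trans[r][s]: log P(state s at base t | state r at base t-1)
    log_init[s]:     log P(state s at the first base)
    """
    n_pos, n_states = len(log_emit), len(log_init)
    # score[t][s]: best log-probability of any path that ends in state s at base t
    score = [[-math.inf] * n_states for _ in range(n_pos)]
    back = [[0] * n_states for _ in range(n_pos)]  # back-pointers for traceback
    for s in range(n_states):
        score[0][s] = log_init[s] + log_emit[0][s]
    for t in range(1, n_pos):
        for s in range(n_states):
            # Best predecessor state r for entering state s at base t
            best_r = max(range(n_states),
                         key=lambda r: score[t - 1][r] + log_trans[r][s])
            score[t][s] = (score[t - 1][best_r] + log_trans[best_r][s]
                           + log_emit[t][s])
            back[t][s] = best_r
    # Start from the best final state and follow the back-pointers
    state = max(range(n_states), key=lambda s: score[-1][s])
    path = [state]
    for t in range(n_pos - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def to_log(rows):
    """Convert a matrix of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical parameters, chosen only to make the example readable.
init = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]
trans = to_log([
    [0.97, 0.01, 0.01, 0.01],  # from Match
    [0.90, 0.08, 0.01, 0.01],  # from SNP
    [0.90, 0.01, 0.08, 0.01],  # from Ins
    [0.80, 0.01, 0.01, 0.18],  # from Del (deletions may span several bases)
])
# Hypothetical per-base evidence: strong SNP support at base 1, deletion at base 3
emit = to_log([
    [0.900, 0.01, 0.01, 0.01],
    [0.001, 0.90, 0.01, 0.01],
    [0.900, 0.01, 0.01, 0.01],
    [0.001, 0.01, 0.01, 0.90],
    [0.900, 0.01, 0.01, 0.01],
])
path = viterbi(emit, trans, init)
print(path)  # ['Match', 'SNP', 'Match', 'Del', 'Match']
```

Because the hypothetical transition matrix strongly favors staying in or returning to "Match", a variant state is chosen only where the per-base evidence outweighs the transition cost. This coupling through transitions is what lets the decision at one base depend on its neighbors.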

Highlights

Rapid evolution of next-generation sequencing (NGS) technologies in recent years enables various genetic applications in a fast, efficient, and cost-effective way [1, 2]
The vi-HMM method, based on a hidden Markov model (HMM), performs variant calling for single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs) after short reads are mapped to a reference genome
In performance evaluations on data simulated by wgsim, vi-HMM generally performs well in calling SNPs and INDELs
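Simulation-based evaluations such as the wgsim comparison score a caller against the variants known to have been introduced. As a hedged sketch, assuming variants are represented as (chromosome, position, ref, alt) tuples and matched exactly (a simplification; the paper's own matching rules are not described here), the sensitivity, precision, and F1 score mentioned in the abstract could be computed as follows. The `evaluate_calls` function and the example variants are hypothetical.

```python
def evaluate_calls(called, truth):
    """Compare called variants with the known (simulated) truth set.

    Variants are matched exactly as (chrom, pos, ref, alt) tuples; this
    matching rule is an assumption and ignores equivalent INDEL representations.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # true variants that were called
    fp = len(called - truth)  # calls with no matching true variant
    fn = len(truth - called)  # true variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "precision": precision, "F1": f1}


# Hypothetical truth set and call set (one missed INDEL, one false SNP)
truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr1", 400, "AT", "A"), ("chr1", 512, "G", "GCA")]
called = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
          ("chr1", 400, "AT", "A"), ("chr1", 700, "T", "C")]
# TP = 3, FP = 1, FN = 1, so sensitivity = precision = F1 = 0.75
print(evaluate_calls(called, truth))
```

Sensitivity is TP/(TP+FN), precision is TP/(TP+FP), and F1 is their harmonic mean, so raising sensitivity at the cost of many false calls does not by itself improve F1.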

Summary

Introduction

Rapid evolution of next-generation sequencing (NGS) technologies in recent years enables various genetic applications in a fast, efficient, and cost-effective way [1, 2]. Accurate and reliable identification of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs) plays an important role in all NGS applications, as these common sequence variants are highly abundant in the human genome and are likely to influence human traits and disease [3,4,5]. After reads are correctly mapped, statistical models or heuristics may be used to predict the likelihood of variation at each locus. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD).
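Per-locus likelihood calling of the kind described above can be illustrated with a common textbook genotype-likelihood model: each read is assumed to show the alternative allele with a probability set by the diploid genotype and a uniform sequencing error rate. This is a generic illustration, not the model of vi-HMM or of any particular caller; the functions `genotype_log_likelihoods` and `call_independently`, the 0.01 error rate, and the example counts are all hypothetical.

```python
import math

GENOTYPES = ["0/0", "0/1", "1/1"]  # number of alternative alleles: 0, 1, 2


def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Log-likelihood of the read counts at ONE locus under each diploid genotype.

    Assumes reads are independent and each read reports the wrong base with
    probability error_rate (a hypothetical, uniform value). The binomial
    coefficient is omitted because it is the same for every genotype.
    """
    lls = {}
    for n_alt, name in enumerate(GENOTYPES):
        frac = n_alt / 2  # fraction of chromosomes carrying the alt allele
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        lls[name] = alt_count * math.log(p_alt) + ref_count * math.log(1 - p_alt)
    return lls


def call_independently(pileup, error_rate=0.01):
    """Pick the maximum-likelihood genotype at each locus, ignoring neighbors."""
    calls = []
    for ref_count, alt_count in pileup:
        lls = genotype_log_likelihoods(ref_count, alt_count, error_rate)
        calls.append(max(lls, key=lls.get))
    return calls


# Hypothetical (ref_count, alt_count) pairs at three consecutive loci
pileup = [(20, 0), (10, 9), (1, 18)]
print(call_independently(pileup))  # ['0/0', '0/1', '1/1']
```

Each call depends only on the counts at its own locus, so evidence at neighboring positions never affects it. This is the positional-independence assumption that vi-HMM relaxes by linking adjacent bases through transitions between hidden states.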

Methods

Results

Discussion

Conclusion