SNVHMM: predicting single nucleotide variants from next generation sequencing

Jiawen Bian, Xiaobo Zhou, Priyanka Kachroo, Jing Xing, Hongyan Wang, Chenglin Liu

doi:10.1186/1471-2105-14-225

Abstract

Background: The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Single nucleotide variants (SNVs) inferred from next generation sequencing are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poor SNV detection capability in the regulatory regions of the genome. Posterior probability based methods are efficient at detecting SNVs in high coverage regions or in sequencing data with high depth. For data with low sequencing depth, however, the efficiency of such algorithms remains poor and needs to be improved.

Results: A new tool, SNVHMM, based on a discrete hidden Markov model (HMM), was developed to infer the genotype at each position on the genome. We incorporated the mapping quality of each read and the corresponding base quality on the reads into the emission probability of the HMM. The context information of the whole observation, together with its confidence, is fully used to infer the genotype at each position under study. More power can therefore be gained over Bayes based methods, which is very useful for SNV detection in data with low sequencing depth. Moreover, our model was verified by testing against two datasets each of lobular breast tumor and myelodysplastic syndromes (MDS) data. Compared with the recently published SNV calling algorithm SNVMix2, our model performed substantially better when the sequencing depth was low, and it also outperformed SNVMix2 when SNVMix2 was well trained on large datasets.

Conclusions: SNVHMM can detect SNVs from NGS cancer data efficiently even if the sequencing depth is very low. The training data size can be very small for SNVHMM to work. SNVHMM incorporates the base quality and mapping quality of all observed bases and reads, and also gives users the option to choose the confidence of the observation for SNV prediction.
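The abstract describes folding mapping quality and base quality into the HMM emission probability and then inferring a genotype at every position from the whole observation sequence. The sketch below illustrates that general idea only; it is not the authors' SNVHMM implementation. The three genotype states, the `REF_FRACTION` values, the way the two qualities are combined into a single confidence, and the transition and start probabilities are all assumptions made for illustration. The only standard fact used is the Phred scale, where a quality Q corresponds to an error probability of 10^(-Q/10).

```python
import math

# Assumed genotype states: homozygous reference, heterozygous, homozygous variant.
GENOTYPES = ("aa", "ab", "bb")
# Assumed expected share of reference-matching reads under each genotype.
REF_FRACTION = {"aa": 0.99, "ab": 0.5, "bb": 0.01}


def phred_to_error(q):
    """Convert a Phred-scaled quality into an error probability."""
    return 10.0 ** (-q / 10.0)


def log_emission(reads, genotype):
    """Log-likelihood of the reads at one position under a genotype.

    Each read is (matches_reference, base_quality, mapping_quality).
    Low-quality reads are pulled toward an uninformative 0.5.
    """
    mu = REF_FRACTION[genotype]
    total = 0.0
    for is_ref, base_q, map_q in reads:
        confidence = (1.0 - phred_to_error(base_q)) * (1.0 - phred_to_error(map_q))
        p_allele = mu if is_ref else 1.0 - mu
        total += math.log(confidence * p_allele + (1.0 - confidence) * 0.5)
    return total


def viterbi(positions, log_start, log_trans):
    """Most likely genotype path over consecutive positions (log space)."""
    n = len(GENOTYPES)
    scores = [log_start[g] + log_emission(positions[0], g) for g in GENOTYPES]
    back = []
    for reads in positions[1:]:
        new_scores, pointers = [], []
        for j, g in enumerate(GENOTYPES):
            best_i = max(range(n), key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + log_trans[best_i][j] + log_emission(reads, g))
        scores = new_scores
        back.append(pointers)
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [GENOTYPES[s] for s in reversed(path)]


# Toy input: illustrative transition/start probabilities and three positions.
stay, move = math.log(0.8), math.log(0.1)
log_trans = [[stay if i == j else move for j in range(3)] for i in range(3)]
log_start = {"aa": math.log(0.9), "ab": math.log(0.05), "bb": math.log(0.05)}
positions = [
    [(True, 30, 30)] * 6,                         # all reads match the reference
    [(True, 30, 30)] * 3 + [(False, 30, 30)] * 3, # mixed evidence
    [(False, 30, 30)] * 6,                        # all reads carry the variant
]
print(viterbi(positions, log_start, log_trans))
```

In this sketch a read with base quality and mapping quality of 0 contributes the same likelihood to every genotype, so it cannot sway the call, while neighbouring positions influence each other only through the assumed transition matrix.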

Highlights

The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research
The first type is the lobular breast tumor data with two different sequencing depths, which includes 497 positions generated using the Illumina GA II platform and validated by Sanger sequencing. These positions were predicted to be non-synonymous protein-coding changes and were re-sequenced using Sanger capillary-based technology. 305 of these positions were confirmed as single nucleotide variants (SNVs) and are taken as true positives (TP), while 192 were not confirmed and are taken as true negatives (TN)
The depths of supplementary datasets 2A and 2C are 10X and 40X respectively. We use these datasets to compare SNVHMM with SNVMix2, which is more efficient than SNVMix1 [9]
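The validated breast tumor positions (305 confirmed, 192 not confirmed, 497 in total) act as ground truth when callers are compared. A minimal sketch of how sensitivity and specificity could be computed against such a truth set follows. The function names are hypothetical, and the call vector is a deliberately trivial "call everything" caller used as a sanity check, not a result from the paper.

```python
def confusion_counts(truth, calls):
    """Count TP, TN, FP, FN for boolean truth labels and boolean calls."""
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t and c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if not t and c)
    fn = sum(1 for t, c in pairs if t and not c)
    return tp, tn, fp, fn


def sensitivity_specificity(truth, calls):
    """Return (sensitivity, specificity) for the given calls."""
    tp, tn, fp, fn = confusion_counts(truth, calls)
    return tp / (tp + fn), tn / (tn + fp)


# Truth labels with the counts stated above: 305 confirmed SNVs, 192 not confirmed.
truth = [True] * 305 + [False] * 192

# Sanity check with a caller that reports an SNV at every position (illustrative only).
call_everything = [True] * len(truth)
print(sensitivity_specificity(truth, call_everything))
```

A caller that flags every position reaches full sensitivity at zero specificity, which is why both numbers are needed when two callers are compared at low depth.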

Summary

Introduction

The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Single nucleotide variants (SNVs) inferred from next generation sequencing are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poor SNV detection capability in the regulatory regions of the genome. Posterior probability based methods are efficient at detecting SNVs in high coverage regions or in sequencing data with high depth. For data with low sequencing depth, the efficiency of such algorithms remains poor and needs to be improved. NGS can generate millions of reads ranging from 30 to 350 base pairs (bp), depending on the sequencing platform used. Many novel inferences can be made, including regulatory element identification, mutation detection, gene expression estimation, and detection of RNA splicing and fusion transcripts. For threshold based prediction methods, a good threshold setting is difficult to obtain and relies heavily on user experience [8].

Journal: BMC Bioinformatics
Publication Date: Jul 15, 2013
Citations: 22
License type: cc-by

Similar Papers

Hidden Markov Models in Bioinformatics: SNV Inference from Next Generation Sequence.
Jiawen Bian ... Xiaobo Zhou
Methods in molecular biology (Clifton, N.J.) | VOL. 1552 | 01 Jan 2017

Chapter 8 - Single Nucleotide Variant Detection Using Next Generation Sequencing
David H Spencer ... John Pfeifer
Clinical Genomics | 14 Nov 2014

Abstract 2846: Molecular platforms for mutation analysis from whole blood derived clinical samples by nextgen sequencing
William M Strauss ... Erich Klem
Cancer Research | VOL. 74 | 30 Sep 2014

Reliability of Cell-Free DNA (cfDNA) Next Generation Sequencing in Predicting Chromosomal Structural Abnormalities and Cytogenetic-Risk Stratification of Patients with Myeloid Neoplasms
Maher Albitar ... James K Mccloskey
Blood | VOL. 138 | 05 Nov 2021
