Abstract

Background: Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease. Next-generation sequencing (NGS) is a powerful method for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references, including the sample-specific reference sequence, same-genotype sequences, and different-genotype sequences, according to the sample.

Methods: Real Illumina HBV datasets from 86 patients and simulated datasets from 158 HBV strains in the GenBank database were used to assess mapping quality. SNV calling accuracy was evaluated by aligning real Illumina datasets from a single HBV clone to different mapping references.

Results: Using the sample-specific reference sequence as the mapping reference produced the largest number of mappable reads and the highest coverage. With a different-genotype mapping reference, the consensus sequence derived from the real Illumina datasets of the single HBV clone showed 21 false SNV calls in the polymerase and surface genes, the regions most divergent between the mapping reference and this HBV clone. Even with a same-genotype mapping reference, most of these false SNVs were still detected at ~6% coverage, whereas none were detected with the sample-specific reference sequence.

Conclusions: Using sample-specific reference sequences as the mapping reference in NGS analysis optimized mapping quality and SNV calling accuracy for HBV quasispecies.

Electronic supplementary material: The online version of this article (doi:10.1007/s12072-015-9645-x) contains supplementary material, which is available to authorized users.

Highlights

  • Next-generation sequencing (NGS), also known as ultrahigh-throughput sequencing, is a powerful tool for discovering diseases with novel mutations and for detecting traces of pathogenic microorganisms [1, 2]

  • The sample-specific reference sequence had the best quality, followed by the Taiwanese strain with the same genotype, the Asian strain with the ...

  • Datasets aligned to different genotypes yielded false single nucleotide variants (SNVs) in the consensus sequence derived from next-generation sequencing (NGS) reads of a single hepatitis B virus (HBV) clone

  • We used the Illumina HiSeq 2500 system to analyze the full HBV genome in viral quasispecies; it has the advantages of short run times, long read lengths, and high data quality

Summary

Introduction

Next-generation sequencing (NGS), also known as ultrahigh-throughput sequencing, is a powerful tool for discovering diseases with novel mutations and for detecting traces of pathogenic microorganisms [1, 2]. It has been used for sequencing human and microbial genomes and for identifying species. Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease, and NGS is a powerful method for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references, including the sample-specific reference sequence, same-genotype sequences, and different-genotype sequences, according to the sample.
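The effect of the mapping reference on mappable reads and coverage can be illustrated with a toy sketch. This is not the pipeline used in the study: the function `mapping_summary`, the `max_mismatches` threshold, and the short sequences are hypothetical, and a simple mismatch count at a known offset stands in for a real aligner. Under those assumptions, reads drawn from a single clone all map to the clone's own sequence, while a reference that diverges in one region loses the reads spanning that region.

```python
def mapping_summary(reference, reads, max_mismatches=1):
    """Summarise toy mapping results for reads placed at known offsets.

    Assumption: a read "maps" if it differs from the reference at no more
    than max_mismatches positions. Real aligners are far more elaborate.
    """
    depth = [0] * len(reference)
    mapped = 0
    for start, seq in reads:
        window = reference[start:start + len(seq)]
        if len(window) != len(seq):
            continue  # read runs past the end of the reference
        mismatches = sum(a != b for a, b in zip(seq, window))
        if mismatches <= max_mismatches:
            mapped += 1
            for pos in range(start, start + len(seq)):
                depth[pos] += 1
    covered = sum(d > 0 for d in depth)
    return {
        "mapped_fraction": mapped / len(reads) if reads else 0.0,
        "breadth": covered / len(reference),      # share of positions covered
        "mean_depth": sum(depth) / len(reference),
    }


# Hypothetical single clone and error-free 6-base reads tiled across it.
clone = "ACGTTGCAAGCTTAGC"
reads = [(i, clone[i:i + 6]) for i in range(len(clone) - 5)]

# Hypothetical divergent reference: differs from the clone at positions 7-9.
divergent_ref = "ACGTTGCTTCCTTAGC"

same = mapping_summary(clone, reads)
other = mapping_summary(divergent_ref, reads)
print("sample-specific reference:", same)
print("divergent reference:      ", other)
```

In this toy setting the divergent region is exactly where reads fail to map and coverage drops, which mirrors the direction of the reported result, though not its mechanism in detail.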
