Abstract

Background: The identification of genetic variants such as single nucleotide polymorphisms (SNPs) is a critical step in many applications based on next-generation sequencing (NGS) technologies [1]. Although many SNP calling programs have been developed, accurately calling SNPs remains challenging, especially when coverage is low [2]. Moreover, SNP determination is performed through many separate steps and requires careful selection of a diverse set of tools [3,4]. This separation has disadvantages: for example, information from the read alignment step cannot be incorporated into the SNP calling step, or vice versa, to improve the accuracy of the called SNPs.

Materials and methods: We propose a novel integrated approach that detects more true SNPs while calling fewer false positives. Unlike current methods, which perform read alignment and SNP calling separately, our method combines the two steps to improve the accuracy of SNP identification. To exploit information from a population of genomes, databases of confirmed SNPs, such as dbSNP, are used both in aligning reads to references and in calling SNPs. This strategy allows us to develop a novel read alignment algorithm that can differentiate sequencing errors from SNPs.

Results: Because the alignment step can differentiate sequencing errors from SNPs, the method can call SNPs accurately and effectively even with low-coverage sequencing data. Our results on simulated data show that the method calls SNPs with very high precision and recall on low-coverage datasets.

Conclusions: With databases of confirmed SNPs available for a large number of sequenced species, our approach provides a promising way to obtain accurate SNP calls even with low-coverage sequencing data. It can also simplify SNP determination for researchers by providing a single integrated SNP calling tool.
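The idea of using confirmed SNPs during alignment to separate sequencing errors from variants can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes ungapped placement of a read against a short reference string, and the names `classify_mismatches`, `best_placement`, and the `known_snps` mapping (0-based reference position to a set of alternate alleles) are hypothetical, introduced only for illustration. A mismatch that agrees with a known alternate allele is treated as a candidate SNP, and any other mismatch is treated as a putative sequencing error.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a database of confirmed SNPs:
# 0-based reference position -> set of known alternate alleles.
KnownSnps = Dict[int, Set[str]]


def classify_mismatches(
    read: str, ref: str, offset: int, known_snps: KnownSnps
) -> Tuple[List[int], List[int]]:
    """Split the mismatches of `read` placed at ref[offset:] into
    (positions explained by a known SNP, unexplained positions)."""
    snp_hits: List[int] = []
    unexplained: List[int] = []
    for i, base in enumerate(read):
        pos = offset + i
        if base == ref[pos]:
            continue
        if base in known_snps.get(pos, set()):
            snp_hits.append(pos)      # agrees with a confirmed alternate allele
        else:
            unexplained.append(pos)   # treated as a putative sequencing error
    return snp_hits, unexplained


def best_placement(
    read: str, ref: str, known_snps: KnownSnps
) -> Tuple[int, List[int], List[int]]:
    """Try every ungapped placement and keep the one with the fewest
    unexplained mismatches (ties broken by fewer SNP-explained ones)."""
    if not read or len(read) > len(ref):
        raise ValueError("read must be non-empty and no longer than ref")
    best = None
    for offset in range(len(ref) - len(read) + 1):
        snp_hits, unexplained = classify_mismatches(read, ref, offset, known_snps)
        key = (len(unexplained), len(snp_hits))
        if best is None or key < best[0]:
            best = (key, offset, snp_hits, unexplained)
    _, offset, snp_hits, unexplained = best
    return offset, snp_hits, unexplained


if __name__ == "__main__":
    ref = "ACGGTACGGTTAGC"[1:]  # toy reference: "CGGTACGGTTAGC"
    read = "ATGGTT"             # differs from the reference at one position
    print(best_placement(read, ref, {5: {"T"}}))
    print(best_placement(read, ref, {}))
```

With the known SNP supplied, the single mismatch is reported as a candidate SNP; without it, the same mismatch is reported as unexplained. A real aligner would also handle indels, base qualities, and reference indexing, none of which this sketch attempts.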


