Abstract

Next-generation sequencing (NGS) technology is being applied to an increasing number of non-model species and has become a primary approach for genotyping in genetic and evolutionary studies. However, inferring genotypes from sequencing data is challenging, particularly for organisms with a high degree of heterozygosity. Genotype calls from sequencing data are often inaccurate when sequencing coverage is low, and if this uncertainty is not accounted for, it can seriously bias downstream analyses such as quantitative trait locus mapping and genome-wide association studies. Here, we used high-coverage reference data sets from Crassostrea gigas to simulate sequencing data at different coverages, and we evaluated the influence of coverage on genotype calling rate and accuracy. Having first identified appropriate filtering parameter settings to ensure genotype accuracy, we used two different single-nucleotide polymorphism (SNP) calling pipelines, single-sample and multi-sample. We found that a coverage of 15× was suitable for obtaining sufficient numbers of SNPs with high accuracy. Our work provides guidelines for the selection of sequence coverage when using NGS to investigate species with a high degree of heterozygosity and rapid decay of linkage disequilibrium.
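The two quantities evaluated against coverage can be illustrated with a minimal sketch. The definitions here are assumptions, since the abstract does not spell them out: calling rate is taken as the fraction of reference sites that receive a genotype call, and accuracy as the fraction of called sites whose genotype matches the high-coverage reference. The function name and the site-keyed dictionaries are hypothetical and are not the authors' pipeline.

```python
def call_rate_and_accuracy(reference, called):
    """Compare genotype calls with a trusted reference set.

    reference: dict mapping site -> genotype string (e.g. "A/G").
    called:    dict mapping site -> genotype string; a site that is
               missing or mapped to None counts as "not called".
    Returns (call_rate, accuracy) under the assumed definitions:
      call_rate = called sites / reference sites
      accuracy  = correctly called sites / called sites
    """
    n_sites = len(reference)
    called_sites = [s for s in reference if called.get(s) is not None]
    n_called = len(called_sites)
    n_correct = sum(1 for s in called_sites if called[s] == reference[s])

    call_rate = n_called / n_sites if n_sites else 0.0
    accuracy = n_correct / n_called if n_called else 0.0
    return call_rate, accuracy


# Toy data (invented for illustration only, not from the study).
reference = {"chr1:100": "A/G", "chr1:250": "C/C",
             "chr1:900": "T/C", "chr2:40": "G/G"}
called = {"chr1:100": "A/A",   # heterozygote miscalled as homozygote
          "chr1:250": "C/C",
          "chr2:40": "G/G"}    # chr1:900 received no call

rate, acc = call_rate_and_accuracy(reference, called)
print(rate, acc)
```

Repeating such a comparison for call sets produced at each simulated coverage would give rate and accuracy curves as a function of coverage.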

Highlights

  • In studies of traits in particular, the accuracy of genotyping at SNP sites has a significant influence on the results obtained

  • Unlike in low-coverage sequencing of an inbred line of rice, which could be considered as a diploid, genotypes at heterozygous SNP sites covered by very few reads are very difficult to identify

  • Using two different SNP calling pipelines, we found that a coverage of 15× was suitable for NGS analysis to obtain a sufficient number of SNPs called with high accuracy


Summary

Introduction

In studies of traits in particular, the accuracy of genotyping at SNP sites has a significant influence on the results obtained. For some non-model organisms, higher-coverage sequencing of a large population would be expensive and beyond the means of many researchers. Another approach, low-coverage sequencing assisted by genotype imputation methods, is a possible solution for population studies of some organisms. These methods have not been successfully applied to the large group of organisms with a high degree of heterozygosity and rapid decline in linkage disequilibrium (LD). This is because, compared with low-coverage sequencing of an inbred line of rice, which could be considered as a diploid, it is very difficult to identify the genotypes at heterozygous SNP sites covered by very few reads. On the basis of our findings, we provide guidelines for the selection of sequencing coverage for NGS application to organisms with a high degree of heterozygosity.
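The difficulty with heterozygous sites at low depth can be made concrete with a short calculation. This is a simplified sketch under stated assumptions, not the authors' model: reads are error-free, each read samples either allele with probability 0.5, and depth is fixed at each site. The function name and the `min_reads` parameter are hypothetical.

```python
from math import comb


def prob_both_alleles(depth, min_reads=1):
    """Probability that a heterozygous site shows both alleles.

    Assumes each of `depth` reads independently carries either allele
    with probability 0.5 and that there are no sequencing errors.
    `min_reads` is the minimum number of reads required for EACH
    allele before the site would be recognised as heterozygous.
    """
    # Count read configurations in which both alleles reach min_reads.
    favourable = sum(comb(depth, k)
                     for k in range(min_reads, depth - min_reads + 1))
    return favourable / 2 ** depth


for depth in (2, 4, 8, 15):
    print(depth, round(prob_both_alleles(depth), 5))
```

Under these assumptions, half of all heterozygous sites at a depth of 2 show only one allele and would look homozygous, whereas at a depth of 15 this almost never happens. Real data add sequencing errors and depth variation across sites, so the sketch indicates the trend only.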
