Abstract

Background: The Restriction Enzyme-based Reduced Representation Library (RRL) method is a relatively feasible and flexible strategy for Single Nucleotide Polymorphism (SNP) identification in different species. Its main advantage is that it reduces the complexity of the genome by orders of magnitude. However, a comprehensive evaluation of the actual efficacy of SNP identification by this method is still unavailable.

Results: To evaluate the efficacy of the Restriction Enzyme-based RRL method, we selected the Tsp 45I enzyme, which covers 266 Mb of flanking region around the enzyme recognition sites according to in silico simulation on the human reference genome. We then sequenced the YH RRL after Tsp 45I treatment and obtained reads of which 80.8% mapped to the target region, with a 20-fold average coverage. About 96.8% of the target region was covered by at least one read, and 257 K SNPs were identified in the region using SOAPsnp software. Compared with whole genome resequencing data, we observed a false discovery rate (FDR) of 13.95% and a false negative rate (FNR) of 25.90%. The concordance rate of homozygous loci was over 99.8%, but that of heterozygous loci was only 92.56%. Repeat sequences and base quality were shown to have a strong effect on the accuracy of SNP calling, and SNPs in recognition sites contributed substantially to the high FNR and the low concordance rate of heterozygotes. Our results indicated that repeat masking and highly stringent filter criteria could significantly decrease both FDR and FNR.

Conclusions: This study demonstrates that the Restriction Enzyme-based RRL method is effective for SNP identification. The results highlight the role of bias and of method-derived defects in this approach, and point to the issues that need special attention when applying it.
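As an illustration of how an FDR and FNR of this kind can be derived, the sketch below compares a set of RRL SNP calls against a whole genome resequencing call set restricted to the same target region. The definitions used here (FDR as the share of RRL calls absent from the reference set, FNR as the share of reference SNPs missed by RRL) are common conventions and are an assumption; the excerpt does not spell out the exact formulas. The function name and the toy positions are hypothetical.

```python
def error_rates(rrl_calls, wgs_calls):
    """Return (FDR, FNR) for RRL calls measured against a WGS reference set.

    Both arguments are iterables of (chromosome, position) tuples, assumed
    to be already restricted to the same target region.
    Assumed definitions (not taken from the source text):
      FDR = RRL calls not found in WGS / all RRL calls
      FNR = WGS calls not found in RRL / all WGS calls
    """
    rrl, wgs = set(rrl_calls), set(wgs_calls)
    false_discoveries = rrl - wgs   # called by RRL only
    false_negatives = wgs - rrl     # present in WGS, missed by RRL
    fdr = len(false_discoveries) / len(rrl) if rrl else 0.0
    fnr = len(false_negatives) / len(wgs) if wgs else 0.0
    return fdr, fnr


# Toy data, purely illustrative.
rrl_example = [("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 50)]
wgs_example = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75), ("chr2", 90)]
fdr_example, fnr_example = error_rates(rrl_example, wgs_example)
```

With the toy data, one of four RRL calls is unsupported and two of five reference SNPs are missed. Comparing positions only ignores genotype concordance, which the study reports separately for homozygous and heterozygous loci.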

Highlights

  • Restriction Enzyme-based Reduced Representation Library (RRL) method represents a relatively feasible and flexible strategy used for Single Nucleotide Polymorphism (SNP) identification in different species

  • RRL construction and sequencing: In this study we screened nine restriction enzymes with human genome hg18 as the reference. We fragmented the whole genome in silico according to each enzyme's restriction-site sequence. Considering the fragment sizes suited to the sequencing platform, target fragments ranging from 200 bp to 700 bp were selected, because a wide spread of fragment lengths is not recommended for cluster generation on the Illumina sequencing platform

  • We calculated the proportion of false discovery and false negative SNPs located in repeat regions. The results showed that 84.2% of the false discoveries (FDR) and 42.5% of the false negatives (FNR) were attributable to repeat regions, which was consistent with the previous results; the influence of the high-depth filter was negligible
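The in silico digestion and size selection described in the highlights can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the recognition pattern `GT[CG]AC` with a cut immediately before the site is an assumption about Tsp 45I that is not stated in this excerpt, and the short toy sequence stands in for a real hg18 chromosome.

```python
import re


def digest(sequence, site_pattern, cut_offset=0):
    """Cut `sequence` at every occurrence of `site_pattern`.

    `cut_offset` is the cut position relative to the start of each site.
    Returns a list of (start, end) half-open fragment coordinates.
    The first and last fragments are bounded by a sequence end rather than
    by two restriction sites; a real pipeline may want to discard them.
    """
    # A lookahead lets overlapping sites be found as well.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site_pattern})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def select_fragments(fragments, min_len=200, max_len=700):
    """Keep fragments whose length falls in the 200-700 bp target window."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


# Toy sequence with two sites, 300 bp apart (illustrative only).
toy = "A" * 10 + "GTCAC" + "A" * 295 + "GTGAC" + "A" * 50
all_fragments = digest(toy, "GT[CG]AC")   # assumed Tsp 45I site pattern
targets = select_fragments(all_fragments)
target_bases = sum(e - s for s, e in targets)
```

Summing the selected fragment lengths across all chromosomes gives the size of the target region, which is how a figure such as the 266 Mb quoted in the abstract could be estimated for a candidate enzyme.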

Summary

Introduction

The Restriction Enzyme-based Reduced Representation Library (RRL) method is a relatively feasible and flexible strategy for Single Nucleotide Polymorphism (SNP) identification in different species. Its main advantage is that it reduces the complexity of the genome by orders of magnitude. Several genome-wide genotyping technologies have been developed and commercialized, aiming at detecting common SNPs or tagSNPs in parallel [5] (e.g. Illumina BeadArray based on primer extension [6], Affymetrix SNP arrays based on differential hybridization [7], etc.). These technologies have obvious advantages such as low cost. Whole genome sequencing (WGS) is the most straightforward method for genome-wide identification of SNPs and other types of variants. Many RRL strategies have been proposed and validated, such as target enrichment technologies including multiplex PCR, restriction enzyme digestion, selective sequence capture on array [7] or in solution [8], and others (reviewed by Mamanova L [9]).

