Abstract

Background: High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNPs) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance.

Results: A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from Expressed Sequence Tags (ESTs) by using the Bayesian statistical software PolyBayes. Several parameters influencing the SNP prediction were analysed, including the a priori expected polymorphism, the probability score (PSNP), and the contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs was verified through the independent resequencing of genomic DNA of a genotype also used to prepare cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or ≥ 0.99. A total of 9,310 SNPs were detected by using PSNP ≥ 0.95 as a criterion. The SNPs were distributed among 3,590 contigs encompassing an array of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNPs per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17.

Conclusion: We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies.
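
The reported Ka/Ks value can be checked directly from the two per-site SNP densities given in the abstract. The sketch below is only an arithmetic check: the function name rate_ratio is illustrative and does not come from the paper, and the calculation assumes a plain ratio of densities with no correction for multiple substitutions.

```python
# Per-site SNP densities as reported in the abstract (SNPs per 1,000 sites).
nonsyn_per_1000 = 0.9   # nonsynonymous sites
syn_per_1000 = 5.2      # synonymous sites


def rate_ratio(nonsyn_density, syn_density):
    """Return the ratio of nonsynonymous to synonymous per-site densities.

    Illustrative helper (not from the paper): a plain ratio, assuming both
    densities are expressed per the same number of sites.
    """
    if syn_density <= 0:
        raise ValueError("synonymous density must be positive")
    return nonsyn_density / syn_density


ka_ks = rate_ratio(nonsyn_per_1000, syn_per_1000)
print(round(ka_ks, 2))
```

Rounded to two decimals, the result agrees with the ratio of 0.17 stated in the abstract.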

Highlights

  • High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies

  • PolyBayes uses a priori information about the average pairwise difference between paralogous sequences to calculate, a posteriori, the probability that a sequence is native by comparison to a reference sequence from the Expressed Sequence Tag (EST) cluster

  • Using the PolyBayes software, we estimated Single Nucleotide Polymorphism (SNP) diversity and distribution parameters in 6,459 contigs, each derived from sequences of at least two cDNA clones

Summary

Introduction

High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. Bayesian statistics have been applied to incorporate background information into the specification of a tested model for data analysis [14]. They were implemented in the software PolyBayes to determine a confidence score for each SNP detected in a cluster of ESTs [15]. Based on the alignment of the ESTs within the cluster, another Bayesian calculation generates the probability that a variant at a given location of a multiple alignment represents a true polymorphism as opposed to a sequencing error. This calculation takes into account the alignment depth, the base calls in each of the sequences, the associated base quality values, the base composition in the region, and the expected a priori rate of polymorphism of the species under investigation. This approach was shown to be adequate for SNP prediction in human [15], sugarcane [16], soybean [17], and pine [18].
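
To make the idea of a Bayesian SNP score concrete, here is a deliberately simplified sketch for a single alignment column. It is not the PolyBayes model: it ignores base composition and paralogy, assumes Phred-scaled quality values, assumes at most two alleles sampled with equal probability, and uses a hypothetical prior of 0.01 that is not taken from the paper. All function names are illustrative.

```python
from itertools import combinations

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def base_likelihood(called, true_base, q):
    """P(called base | true base), spreading errors evenly over 3 other bases."""
    e = phred_to_error(q)
    return 1 - e if called == true_base else e / 3


def snp_posterior(calls, quals, prior_poly):
    """Posterior probability that one alignment column is polymorphic.

    Toy two-hypothesis model (an assumption of this sketch):
      - monomorphic: one true base, every mismatch is a sequencing error
      - polymorphic: two alleles, each read drawn from either with p = 0.5
    """
    if len(calls) != len(quals):
        raise ValueError("calls and quals must have the same length")

    # Likelihood under the monomorphic hypothesis, uniform over the 4 bases.
    like_mono = 0.0
    for true_base in BASES:
        p = 1.0
        for c, q in zip(calls, quals):
            p *= base_likelihood(c, true_base, q)
        like_mono += p / len(BASES)

    # Likelihood under the polymorphic hypothesis, uniform over allele pairs.
    pairs = list(combinations(BASES, 2))
    like_poly = 0.0
    for a, b in pairs:
        p = 1.0
        for c, q in zip(calls, quals):
            p *= 0.5 * base_likelihood(c, a, q) + 0.5 * base_likelihood(c, b, q)
        like_poly += p / len(pairs)

    # Bayes' rule with the a priori polymorphism rate as the prior.
    numerator = prior_poly * like_poly
    return numerator / (numerator + (1 - prior_poly) * like_mono)


PRIOR = 0.01  # hypothetical a priori polymorphism rate, for illustration only

# Three high-quality reads of each base: strong support for a polymorphism.
balanced = snp_posterior("AAAGGG", [30] * 6, PRIOR)
# One low-quality mismatch: more plausibly a sequencing error.
single_low_quality = snp_posterior("AAAAAG", [30] * 5 + [5], PRIOR)
print(balanced > single_low_quality)
```

In this toy model, deeper alignments and higher base qualities both push the posterior toward 0 or 1, which mirrors the dependence on alignment depth, quality values, and prior polymorphism rate described above.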
