Mining for Single Nucleotide Polymorphisms in Expressed Sequence Tags of European Sea Bass

E L Souche, R Reinhardt, F A M Volckaert, S Klages, A Canario, J K J Van Houdt, B Hellemans

doi:10.1515/jib-2007-73

Abstract

As ever more sequence data are published, discovering polymorphisms bioinformatically becomes a viable option. In silico Single Nucleotide Polymorphism (SNP) detection is based on the analysis of multiple alignments. Each column of an alignment is treated as a slice containing one base from every aligned sequence. If a mismatch is detected in a slice, the slice is analysed further and the mismatch may be reported as a candidate SNP.

About 30,000 Expressed Sequence Tags (ESTs) of the European sea bass have been sequenced and processed. Because ESTs are redundant, they provide a resource for in silico SNP discovery. To avoid reporting sequencing errors, a redundancy of two is required before a mismatch is considered a candidate SNP. Among the various tools available to detect candidate SNPs, three software packages were tested: SNPServer, PolyBayes and PolyFreq. Candidate SNPs were validated in the laboratory by cloning and sequencing. Preliminary results indicate that PolyFreq outperforms both PolyBayes and SNPServer in terms of positive predictive value, whereas SNPServer is the most sensitive tool. PolyFreq identifies the fewest candidates and SNPServer with non-default settings the most. Considering candidates detected by several tools seems to enhance both positive predictive value and sensitivity. Of the 69 loci sequenced, only four were monomorphic, giving a total of 91.3% polymorphic loci. Randomly chosen contigs will be sequenced to determine whether SNP discovery tools tend to predict polymorphic fragments. The polymorphisms will be mapped and used for selection in aquaculture and for the study of adaptation in natural populations.
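The slice-based detection and the two evaluation measures mentioned above can be illustrated with a short sketch. It is not the pipeline used in this study, nor the algorithm of SNPServer, PolyBayes or PolyFreq, which apply their own, more elaborate criteria. It assumes an already computed, gap-padded multiple alignment supplied as equal-length strings, ignores base quality scores, and interprets the "redundancy of two" as requiring each allele in a slice to be observed in at least two sequences. The function names `candidate_snps` and `ppv_and_sensitivity` and the parameter `min_support` are hypothetical.

```python
from collections import Counter


def candidate_snps(alignment, min_support=2):
    """Return (position, base_counts) for every alignment column ("slice") in
    which at least two different bases are each seen in >= min_support sequences.

    Assumption: `alignment` is a list of equal-length, gap-padded strings.
    Gaps ("-"), Ns and other ambiguity codes are ignored; quality scores are not used.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    candidates = []
    for pos in range(length):
        # The slice: one base from every aligned sequence at this column,
        # keeping only unambiguous nucleotides.
        counts = Counter(
            base for base in (seq[pos].upper() for seq in alignment)
            if base in "ACGT"
        )
        # A variant seen in a single read may be a sequencing error, so only
        # bases that meet the redundancy threshold count as alleles.
        alleles = [base for base, n in counts.items() if n >= min_support]
        if len(alleles) >= 2:
            candidates.append((pos, dict(counts)))
    return candidates


def ppv_and_sensitivity(predicted, validated):
    """Compare predicted SNP positions with positions confirmed in the laboratory.

    PPV = true positives / predicted; sensitivity = true positives / validated.
    """
    predicted, validated = set(predicted), set(validated)
    true_positives = len(predicted & validated)
    ppv = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(validated) if validated else 0.0
    return ppv, sensitivity


if __name__ == "__main__":
    reads = [
        "ACGTACGT",
        "ACGTACGT",
        "ACTTACGA",  # the A at position 7 occurs in this read only
        "ACTTACGT",
    ]
    print(candidate_snps(reads))            # [(2, {'G': 2, 'T': 2})]
    print(ppv_and_sensitivity([2, 7], [2]))  # (0.5, 1.0)
```

In this sketch, lowering `min_support` to 1 would also report the singleton mismatch at position 7, which is exactly the kind of call a redundancy threshold is meant to exclude. Sensitivity here is only meaningful if `validated` holds every SNP found in the resequenced regions, not just the confirmed predictions.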
