Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome

Andreia J Amaral, Bert Dibbits, Hendrik-Jan Megens, Martien Am Groenen, Hindrik Hd Kerstens, Richard Pma Crooijmans, Henri Cm Heuven, Johan T Den Dunnen

doi:10.1186/1471-2164-10-374

Open Access

https://doi.org/10.1186/1471-2164-10-374

Abstract

Background: Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate the impact of the quality level of the sequenced bases on mapping quality and identification of true SNPs on a large scale.

Results: DNA pooled from five animals from a commercial boar line was digested with DraI; 150–250-bp fragments were isolated and end-sequenced using the Illumina 1 G Genome Analyzer, yielding 70,348,064 sequences 36-bp long. Rules were developed to select sequences, which were then aligned to unique positions in a reference genome. Sequences were selected based on quality, and three thresholds of sequence quality (SQ) were compared. The highest threshold of SQ allowed identification of a larger number of SNPs (17,489), distributed widely across the pig genome. In total, 3,142 SNPs were validated with a success rate of 96%. The correlation between estimated minor allele frequency (MAF) and genotyped MAF was moderate, and SNPs were highly polymorphic in other pig breeds. Lowering the SQ threshold and maintaining the same criteria for SNP identification resulted in the discovery of fewer SNPs (16,768), of which 259 were not identified using higher SQ levels. Validation of SNPs found exclusively in the lower SQ threshold had a success rate of 94% and a low correlation between estimated MAF and genotyped MAF. Base change analysis suggested that the rate of transitions in the pig genome is likely to be similar to that observed in humans. Chromosome X showed reduced nucleotide diversity relative to autosomes, as observed for other species.

Conclusion: Large numbers of SNPs can be identified reliably by creating strict rules for sequence selection, which simultaneously decreases sequence ambiguity. Selection of sequences using a higher SQ threshold leads to more reliable identification of SNPs. Lower SQ thresholds can be used to guarantee sufficient sequence coverage, resulting in high success rate but less reliable MAF estimation. Nucleotide diversity varies between porcine chromosomes, with the X chromosome showing less variation as observed in other species.
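The comparison between estimated and genotyped MAF can be made more concrete with a small sketch. It assumes, purely for illustration, that the MAF at a candidate SNP is estimated as the fraction of aligned reads carrying the less frequent of the two most common alleles. The abstract does not state the estimator or the SNP identification criteria actually used, and the function name `estimate_maf` is hypothetical.

```python
from collections import Counter


def estimate_maf(bases):
    """Estimate minor allele frequency from the bases observed at one position.

    `bases` is an iterable of base calls (e.g. "AAAAAAGG") taken from reads
    of a DNA pool that overlap the position. Assumption: a biallelic site,
    so only the two most common alleles are considered.
    """
    counts = Counter(bases)
    if len(counts) < 2:
        return 0.0  # monomorphic in the sampled reads
    (_, major), (_, minor) = counts.most_common(2)
    return minor / (major + minor)


# Eight reads cover the position: six carry A, two carry G.
print(estimate_maf("AAAAAAGG"))
```

With a pool of five diploid animals there are only ten chromosomes, so the true allele frequency in the pool is a multiple of 0.1. An estimate based on a handful of reads can therefore deviate noticeably from the genotyped value, which is consistent with the moderate correlation reported above.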

Highlights

Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality.
Because millions of fragments are sequenced in parallel, a fragment can be sequenced even if it exists in low abundance in the sample, thereby increasing sequencing depth and enabling identification of single nucleotide polymorphisms (SNPs) with high accuracy [7,8,9].
Sequencing and filtering the reduced representation libraries (RRLs): an RRL was produced from a DNA pool of five boars from a crossbred (Large White vs. Pietrain) commercial boar line (PW), using the restriction enzyme DraI, which recognizes the pattern "TTTAAA" and generates blunt-ended fragments starting with AAA.
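The DraI digestion and size selection described above can be sketched in silico. This is a minimal illustration, not the authors' pipeline: it assumes a blunt cut in the middle of each "TTTAAA" site (between TTT and AAA) and keeps only fragments bounded by a cut on both sides whose length falls in the 150–250 bp window mentioned in the abstract. The function name `drai_fragments` and its parameters are hypothetical.

```python
def drai_fragments(sequence, min_len=150, max_len=250):
    """Return size-selected fragments from an in silico DraI digest.

    Assumptions: DraI recognizes TTTAAA and cuts bluntly between TTT and AAA,
    and only internal fragments (a cut at both ends) are of interest.
    """
    site = "TTTAAA"
    cut_offset = 3  # cut falls after the third base of the site
    seq = sequence.upper()

    # Collect every cut position along the sequence.
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)

    # Fragments lie between consecutive cuts; each starts with AAA.
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two DraI sites 200 bp apart (cut to cut).
toy = "GG" + "TTTAAA" + "C" * 194 + "TTTAAA" + "GG"
selected = drai_fragments(toy)
print(len(selected), len(selected[0]), selected[0][:3])
```

Each retained fragment starts with AAA and ends with TTT, matching the description of fragments "starting with AAA". Reads sequenced from such fragment ends can be checked for that prefix as a simple sanity filter.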

Summary

Introduction

Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality. The Illumina 1 G Genome Analyzer (ILLUMINA, San Diego, CA, USA) uses a sequencing-by-synthesis method, during which millions of DNA fragments are sequenced in parallel (massive parallel sequencing). With this method, costly and often problematic procedures, such as cloning, are eliminated. Another advantage is that accuracy is independent of sequence context because a discrete signal is generated for each base. This method is very accurate in cases of homopolymeric sequences and generates quality values that are analogous to Phred scores [5]. Because millions of fragments are sequenced in parallel, a fragment can be sequenced even if it exists in low abundance in the sample, thereby increasing sequencing depth and enabling identification of single nucleotide polymorphisms (SNPs) with high accuracy [7,8,9].
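As a rough illustration of how Phred-like quality values relate to base-calling error, the sketch below uses the standard Phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. The read-level filter is a hypothetical rule (every base must reach a minimum quality); the text above does not specify how the Genome Analyzer values were scaled or how sequence quality thresholds were applied in the study.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def passes_quality(qualities, min_q):
    """Hypothetical read filter: keep a read only if every base reaches min_q."""
    return all(q >= min_q for q in qualities)


# Q20 corresponds to a 1 in 100 chance of error, Q30 to 1 in 1000.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))

print(passes_quality([30, 25, 20], min_q=20))
print(passes_quality([30, 10, 20], min_q=20))
```

A stricter threshold discards more reads, which lowers coverage, while a looser threshold retains more reads at the cost of more base-calling errors. This is the trade-off that the comparison of quality thresholds in the study addresses.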

Journal: BMC Genomics	Publication Date: Jan 1, 2009
Citations: 91	License type: cc-by