Abstract

Single-nucleotide polymorphisms (SNPs) are highly abundant markers that are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has previously been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), and RNA sequencing. Recently we performed high-coverage whole-genome resequencing of 61 unrelated samples representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were doubled haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide geographic range and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1), which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth at the locus, and number of genotyped samples. Results from the two variant calling programs were compared, and genotypes of the doubled haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from the phylogenetic analysis that we conducted indicate that the SNP markers contain enough population-specific polymorphisms for recovering population relationships despite the small sample size used. Intra-population polymorphism assessment revealed a high level of polymorphism and heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high-density SNP array design and for other SNP genotyping platforms used for genetic and genomic studies of this iconic salmonid fish species.
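
The abstract says that doubled haploid genotypes were used to detect putative PSVs and MSVs, but not how. The Python sketch below shows one plausible way to do it. It rests on an assumption that is not spelled out here: doubled haploid lines should be homozygous at every locus, so an apparent heterozygous call in such a line suggests reads from paralogous copies collapsing onto one reference position. The function names, the VCF-style genotype strings, and the `max_het_dh` threshold are hypothetical choices for illustration and are not the authors' pipeline.

```python
from typing import Dict, Iterable

# Assumed encoding: VCF-style diploid genotype strings such as
# "0/0", "0/1", "1/1", with "./." for a missing call.
Genotypes = Dict[str, str]  # sample name -> genotype string


def _alleles(gt: str):
    """Split a genotype string into its two alleles, or return None if unusable."""
    parts = gt.replace("|", "/").split("/")
    if len(parts) != 2 or "." in parts:
        return None  # missing or malformed call
    return parts


def is_heterozygous(gt: str) -> bool:
    """True when a called diploid genotype carries two different alleles."""
    parts = _alleles(gt)
    return parts is not None and parts[0] != parts[1]


def flag_putative_psv(genotypes: Genotypes,
                      dh_samples: Iterable[str],
                      max_het_dh: int = 0) -> bool:
    """Flag a site when more than max_het_dh doubled haploid samples look heterozygous.

    max_het_dh is a hypothetical tolerance; 0 means a single heterozygous
    doubled haploid call is enough to flag the site.
    """
    het = sum(is_heterozygous(genotypes.get(s, "./.")) for s in dh_samples)
    return het > max_het_dh


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Minor allele frequency at a biallelic site, ignoring missing calls."""
    alt = 0
    total = 0
    for gt in genotypes.values():
        parts = _alleles(gt)
        if parts is None:
            continue
        alt += sum(a != "0" for a in parts)  # count non-reference alleles
        total += 2
    if total == 0:
        return 0.0
    freq = alt / total
    return min(freq, 1.0 - freq)


if __name__ == "__main__":
    dh = ["DH1", "DH2"]
    clean_site = {"DH1": "0/0", "DH2": "1/1", "W1": "0/1"}
    suspect_site = {"DH1": "0/1", "DH2": "0/1", "W1": "0/1"}
    print(flag_putative_psv(clean_site, dh))    # doubled haploids all homozygous
    print(flag_putative_psv(suspect_site, dh))  # doubled haploids look heterozygous
    print(minor_allele_frequency(clean_site) > 0.25)
```

A site that passes `flag_putative_psv` could then be checked against the MAF > 0.25 threshold quoted above. A real pipeline would also apply the quality, complexity, depth, and call-rate filters listed in the abstract.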

Highlights

  • The rainbow trout (Oncorhynchus mykiss) is an iconic salmonid fish species with a remarkably diverse life history; it is of wide interest as a model research organism and has high economic value for the sport fishing and aquaculture industries

  • Using VCFtools (Danecek et al, 2011) we calculated that the average genome-wide nucleotide diversity π (Nei and Li, 1979), measured in 20 kb genomic bins, was π = 2.3 × 10⁻³

  • Compared to other whole-genome resequencing studies in livestock, the SNP rate revealed in this study is similar to the genome-wide average reported for bovine (Daetwyler et al, 2014), higher than the rate reported for pigs (Choi et al, 2015), and substantially lower than the rates reported for the chicken genome (Kranis et al, 2013) and the Pacific oyster genome (Gutierrez et al, 2017)
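
The windowed nucleotide diversity quoted in the highlights can be illustrated with a short Python sketch. It assumes biallelic SNPs and the standard per-site estimator of average pairwise differences, 2k(n − k) / (n(n − 1)), where k is the alternate-allele count among n sampled chromosomes. Per-site values are summed within fixed 20 kb bins and divided by the bin length. The function names and the input tuple layout are hypothetical, and VCFtools (which provides a `--window-pi` option) may treat missing data and bin edges differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Average pairwise difference at one biallelic site.

    alt_count: alternate-allele copies observed; n_chrom: chromosomes genotyped.
    """
    if n_chrom < 2:
        return 0.0  # no pairs to compare
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites: Iterable[Tuple[int, int, int]],
                window_size: int = 20_000) -> Dict[int, float]:
    """Per-base nucleotide diversity in non-overlapping windows on one chromosome.

    sites: (1-based position, alt_count, n_chrom) for each SNP.
    Returns a mapping from window start position to pi for that window.
    Windows with no SNPs are omitted here (their pi would be 0).
    """
    totals = defaultdict(float)
    for pos, alt_count, n_chrom in sites:
        window_start = ((pos - 1) // window_size) * window_size + 1
        totals[window_start] += site_pi(alt_count, n_chrom)
    # Dividing by window length assumes every base in the bin was callable.
    return {start: total / window_size for start, total in totals.items()}


if __name__ == "__main__":
    # Toy data: three SNPs genotyped in 61 diploid samples (122 chromosomes).
    toy_sites = [(150, 30, 122), (18_400, 61, 122), (25_000, 5, 122)]
    for start, pi in sorted(windowed_pi(toy_sites).items()):
        print(start, pi)
```

Averaging the per-window values across the genome would give a single genome-wide figure comparable in kind to the π reported above, although the toy numbers here are made up.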


Introduction

The rainbow trout (Oncorhynchus mykiss) is an iconic salmonid fish species with a remarkably diverse life history; it is of wide interest as a model research organism and has high economic value for the sport fishing and aquaculture industries. Much effort has been devoted in recent years to developing genomic resources for research in rainbow trout, including a draft genome assembly (Berthelot et al, 2014), a high-density 57K SNP array (Palti et al, 2015a), a dense genetic linkage map (Gonzalez-Pena et al, 2016), and recently the annotated reference genome sequence (GenBank assembly accession GCA_002163495, RefSeq assembly accession GCF_002163495). Earlier efforts used targeted single-gene sequencing to discover and characterize a restricted number of single-nucleotide polymorphisms (SNPs) (Abadía-Cardoso et al, 2011). Another approach has been to use genotyping-by-sequencing methods such as restriction-site associated DNA (RAD) markers (e.g., Miller et al, 2012; Palti et al, 2015b), but there have been many technical difficulties in comparing and transferring results across studies. A comprehensive SNP database from genome resequencing data can further improve the design of new SNP arrays by improving the genome coverage and spacing of the SNPs, and can support the selection of markers for follow-up studies within targeted regions of the genome and for particular populations.

