We used whole-genome resequencing of pooled individuals to develop a high-density single-nucleotide polymorphism (SNP) chip for Eucalyptus. Genomes of 240 trees of 12 species were sequenced at 3.5× coverage each, and the resulting 46 997 586 raw SNP variants were subjected to multivariable filtering metrics to obtain multispecies, genome-wide distributed chip content. Of the 60 904 SNPs on the chip, 59 222 were genotyped and 51 204 were polymorphic across 14 Eucalyptus species. Together they provide 96% genome-wide coverage at a density of one SNP every 12–20 kb, with 47 069 SNPs located within 10 kb of 30 444 of the 33 917 genes in the Eucalyptus genome. Because the EUChip60K can genotype multiple species, we benchmarked its calls against 'gold standard' genotypes derived from deeply sequenced individual tree genomes and show that both the sample size and the taxonomic composition of cluster files affect the specificity and sensitivity of heterozygous calls. Thousands of SNPs were shared across species. These likely represent ancient variants that arose before these taxa split, which hints at a recent eucalypt radiation. We also show that the variable SNP filtering constraints allowed coverage of the entire site frequency spectrum, mitigating SNP ascertainment bias. The EUChip60K is an outstanding tool for addressing population genomics questions in Eucalyptus and for empowering genomic selection, genome-wide association studies (GWAS) and the broader study of complex trait variation in eucalypts.
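To make the benchmarking idea concrete, the sketch below computes heterozygous call sensitivity and specificity for chip genotypes against sequence-derived "gold standard" genotypes at the same sites. It is a generic illustration, not the authors' pipeline. The standard definitions are assumed: sensitivity is the fraction of true heterozygotes that the chip calls heterozygous, and specificity is the fraction of true homozygotes that the chip does not call heterozygous. The function name `het_call_metrics` and the encoding of genotypes as alternate-allele counts (0, 1, 2), with `None` for a missing call, are hypothetical choices made for this example.

```python
from typing import Optional, Sequence, Tuple


def het_call_metrics(
    chip: Sequence[Optional[int]], gold: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of heterozygous chip calls.

    Hypothetical encoding: genotypes are alternate-allele counts
    (0 = hom ref, 1 = het, 2 = hom alt); None marks a missing call.
    Sites missing in either set are skipped. The gold standard is
    treated as truth, with heterozygotes as the "positive" class.
    """
    if len(chip) != len(gold):
        raise ValueError("chip and gold must cover the same sites")
    tp = fn = tn = fp = 0
    for c, g in zip(chip, gold):
        if c is None or g is None:
            continue  # cannot be scored without both calls
        chip_het, gold_het = c == 1, g == 1
        if gold_het and chip_het:
            tp += 1  # true heterozygote called heterozygous
        elif gold_het:
            fn += 1  # true heterozygote missed
        elif chip_het:
            fp += 1  # homozygote wrongly called heterozygous
        else:
            tn += 1  # homozygote correctly not called heterozygous
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    chip_calls = [1, 1, 0, 2, 1, None, 0, 0]
    gold_calls = [1, 0, 0, 2, 1, 1, 1, 0]
    print(het_call_metrics(chip_calls, gold_calls))  # (0.6666666666666666, 0.75)
```

In this toy example, two of the three scorable true heterozygotes are recovered, which gives a sensitivity of 2/3. Three of the four true homozygotes are not miscalled, which gives a specificity of 3/4. Re-running such a comparison with genotypes clustered from files of different sizes or species mixes is one way to see how cluster-file composition could shift these two rates.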