Abstract
Recent advances in genotyping technologies have revolutionized our ability to estimate breed composition in pigs. The classic compositional regression used for this purpose requires a multibreed panel to detect crossbreeding, and its application is limited to the breeds included in that panel. In this study, we present a protocol based on classic anomaly detection algorithms that use single-breed panels to detect crossbreeding. With these methods, we aim to identify and assess breed outliers within large pig genotype panels. The dataset consisted of 44,616 SNP genotypes recorded in purebred Large White, Landrace, Duroc, and Pietrain pigs (n = 2,000 per breed) and in 1,010 crossbred pigs (50% Pietrain, 25% Large White, 25% Landrace). A random subset of 1,500 Large White pigs was selected as the reference set. The remaining 500 Large White pigs, the other purebred pigs, the crossbred pigs, and simulated crossbred animals with known proportions of Large White segments formed the test sets. We applied two anomaly detection methods: 1) principal component analysis (PCA) and 2) deep learning autoencoders (AE). For PCA, the top 600 eigenvectors, which captured 85% of the cumulative variance in the reference set, were retained. Test set genotypes were projected onto these principal components and then reconstructed back to the full SNP set. For AE, we used a previously published sparse convolutional denoising autoencoder (SCDA) consisting of two main components: an encoder and a decoder. The encoder compressed the genotypes to 128 components, and the decoder reconstructed them to the original 44,616 SNPs. The model trained on the reference set was used to reconstruct test set genotypes from the same breed, from other breeds, and from crossbred animals. Reconstruction accuracy was measured with the correlation and the mean squared error between observed and reconstructed genotypes. The proportion of rejected genotypes was used to assess the ability of the anomaly detection algorithms to detect crossbreeding.
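The PCA reconstruction step described above can be illustrated with a minimal NumPy sketch. It assumes genotypes are coded as 0/1/2 allele counts in an animals × SNPs array with no missing values; the function names, the toy allele-frequency simulation, and the animal and SNP counts are hypothetical. The abstract does not say whether SNPs were standardized before PCA, so this version only centers them. Components are retained until 85% of the reference variance is explained, mirroring the criterion in the text (which yielded 600 eigenvectors on the real data).

```python
import numpy as np

def fit_pca(ref, var_target=0.85):
    """Fit PCA on reference genotypes (animals x SNPs, coded 0/1/2)."""
    mean = ref.mean(axis=0)
    # Rows of vt are the principal axes (eigenvectors of the SNP covariance matrix)
    _, s, vt = np.linalg.svd(ref - mean, full_matrices=False)
    cum_var = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches var_target
    k = min(int(np.searchsorted(cum_var, var_target)) + 1, len(s))
    return mean, vt[:k]

def reconstruct(geno, mean, axes):
    """Project genotypes onto the retained PCs, then map back to SNP space."""
    scores = (geno - mean) @ axes.T
    return scores @ axes + mean

def anomaly_scores(geno, recon):
    """Per-animal reconstruction MSE and genotype/reconstruction correlation."""
    mse = np.mean((geno - recon) ** 2, axis=1)
    corr = np.array([np.corrcoef(g, r)[0, 1] for g, r in zip(geno, recon)])
    return mse, corr

# Hypothetical toy data: two "breeds" with independent allele frequencies
rng = np.random.default_rng(1)
n_snp = 500
p_ref = rng.uniform(0.1, 0.9, n_snp)
p_other = rng.uniform(0.1, 0.9, n_snp)
reference = rng.binomial(2, p_ref, size=(400, n_snp)).astype(float)
holdout = rng.binomial(2, p_ref, size=(100, n_snp)).astype(float)
other_breed = rng.binomial(2, p_other, size=(100, n_snp)).astype(float)

mean, axes = fit_pca(reference)
mse_holdout, corr_holdout = anomaly_scores(holdout, reconstruct(holdout, mean, axes))
mse_other, corr_other = anomaly_scores(other_breed, reconstruct(other_breed, mean, axes))
print(axes.shape[0], mse_holdout.mean(), mse_other.mean())
```

Animals from the reference breed reconstruct with lower error than animals from another breed, which is the signal an anomaly detector thresholds. The sketch uses an SVD of the centered genotype matrix rather than an eigendecomposition of the SNP covariance matrix: both give the same principal axes, but with tens of thousands of SNPs the explicit covariance matrix would be very large, while the SVD scales with the number of reference animals.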
For each criterion, the threshold for rejecting a genotype was set at the value at which 5% of Large White animals were rejected (specificity = 0.95). Results are summarized in Table 1. PCA was better suited than AE to detecting genomic outliers. PCA-based anomaly detection identified purebred individuals of the other breeds with probability 1.0. It also detected over 90% of crossbred individuals that were 75% Large White. Crossbred animals with a lower proportion of Large White genomic segments were detected with probability < 0.5. Anomaly detection algorithms can be used to detect genomic outliers, but with relatively low sensitivity. More work is needed to optimize these algorithms for breeding applications.
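The 95% specificity rule can be sketched as a quantile calibration on scores from Large White animals. This is a minimal sketch assuming one per-animal score array per group; the function names and toy scores are hypothetical, and the abstract does not state which Large White animals (for example, the 500 held-out animals) were used for calibration. For MSE a high score signals an outlier, so the threshold is the 95th percentile; for correlation a low score signals an outlier, so it is the 5th percentile.

```python
import numpy as np

def calibrate_threshold(lw_scores, specificity=0.95, higher_is_anomalous=True):
    """Threshold that rejects (1 - specificity) of the Large White animals."""
    q = specificity if higher_is_anomalous else 1.0 - specificity
    return float(np.quantile(lw_scores, q))

def rejection_rate(scores, threshold, higher_is_anomalous=True):
    """Proportion of animals whose score falls beyond the threshold."""
    scores = np.asarray(scores, dtype=float)
    rejected = scores > threshold if higher_is_anomalous else scores < threshold
    return float(np.mean(rejected))

# Hypothetical toy scores for held-out Large White animals and one test group
lw_mse = np.arange(1.0, 101.0)          # 100 evenly spaced MSE values
test_mse = np.array([90.0, 96.0, 120.0, 99.0])

thr = calibrate_threshold(lw_mse)
print(thr)                                # about 95.05
print(rejection_rate(lw_mse, thr))        # 0.05 (specificity 0.95 by construction)
print(rejection_rate(test_mse, thr))      # 0.75

# Correlation: low values are anomalous, so calibrate on the lower tail
lw_corr = np.linspace(0.50, 0.99, 100)
thr_corr = calibrate_threshold(lw_corr, higher_is_anomalous=False)
print(rejection_rate(lw_corr, thr_corr, higher_is_anomalous=False))  # 0.05
```

Applied to each non-Large White test group (other purebreds, commercial crossbreds, simulated crossbreds), the rejection rate at this fixed threshold is the detection probability, that is, the sensitivity at 95% specificity.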