The present study evaluated three strategies for finding the optimum subset of DNA markers from the 50 K Illumina Bovine panel to classify beef cattle into the most and the least feed-efficient groups without using individual feed intake and performance measures. Residual feed intake (RFI) and 50 K single nucleotide polymorphism (SNP) genotype data of 4,057 beef animals from research and commercial herds were included. Initially, all cattle were ranked based on their phenotypic RFI values. Different datasets were then created by selecting animals from the top and bottom 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, and 45% of the ranked RFI values. SNP subsets were selected by taking the top-ranked SNPs contributing to the variance of RFI (strategy 1), by selecting SNPs from the SNP subsets created in the first strategy (strategy 2), and by extracting SNPs from the full 50 K SNP set (strategy 3). Eleven machine learning (ML) algorithms were then employed to classify the most and the least feed-efficient groups using 260 datasets generated by combining the ten RFI phenotype percentage groups with 6, 18, and 2 SNP subsets in the first, second, and third strategies, respectively. Classification accuracy was high (>69%) for animals in the 1% range for all ML algorithms under the three strategies and the different SNP subsets. Implementing the linear Support Vector Machine algorithm with the 15 K SNPs obtained in the first strategy predicted the 1% most and least feed-efficient animals with an accuracy of 84%. In the second strategy, selecting 524 SNPs from the 15 K SNP subset outperformed the other strategies with an accuracy of 81% for 1% of the population using the Naive Bayes algorithm. It was concluded that a smaller number of SNPs (524) could be used to predict the most and the least feed-efficient animals with acceptable accuracy, reducing the cost of selection for RFI using genomic information.
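The dataset construction described above can be illustrated with a minimal sketch. It assumes synthetic RFI values drawn from a normal distribution, a simple rounding rule for the tail size, and a hypothetical helper name `extreme_groups`; none of these come from the study itself, which does not state how tail sizes were rounded. It only shows how ranking by RFI yields paired extreme groups and how the count of 260 datasets follows from the stated design.

```python
import random

def extreme_groups(rfi, pct):
    """Return (low_ids, high_ids): indices of the animals in the lowest
    and highest pct% of RFI. Low RFI corresponds to higher feed efficiency.
    Hypothetical helper; the rounding rule is an assumption."""
    if not 0 < pct < 50:
        raise ValueError("pct must be between 0 and 50 (exclusive)")
    order = sorted(range(len(rfi)), key=lambda i: rfi[i])  # rank by RFI
    k = max(1, round(len(rfi) * pct / 100))                # animals per tail
    return order[:k], order[-k:]

# Tail percentages and SNP subset counts as stated in the abstract.
PERCENTS = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45]
SUBSETS_PER_STRATEGY = {1: 6, 2: 18, 3: 2}

# Synthetic stand-in for the 4,057 phenotypic RFI records (assumption).
random.seed(0)
rfi = [random.gauss(0.0, 1.0) for _ in range(4057)]

# One pair of extreme groups per percentage range.
datasets = {p: extreme_groups(rfi, p) for p in PERCENTS}

# Ten percentage groups x (6 + 18 + 2) SNP subsets.
n_datasets = len(PERCENTS) * sum(SUBSETS_PER_STRATEGY.values())
print(n_datasets)
```

Under this rounding assumption, the 1% range contains 41 animals per group, so the classifiers in that range would be trained and evaluated on a small, strongly contrasted sample.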