Abstract
Advancements in agricultural data collection and methodologies for data analysis offer opportunities for beef cattle producers to improve herd genetics. Genetic evaluation systems (GES) can benefit from modern programming approaches to keep up with these advancements. Python, a fast-developing programming language with scientific computing libraries such as NumPy and SciPy, offers flexibility that allows for quick adoption of superior algorithms and new data sources, as well as easy maintenance of the software over the long term. This study evaluated the use of Python (v3.12.1) for developing a GES capable of executing all the functions expected of a modern GES implementation. A simulated dataset was generated with the R package AlphaSimR to mimic characteristics of the Angus beef cattle breed. The dataset included pedigree information, phenotypic data, and true breeding values (TBVs) for each animal. Phenotypic traits included birth weight (BWT), weaning gain (WG), and post-weaning gain (PWG), with a total of 976,400 animals simulated across 15 generations. The GES software was developed in-house and included components for data import, pedigree and data preprocessing, calculation of inbreeding coefficients, construction of the inverse of the numerator relationship matrix, solver initialization, breeding value estimation, and writing of results. The genetic evaluation employed pedigree-based best linear unbiased prediction (BLUP) with a multiple-trait animal model, using a preconditioned conjugate gradient (PCG) iteration algorithm to solve the mixed model equations. Convergence was assessed as the norm of the residual vector divided by the norm of the right-hand side, with the convergence threshold set to 1 × 10⁻⁵. The software was run on a Linux server with an Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, 96 GB of RAM, and eight cores. Accuracy was measured as the Pearson correlation coefficient between TBVs and estimated breeding values (EBVs).
To benchmark computing time and solutions, the same dataset and model were run in a commonly used commercial GES, MiXBLUP (v3.0.1). The solver was the most time-consuming component of the Python GES, with a run time of 27 min. However, the application of multiprocessing reduced the run time by 72% to 8 min. The Pearson correlation coefficients between EBVs and TBVs were 0.86, 0.79, and 0.73 (P < 0.01) for BWT, WG, and PWG, respectively. By comparison, MiXBLUP completed breeding value prediction in less than 1 min. The Pearson correlation coefficient between Python and MiXBLUP predictions was 1.0 (P < 0.01). While Python is an ideal language for modern GES, further updates are necessary to optimize performance and operations. Future research will focus on refining computational algorithms, exploring parallel processing techniques, and enhancing user interfaces to ensure easy integration with industry practices.
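The solver step named in the abstract, a preconditioned conjugate gradient iteration stopped when the residual norm divided by the right-hand-side norm falls below 1 × 10⁻⁵, can be illustrated with a minimal NumPy sketch. This is not the study's code: the function name `pcg_solve`, the diagonal (Jacobi) preconditioner, the dense matrix, and the small random test system are all assumptions made for illustration. A real set of mixed model equations would be sparse and far larger.

```python
import numpy as np


def pcg_solve(lhs, rhs, tol=1e-5, max_iter=1000):
    """Solve lhs @ x = rhs for a symmetric positive-definite lhs.

    Hypothetical helper: uses a Jacobi (diagonal) preconditioner, which is
    an assumption; the abstract does not say which preconditioner was used.
    Returns the solution and the number of iterations performed.
    """
    diag_inv = 1.0 / np.diag(lhs)          # inverse of the preconditioner
    x = np.zeros_like(rhs, dtype=float)    # start from the zero vector
    rhs_norm = np.linalg.norm(rhs)
    if rhs_norm == 0.0:
        return x, 0                        # trivial system, x = 0

    r = rhs - lhs @ x                      # initial residual
    z = diag_inv * r                       # preconditioned residual
    p = z.copy()                           # first search direction
    rz = r @ z

    for iteration in range(1, max_iter + 1):
        ap = lhs @ p
        alpha = rz / (p @ ap)              # step length along p
        x += alpha * p
        r -= alpha * ap
        # Convergence criterion from the abstract: ||r|| / ||rhs|| < tol
        if np.linalg.norm(r) / rhs_norm < tol:
            return x, iteration
        z = diag_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p          # next conjugate direction
        rz = rz_new

    return x, max_iter


# Small made-up symmetric positive-definite system (illustration only).
rng = np.random.default_rng(0)
m = rng.standard_normal((50, 50))
lhs = m @ m.T + 50.0 * np.eye(50)
rhs = rng.standard_normal(50)

solution, n_iter = pcg_solve(lhs, rhs)
relative_residual = np.linalg.norm(rhs - lhs @ solution) / np.linalg.norm(rhs)
print(f"iterations: {n_iter}, relative residual: {relative_residual:.2e}")
```

Each iteration costs one matrix-vector product, which is the part that would typically be worth parallelizing or expressing with sparse matrices in a large evaluation.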