Abstract

Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no previous study has compared the performance of combined ML methods in predicting high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data from five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) of nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks and three ML methods: Random Forests (RFs), Extreme Gradient Boosting (XGBoost), and a combination of RF and XGBoost (RX). The utility of the subset of candidate genes selected by each method for classifying FE animals was assessed with a support vector machine (SVM). Among all methods, RX identified the smallest subsets of genes (117), which outperformed those chosen by t-test, edgeR, RF, or XGBoost in classification accuracy. Gene co-expression network analysis confirmed the interactions among these genes and showed that their relevance within the network was related to their ML-based prediction ranking. The results demonstrate the great potential of applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.
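The abstract does not say how RX combines the two models, so the following is only a hypothetical illustration of one plausible rule: rank genes by their RF and XGBoost importance scores, keep the genes that fall within the top `top_k` of both rankings, and order the survivors by mean rank. The function names, the `top_k` cutoff, and the toy importance values are assumptions, not the authors' procedure.

```python
def rank_genes(importance):
    """Order genes from most to least important (ties broken by gene name)."""
    return sorted(importance, key=lambda g: (-importance[g], g))


def combine_rankings(rf_importance, xgb_importance, top_k):
    """Hypothetical RX rule: keep genes ranked in the top_k of BOTH models,
    then sort them by their mean rank across RF and XGBoost."""
    rf_rank = {g: i for i, g in enumerate(rank_genes(rf_importance))}
    xgb_rank = {g: i for i, g in enumerate(rank_genes(xgb_importance))}
    # A gene absent from one model's importance table cannot be shared.
    shared = [
        g for g in rf_rank
        if g in xgb_rank and rf_rank[g] < top_k and xgb_rank[g] < top_k
    ]
    return sorted(shared, key=lambda g: ((rf_rank[g] + xgb_rank[g]) / 2, g))


if __name__ == "__main__":
    # Toy importance scores (invented for illustration only).
    rf = {"GENE_A": 0.30, "GENE_B": 0.25, "GENE_C": 0.20, "GENE_D": 0.05}
    xgb = {"GENE_A": 0.10, "GENE_B": 0.40, "GENE_C": 0.01, "GENE_D": 0.35}
    print(combine_rankings(rf, xgb, top_k=3))  # ['GENE_B', 'GENE_A']
```

Because an intersection can only shrink the candidate list, a rule of this kind would tend to return fewer genes than either model alone. Genes that one model never scores are simply dropped rather than treated as zero-importance ties.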

Highlights

  • Using a threefold cross-validation scheme for each of the five tissue gene expression datasets, we identified different numbers of differentially expressed genes (DEG) with t-test, edgeR, Random Forests (RFs), XGBoost, and the combination of RF and XGBoost (RX) (Table 2).

  • When comparing the genes selected by RF, XGBoost, and RX with those from t-test and edgeR, we found that RF selected almost all DEG identified by t-test and edgeR as well as new genes (91.5% and 93.7% of the genes identified by t-test and edgeR, respectively, were also identified by RF), whereas XGBoost and RX picked up only the top-ranked DEG from t-test and edgeR.
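A minimal sketch of two bookkeeping steps mentioned in these highlights, assuming 18 animals labelled HFE or LFE: a stratified threefold split that places three HFE and three LFE bulls in every test fold, and the percentage of one method's gene list recovered by another (the kind of overlap behind the 91.5% and 93.7% figures). The function names and the fixed random seed are illustrative assumptions, not the authors' code.

```python
import random


def stratified_three_fold(labels, k=3, seed=0):
    """Assign each animal index to one of k folds so that every fold keeps
    the HFE:LFE ratio of the full data set."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled animals of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for t, test in enumerate(folds):
        train = sorted(i for f, fold in enumerate(folds) if f != t for i in fold)
        yield train, test


def percent_recovered(reference, selected):
    """Percentage of genes in `reference` that also appear in `selected`."""
    reference = set(reference)
    if not reference:
        return 0.0
    return 100.0 * len(reference & set(selected)) / len(reference)


if __name__ == "__main__":
    labels = ["HFE"] * 9 + ["LFE"] * 9
    for train, test in train_test_splits(stratified_three_fold(labels)):
        print(len(train), "training animals, test fold:", test)
```

In a pipeline of this kind, gene selection (t-test, edgeR, RF, XGBoost, or RX) would be run on the training animals of each fold only, so that held-out animals do not influence which genes are chosen; the highlights do not state how this was handled in the study.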


Summary

Introduction

As farm practices around the world are continuously challenged to minimize their environmental footprint, there is a growing need for livestock producers to identify and select superior animals for efficiency-related traits (Hayes et al, 2013). Because diverse mechanisms are involved in the regulation of feed efficiency (FE), it is often difficult to develop molecular markers that accurately differentiate high-FE (HFE) from low-FE (LFE) animals using a traditional case-control study design. Unlike healthy vs. diseased or treated vs. non-treated contrasts, differences between HFE and LFE animals are subtle and often related to intrinsic metabolic processes (Cantalapiedra-Hijar et al, 2018). The development and application of accurate methods to identify predictor molecules of polygenic traits such as FE are essential for implementing an effective genomic selection program in livestock species.
