Abstract

Bovine viral diarrhea virus (BVDV) causes one of the most economically important diseases in cattle, and the virus is found worldwide. A better understanding of disease-associated factors is a crucial step towards defining strategies for control and eradication. In this study, we trained a random forest (RF) prediction model and performed variable importance analysis to identify factors associated with BVDV occurrence. In addition, we assessed the influence of feature selection on RF performance and evaluated its predictive power relative to other popular classifiers and to logistic regression. The RF classification model yielded an average error rate of 32.03% for the negative class (negative for BVDV) and 36.78% for the positive class (positive for BVDV). The RF model had an area under the ROC curve of 0.702. Variable importance analysis revealed that the important predictors of BVDV occurrence were: a) who inseminates the animals, b) the number of neighboring farms that have cattle, and c) whether rectal palpation is performed routinely. Our results suggest that machine learning algorithms, especially RF, are a promising methodology for the analysis of cross-sectional studies, offering satisfactory predictive power and the ability to identify predictors that represent potential risk factors for BVDV investigation. We examined classical predictors and found some new and hard-to-control practices that may lead to the spread of this disease within and among farms, mainly regarding poor or neglected reproduction management, which should be considered for disease control and eradication.

Electronic supplementary material: The online version of this article (doi:10.1186/s13567-015-0219-7) contains supplementary material, which is available to authorized users.
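
The workflow described in the abstract (train an RF classifier, report per-class error rates and the area under the ROC curve, then rank predictors by importance) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the survey data, and uses scikit-learn's impurity-based importance, which may differ from the importance measure used in the study. All variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for herd-level survey data (assumption: NOT the study's data).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hyperparameters are illustrative, not taken from the paper.
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_train, y_train)

# Per-class error rates from the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, rf.predict(X_test)).ravel()
error_negative = fp / (tn + fp)  # share of true negatives misclassified
error_positive = fn / (fn + tp)  # share of true positives misclassified

# Area under the ROC curve, using the predicted probability of the positive class.
auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])

# Rank predictors from most to least important (impurity-based importance).
ranking = np.argsort(rf.feature_importances_)[::-1]

print(f"error (negative class): {error_negative:.3f}")
print(f"error (positive class): {error_positive:.3f}")
print(f"AUC: {auc:.3f}")
print("feature indices by importance:", ranking.tolist())
```

With real survey data, the columns of `X` would be the questionnaire variables, and the ranking would point to candidate risk factors rather than confirmed causes.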

Highlights

  • Bovine viral diarrhea virus (BVDV) has a single-stranded, positive-sense RNA genome and belongs to the genus Pestivirus of the family Flaviviridae [1], causing one of the most common and economically important viral diseases of cattle [2]

  • We compared the distribution of sensitivity and specificity metrics across all repetitions of cross-validation following the same methodology, and we found that Support Vector Machine (SVM) has better specificity performance than random forest (RF) and Gradient Boosting Machine (GBM) (P-value < 0.05), while both RF and GBM outperform SVM in terms of sensitivity (P-value < 0.05)

  • In this study, we trained an RF model based on cross-sectional data derived from an investigation of BVDV prevalence carried out in Southern Brazil, aiming to identify important predictors of disease occurrence and to evaluate the predictive power of this machine learning model in this specific domain
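
The classifier comparison mentioned in the highlights (distributions of sensitivity and specificity across repeated cross-validation for SVM, RF and GBM) could be set up along these lines. This is a hedged sketch assuming scikit-learn and synthetic data; the fold counts, hyperparameters and the scaling step for the SVM are illustrative choices, and the statistical test behind the reported P-values is not specified in this excerpt, so it is omitted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey data (assumption: NOT the study's data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Sensitivity is recall on the positive class; specificity is recall on the negative class.
scoring = {
    "sensitivity": make_scorer(recall_score, pos_label=1),
    "specificity": make_scorer(recall_score, pos_label=0),
}

# The same resampling scheme is reused for every model so the scores are comparable.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBM": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# results[name][metric] holds one score per fold and repetition (15 values here).
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {metric: scores[f"test_{metric}"] for metric in scoring}

for name, metrics in results.items():
    print(name, {metric: round(values.mean(), 3) for metric, values in metrics.items()})
```

Because every model is scored on identical folds, the per-fold arrays in `results` can be passed to a paired statistical test of the analyst's choosing.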

Summary

Introduction

Bovine viral diarrhea virus (BVDV) has a single-stranded, positive-sense RNA genome and belongs to the genus Pestivirus of the family Flaviviridae [1], causing one of the most common and economically important viral diseases of cattle [2]. A number of studies based on traditional risk-factor identification approaches (mainly logistic regression) have been performed on BVDV [4,5,6,7,8], and the major risk factors identified relate to the following: biosecurity [6], reproduction management [2,6,9,10], herd size [5,8], animal introduction [2,4,5,11], direct contact with other. The random forest (RF) algorithm [18] has been regarded as one of the most precise prediction methods, with advantages such as the ability to determine variable importance, the ability to model complex interactions among independent variables, and the flexibility to perform several types of statistical data analysis, including regression, classification and unsupervised learning [19]. Its high predictive power has been supported by previous comparative studies with other ML methods [21,22,23,24,25].
