Abstract

Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole-genome SNPs associated with chick mortality. This was done by categorizing mortality rates and applying a filter-wrapper feature selection procedure within each classification method evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. The SNPs selected by each classification method were evaluated using the predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories, generated the SNP subset with the greatest predictive ability. Further, an alternative categorization scheme, which used only the two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies.

Highlights

  • In genetic association studies of complex traits, assessing many loci jointly may be more informative than testing associations at individual markers

  • Error rates increased with K for each classifier, since the baseline error increased with K; in each instance, classifiers improved upon random classification

  • Naïve Bayes (NB) was used for further analysis
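The link between the number of categories K and the baseline error noted above can be illustrated with a small sketch. It assumes mortality rates are split into K roughly equal-sized categories by rank and that the baseline is uniform random guessing; the source does not state the exact discretization rule or baseline, so both are assumptions, and the function names `quantile_categories` and `baseline_error` are hypothetical.

```python
def quantile_categories(values, k):
    """Assign each value to one of k roughly equal-sized categories by rank.

    Assumption: equal-frequency binning; the source does not specify
    how category boundaries were chosen.
    """
    n = len(values)
    # Indices of the observations, ordered from lowest to highest value.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic maps rank 0..n-1 onto category 0..k-1.
        labels[idx] = rank * k // n
    return labels


def baseline_error(k):
    """Expected error of uniform random guessing among k equal-sized categories."""
    return 1.0 - 1.0 / k


if __name__ == "__main__":
    # Hypothetical mortality rates, for illustration only.
    rates = [0.03, 0.01, 0.02, 0.04, 0.08, 0.05]
    for k in (2, 3):
        print(k, quantile_categories(rates, k), baseline_error(k))
```

Under these assumptions the baseline error is 1 - 1/K, so it grows from 0.5 at K = 2 toward 0.9 at K = 10. A classifier's raw error rate is therefore only meaningful relative to the baseline for the same K.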



Introduction

In genetic association studies of complex traits, assessing many loci jointly may be more informative than testing associations at individual markers. The complexity of biological processes underlying a complex trait makes it probable that many loci residing on different chromosomes are involved [1,2]. Standard regression models have problems when fitting effects of a much larger number of SNPs (and, possibly, their interactions) than the number of observations available. To address this difficulty, a reasonable solution could be pre-selection of a small number of SNPs, followed by modeling of associations between these SNPs and the phenotype [4]. Other strategies include stepwise

