SELECTION OF A MINIMAL NUMBER OF SIGNIFICANT PORCINE SNPs BY AN INFORMATION GAIN AND GENETIC ALGORITHM HYBRID MODEL

Wanthanee Rathasamuth, Kitsuchart Pasupa, Sissades Tongsima

doi:10.22452/mjcs.sp2019no2.5

Abstract

A panel of a large number of common single nucleotide polymorphisms (SNPs) distributed across the entire porcine genome has been widely used to represent the genetic variability of pigs. With the advent of SNP-array technology, a genome-wide genetic profile of a specimen can be easily obtained. Among this large number of variations, there exists a much smaller subset of the SNP panel that could equally be used to correctly identify the corresponding breed. This work presents a SNP selection heuristic that yields such a reduced subset while remaining effective for breed classification. Features were selected by combining a filter method and a wrapper method (information gain and a genetic algorithm, respectively) plus a feature frequency selection step, and classification used a support vector machine. We were able to reduce the number of significant SNPs to 0.86% of the total number of SNPs in a swine dataset while achieving 94.80% classification accuracy.
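To make the filter stage concrete, the following is a minimal sketch of ranking SNPs by information gain with respect to breed labels. It is an illustration under stated assumptions, not the authors' implementation: the genotype coding (0, 1, 2), the function names `entropy`, `information_gain` and `top_k_snps`, the cutoff `k`, and the toy data are all hypothetical. The genetic algorithm wrapper, the feature frequency selection step and the support vector machine classifier are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes, labels):
    """Reduction in label entropy after grouping samples by one SNP's genotype."""
    n = len(labels)
    groups = {}
    for genotype, label in zip(genotypes, labels):
        groups.setdefault(genotype, []).append(label)
    # Weighted average entropy of the labels within each genotype group.
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def top_k_snps(snp_matrix, labels, k):
    """Column indices of the k SNPs with the highest information gain."""
    n_snps = len(snp_matrix[0])
    gains = [
        information_gain([row[j] for row in snp_matrix], labels)
        for j in range(n_snps)
    ]
    return sorted(range(n_snps), key=lambda j: gains[j], reverse=True)[:k]


# Toy data (hypothetical): 4 samples x 3 SNPs, genotypes coded 0/1/2.
breeds = ["A", "A", "B", "B"]
samples = [
    [0, 1, 2],
    [0, 2, 2],
    [2, 1, 2],
    [2, 2, 2],
]
selected = top_k_snps(samples, breeds, k=1)
print(selected)
```

In this toy example only the first SNP separates the two breeds, so it is the one retained by the filter. In a hybrid scheme of the kind described above, the SNPs that survive such a filter would then form the search space for the wrapper method.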
