Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences

Uday Kamath, Kenneth A De Jong, Amarda Shehu

doi:10.1007/978-3-642-32615-8_23

Abstract

The annotation of DNA regions that regulate gene transcription is the first step towards understanding phenotypical differences among cells and many diseases. Hypersensitive (HS) sites are reliable markers of regulatory regions. Mapping HS sites is the focus of many statistical learning techniques that employ Support Vector Machines (SVM) to classify a DNA sequence as HS or non-HS. The contribution of this paper is a novel methodology inspired by biological evolution to automate the basic steps in SVM and improve classification accuracy. First, an evolutionary algorithm designs optimal sequence motifs used to associate feature vectors with the input sequences. Second, a genetic programming algorithm designs optimal kernel functions that map the feature vectors into a high-dimensional space where the vectors can be optimally separated into the HS and non-HS classes. Results show that the employment of evolutionary computation techniques improves classification accuracy and promises to automate the analysis of biological sequences.

Keywords

DNase I hypersensitive sites, evolutionary algorithms, support vector machines, genetic programming, kernel functions, motifs
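The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the motif list, the function names (`motif_features`, `composed_kernel`), the `gamma` value, and the example sequences are all hypothetical, and both the motifs and the kernel are fixed by hand here, whereas the paper searches for them with an evolutionary algorithm and with genetic programming respectively. The sketch only shows what a motif-based feature vector and a kernel built from simpler kernels might look like.

```python
import math

def motif_features(sequence, motifs):
    """Map a DNA sequence to a vector of overlapping motif counts."""
    seq = sequence.upper()
    return [
        sum(1 for i in range(len(seq) - len(m) + 1) if seq[i:i + len(m)] == m)
        for m in motifs
    ]

def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed, untuned value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def composed_kernel(x, y):
    """One hand-written composite kernel: the sum of two valid kernels
    is itself a valid kernel. A genetic programming search would explore
    many such expression trees instead of fixing one."""
    return linear_kernel(x, y) + rbf_kernel(x, y)

# Hypothetical motifs and sequences, chosen only for illustration.
motifs = ["CG", "TATA", "GGC"]
x = motif_features("ACGCGTATAGGC", motifs)
y = motif_features("TTTATATAAACG", motifs)
similarity = composed_kernel(x, y)
```

A kernel of this form could be handed to any SVM implementation that accepts a user-defined kernel function or a precomputed Gram matrix.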

Full Text