Selecting predictive features for recognition of hypersensitive sites of regulatory genomic sequences with an evolutionary algorithm

Uday Kamath,Amarda Shehu,Kenneth A De Jong

doi:10.1145/1830483.1830516

Abstract

This paper proposes a method to improve the recognition of regulatory genomic sequences. Annotating sequences that regulate gene transcription is an emerging challenge in genomics research. Identifying regulatory sequences promises to reveal underlying reasons for phenotypic differences among cells and for diseases associated with pathologies in protein expression. Computational approaches have been limited by the scarcity of experimentally-known features specific to regulatory sequences. High-throughput experimental technology is finally revealing a wealth of hypersensitive (HS) sequences that are reliable markers of regulatory sequences and currently the focus of classification methods. The contribution of this paper is a novel method that combines evolutionary computation and SVM classification to improve the recognition of HS sequences. Based on experimental evidence that HS regions employ sequence features to interact with enzymes, the method seeks motifs to discriminate between HS and non-HS sequences. An evolutionary algorithm (EA) searches the space of sequences of different lengths to obtain such motifs. Experiments reveal that these motifs improve recognition of HS sequences by more than 10% compared to state-of-the-art classification methods. Analysis of these motifs reveals interesting insight into features employed by regulatory sequences to interact with DNA-binding enzymes.

Full Text