Abstract

Promoters are short regulatory DNA sequences located upstream of a gene. Structural analysis of promoter sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes the structural analysis of promoters a complex problem. Grammar rules can be used to describe the structure of promoter sequences; however, deriving such rules is not trivial. In this paper, stochastic L-grammar rules are derived automatically from known Drosophila and vertebrate promoter and non-promoter sequences using genetic programming. The fitness of the grammar rules is evaluated using a machine learning technique, the Support Vector Machine (SVM). An SVM is trained on the known promoter sequences to obtain a discriminating function, which is then used to evaluate a candidate grammar (a set of rules) by determining the percentage of its generated sequences that are classified correctly. The combination of SVM and grammar rule inference can mitigate the lack of structural insight in machine learning approaches such as SVM.

Keywords: Support Vector Machine; Promoter Sequence; Support Vector Machine Classifier; Production Rule; Support Vector Machine Classification
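The evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grammar representation, the function names `expand` and `fitness`, the toy rules, and the parameter defaults are all assumptions made for illustration. In particular, `looks_like_promoter` is a hypothetical stand-in for the trained SVM's discriminating function, and the genetic programming search that proposes candidate grammars is omitted.

```python
import random
from typing import Callable, Dict, List, Tuple

# Assumed representation: each symbol maps to weighted alternative productions.
# Symbols without a rule (here the nucleotides A, C, G, T) are copied unchanged.
Grammar = Dict[str, List[Tuple[float, str]]]


def expand(grammar: Grammar, axiom: str, steps: int, rng: random.Random) -> str:
    """Rewrite all symbols in parallel, L-system style, for `steps` iterations."""
    seq = axiom
    for _ in range(steps):
        out = []
        for symbol in seq:
            rules = grammar.get(symbol)
            if rules is None:
                out.append(symbol)  # no rule: symbol is kept as-is
                continue
            weights = [w for w, _ in rules]
            bodies = [b for _, b in rules]
            # Pick one production at random, proportionally to its weight.
            out.append(rng.choices(bodies, weights=weights, k=1)[0])
        seq = "".join(out)
    # Assumption: symbols that are still unexpanded are dropped from the output.
    return "".join(ch for ch in seq if ch in "ACGT")


def fitness(
    grammar: Grammar,
    classify: Callable[[str], bool],
    axiom: str = "S",
    steps: int = 2,
    samples: int = 200,
    seed: int = 0,
) -> float:
    """Fraction of generated sequences that the classifier accepts as promoters."""
    rng = random.Random(seed)
    hits = sum(classify(expand(grammar, axiom, steps, rng)) for _ in range(samples))
    return hits / samples


def looks_like_promoter(seq: str) -> bool:
    """Hypothetical placeholder for the SVM decision: accepts any TATA motif."""
    return "TATA" in seq


# Toy candidate grammar (illustrative only, not taken from the paper).
toy_grammar: Grammar = {
    "S": [(0.7, "XTATAX"), (0.3, "XGCGCX")],
    "X": [(0.25, "A"), (0.25, "C"), (0.25, "G"), (0.25, "T")],
}

score = fitness(toy_grammar, looks_like_promoter)
print(f"fitness of toy grammar: {score:.2f}")
```

In a genetic programming setting, a score of this kind would be computed for every candidate grammar in the population, with the placeholder classifier replaced by the SVM trained on known promoter and non-promoter sequences.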
