Abstract

Previous research into the recognition of E. coli promoters has focused on using raw DNA sequences and alignment methods to find interesting features in the promoter regions. In this paper, we compare the classification accuracy of neural networks trained on two input representations: DNA sequences encoded with an orthogonal representation of the nucleotides, and a set of high-level features derived from the DNA. We also evaluate how the type of non-promoter used in training and testing affects classification accuracy. We used 872 E. coli promoters and three types of non-promoters: random sequences with the same base frequencies as the promoter sequences, gene sequences selected from E. coli, and random sequences with the same base frequencies as the gene non-promoters. Raw DNA sequences were encoded using CODE-4; the high-level features were those outlined by previous researchers, which we formally define in this paper. Contrary to expectation, the high-level features performed worse for promoter recognition than the CODE-4 DNA representation. The strongest determinant of classification accuracy was the type of non-promoter used for training and testing. Overall, non-promoters from coding regions and random sequences with the same base frequencies as the gene non-promoters gave the best classification accuracy.
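
As a rough illustration of two data-preparation steps named in the abstract, the Python sketch below encodes nucleotides orthogonally in the CODE-4 style (four bits per base, exactly one set) and generates random non-promoters that match the pooled base frequencies of a reference set of sequences. It is a minimal sketch under stated assumptions, not the authors' implementation: the A, C, G, T bit order, the function names `code4_encode`, `base_frequencies` and `random_non_promoters`, and the choice to sample each position independently from pooled frequencies are all assumptions.

```python
import random
from collections import Counter

# ASSUMPTION: bit order for the orthogonal (CODE-4) encoding; the source
# does not say which of the four positions each nucleotide occupies.
BASES = "ACGT"


def code4_encode(seq: str) -> list[int]:
    """Encode a DNA sequence as a flat 0/1 vector, 4 bits per nucleotide.

    Each base becomes a one-hot (orthogonal) vector, e.g. A -> 1 0 0 0,
    so a sequence of length n becomes a vector of length 4 * n.
    """
    vector: list[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        vector.extend(1 if base == b else 0 for b in BASES)
    return vector


def base_frequencies(seqs: list[str]) -> dict[str, float]:
    """Relative frequency of each nucleotide, pooled over all sequences."""
    counts = Counter(base for s in seqs for base in s.upper())
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("no A/C/G/T bases found in the input sequences")
    return {b: counts[b] / total for b in BASES}


def random_non_promoters(seqs: list[str], n: int, length: int,
                         seed: int = 0) -> list[str]:
    """Generate n random sequences of the given length whose bases are drawn
    independently with the pooled base frequencies of `seqs` (ASSUMPTION:
    per-position independent sampling)."""
    freqs = base_frequencies(seqs)
    rng = random.Random(seed)  # seeded for reproducible negative sets
    weights = [freqs[b] for b in BASES]
    return ["".join(rng.choices(BASES, weights=weights, k=length))
            for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy = ["AACG", "TTAG"]
    print(code4_encode("ACG"))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    print(base_frequencies(toy))
    print(random_non_promoters(toy, n=3, length=10))
```

Passing gene sequences instead of promoters to `random_non_promoters` would, under the same assumptions, yield the third kind of non-promoter described above (random sequences with the gene non-promoters' base frequencies).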
