Abstract

We describe a promoter recognition method named PCA-HPR that locates eukaryotic promoter regions and predicts transcription start sites (TSSs). We compute codon (3-mer) and pentamer (5-mer) frequencies and build codon and pentamer frequency feature matrices from which informative and discriminative features are extracted for classification. Principal component analysis (PCA) is applied to the feature matrices, and a subset of principal components (PCs) is selected for classification. Our system uses three neural network classifiers to distinguish promoters from exons, promoters from introns, and promoters from 3' untranslated regions (3'UTRs). We compared PCA-HPR with three well-known promoter prediction systems: DragonGSF, Eponine, and FirstEF. Validation on three test sets shows that PCA-HPR achieves the best performance among the four predictive systems.
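
The feature pipeline summarized above (k-mer frequencies followed by PCA) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the overlapping stride-1 windows, the toy sequences, and the use of power iteration to obtain only the first principal component are all assumptions made for illustration. The abstract does not specify these details, and the neural network classifiers are omitted.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers, in lexicographic order.

    Assumption: overlapping windows with stride 1; windows containing
    symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def first_principal_component(rows, iters=500, seed=0):
    """Top principal axis and per-row scores via power iteration.

    Simplification: only the first PC is computed; the paper selects a
    subset of PCs, which would need deflation or a full eigensolver.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Random start: a uniform start vector would be orthogonal to every
    # centered frequency row, because each frequency row sums to 1.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # w = X^T (X v), which is proportional to (covariance matrix) @ v
        proj = [sum(x * y for x, y in zip(row, v)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # data has no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(x * y for x, y in zip(row, v)) for row in centered]
    return v, scores


# Toy sequences (made up for illustration, not data from the paper).
toy = ["TATAAAGGCGCGCC", "ATGGCCAAGCTGAC", "GTAAGTCTTTCCAG"]
matrix = [kmer_frequencies(s, 3) for s in toy]  # 3 rows x 64 codon columns
axis, scores = first_principal_component(matrix)
```

Here `matrix` plays the role of a codon frequency feature matrix, and `scores` holds each sequence's coordinate along the first principal axis. Calling `kmer_frequencies(s, 5)` instead would give the 1,024-column pentamer counterpart.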

Highlights

  • Eukaryotic promoter prediction plays a very important role in the study of gene regulation

  • ... [12], are regarded as one of the most important signal features in promoter recognition

  • ... and PromoterExplorer [7] embed this signal feature in their prediction system

  • ... pentamers are ranked, and the analysis result is shown in ...

  • Among the 100 features with highest R value, we found 30% to 40% of ...

Summary

Background

Eukaryotic promoter prediction plays a very important role in the study of gene regulation. Available promoter prediction systems use two types of features for classification: context features such as n-mers, and signal features such as the TATA-box, CCAAT-box, and CpG islands. A common problem with these systems is that they select only a limited number of features for classification, ignoring both the information in the discarded features and the interactions among the selected features. Feature vectors need to be rebuilt to include more information so that classification achieves better prediction results.

Human 3'UTR sequences are from the UTR database [11].

A DNA sequence contains four types of nucleotides
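
Because there are four nucleotides, there are 4^k distinct k-mers, which gives 64 codons (k = 3) and 1,024 pentamers (k = 5). As a minimal sketch, one way to assign each k-mer a column in a frequency feature matrix is to read it as a base-4 number. The names `NUCLEOTIDES`, `num_kmers`, and `kmer_index` and the A, C, G, T ordering are hypothetical choices for illustration, not taken from the paper.

```python
NUCLEOTIDES = "ACGT"  # assumed ordering; any fixed order would work


def num_kmers(k):
    """Number of distinct k-mers over a four-letter alphabet."""
    return len(NUCLEOTIDES) ** k


def kmer_index(kmer):
    """Column index of a k-mer, reading it as a base-4 number.

    Raises ValueError if the k-mer contains a symbol outside A, C, G, T.
    """
    index = 0
    for base in kmer.upper():
        index = index * 4 + NUCLEOTIDES.index(base)
    return index
```

With this encoding, `kmer_index("AAA")` is the first codon column and `kmer_index("TTT")` is the last, so indices run from 0 to `num_kmers(k) - 1` in lexicographic order.
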
Discussion
Conclusion