Abstract

To identify splice sites in untranslated regions (UTRs) more accurately and efficiently, a method is proposed that combines a statistical model with a support vector machine (SVM) and uses both the splicing sequences and the secondary structures of their flanking sequences. The method has two stages: a statistical method in the first stage and an SVM with a polynomial kernel in the second. The statistical method serves as a pre-processing step for the SVM and takes UTR sequences as its input. It models the nucleotide composition around splice-site regions, and the dependencies between nucleotides, as probabilistic parameters. These parameters are then fed into the SVM, which combines them nonlinearly to predict splice sites. In addition, the Mfold package in the Vienna software was used to predict the most stable secondary structure of the flanking sequences, and the traditional four-letter nucleotide alphabet was converted into an eight-letter alphabet. The resulting sequence-structure strings were used to train the models, and the trained models were then used to recognize splice sites. When tested on an actual dataset of human 5' UTR splice sites, the method performed well at recognizing UTR splice sites.
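The pipeline above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the eight-letter convention, so the sketch assumes uppercase for a paired base and lowercase for an unpaired base (4 bases x 2 pairing states). It also stands in simple position-specific probabilities with per-position log-odds features for the unspecified statistical model, and it omits the dependency modelling. The dot-bracket structures are written by hand instead of being predicted by a folding program. All function names are hypothetical. Only the kernel form is shown, not SVM training. In practice, the feature vectors could be passed to an SVM library such as scikit-learn's `SVC(kernel="poly")`, which uses the same `(gamma * <x, y> + coef0) ** degree` kernel.

```python
import math
from collections import Counter

# Hypothetical 8-letter alphabet: uppercase = paired base, lowercase = unpaired base.
EIGHT_LETTERS = "ACGUacgu"


def eight_letter_encode(seq: str, structure: str) -> str:
    """Merge a nucleotide sequence with its dot-bracket structure into one string.

    Assumed convention (not taken from the paper): a base whose structure symbol
    is '(' or ')' stays uppercase (paired); any other symbol, e.g. '.', marks it
    as unpaired and the base is lowercased. 4 bases x 2 states = 8 letters.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    rna = seq.upper().replace("T", "U")  # treat DNA input as RNA
    out = []
    for base, state in zip(rna, structure):
        if base not in "ACGU":
            raise ValueError(f"unexpected base {base!r}")
        out.append(base if state in "()" else base.lower())
    return "".join(out)


def position_probabilities(seqs, alphabet=EIGHT_LETTERS, pseudocount=1.0):
    """Position-specific symbol probabilities with additive smoothing."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned windows must have the same length")
    total = len(seqs) + pseudocount * len(alphabet)
    model = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        model.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return model


def log_odds_features(seq, true_model, false_model):
    """One feature per position: log P(symbol | true site) - log P(symbol | decoy)."""
    return [math.log(true_model[i][c]) - math.log(false_model[i][c])
            for i, c in enumerate(seq)]


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial SVM kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, y)) + coef0) ** degree


if __name__ == "__main__":
    # Toy, hand-made windows; not data from the paper.
    true_sites = [eight_letter_encode("GUAAGU", "((..))"),
                  eight_letter_encode("GUGAGU", "((..))")]
    false_sites = [eight_letter_encode("CUAAGC", "......"),
                   eight_letter_encode("GCAUGA", "(....)")]
    print(true_sites[0])  # GUaaGU
    t_model = position_probabilities(true_sites)
    f_model = position_probabilities(false_sites)
    x1 = log_odds_features(true_sites[0], t_model, f_model)
    x2 = log_odds_features(false_sites[0], t_model, f_model)
    print(polynomial_kernel(x1, x2))
```

Keeping sequence and structure in a single string means the same position-specific model can be applied unchanged to either the plain four-letter input or the eight-letter input. Only the alphabet argument differs.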
