Abstract

Regulation of pre-mRNA splicing is achieved through the interaction of RNA sequence elements and a variety of RNA-splicing related proteins (splicing factors). The splicing machinery in humans is not yet fully elucidated, partly because human splicing factors have not been exhaustively identified. Furthermore, experimental methods for splicing factor identification are time-consuming and labor-intensive. Although many computational methods have been proposed for the identification of RNA-binding proteins, none so far focuses specifically on RNA-splicing related proteins. We are therefore motivated to design a method for identifying human splicing factors that is trained on experimentally verified splicing factors. An investigation of amino acid composition reveals marked differences between splicing factors and non-splicing proteins. A support vector machine (SVM) is used to construct a predictive model, and five-fold cross-validation indicates that the SVM model trained with amino acid composition achieves a promising accuracy (80.22%). Another basic feature, amino acid dipeptide composition, is also examined and yields a predictive performance similar to that of amino acid composition. In addition, this work shows that incorporating evolutionary information and domain information could improve the predictive performance. The constructed models classify an independent data set of human splicing factors with 73.65% accuracy. The result of independent testing indicates that in silico identification could be a feasible means of conducting preliminary analyses of splicing factors and of significantly reducing the number of potential targets that require further in vivo or in vitro confirmation.
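The five-fold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the stratified fold assignment, the fixed shuffle seed, and the pooling of correct predictions over all folds are choices made here. The classifier is passed in as a callable, and the toy `fit_nearest_centroid` is only a stand-in so the sketch runs without external libraries; an SVM from a machine learning library could be plugged in through the same `fit` interface.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def cross_validated_accuracy(features, labels, fit, k=5, seed=0):
    """Hold out each fold once; return the pooled fraction of correct predictions.

    `fit(train_X, train_y)` must return a `predict(x)` callable.
    """
    correct = 0
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(features[i]) == labels[i] for i in held_out)
    return correct / len(labels)

def fit_nearest_centroid(train_X, train_y):
    """Stand-in classifier (NOT an SVM): predict the class with the closest mean."""
    sums, counts = {}, {}
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )
    return predict
```

Calling `cross_validated_accuracy(X, y, fit_nearest_centroid)` on a list of feature vectors `X` and labels `y` returns a single accuracy estimate in which every sample is scored exactly once by a model that did not see it during training.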

Highlights

  • Alternative splicing (AS) in eukaryotes is one of the mechanisms of post-transcriptional regulation that generate multiple transcripts from the same gene

  • To examine the effectiveness of amino acid composition in identifying splicing factors, a support vector machine (SVM) model is trained using a 20-dimensional vector consisting of the composition scores for the twenty amino acids

  • The importance of splicing factors in pre-messenger RNA (pre-mRNA) splicing and alternative splicing has been established, but in vivo or in vitro identification of splicing factors is subject to technical limitations
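The 20-dimensional composition vector described in the highlights can be sketched as below. This is an illustration under assumptions rather than the paper's code: the "composition score" is taken here to be the relative frequency of each residue, the residue ordering in `AMINO_ACIDS` is arbitrary, and non-standard residue codes are skipped. The 400-dimensional dipeptide variant, the other basic feature named in the abstract, is included for comparison and counts only adjacent pairs in which both residues are standard.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; this order is an assumption
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies that sums to 1."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]

def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-pair frequencies that sums to 1."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        counts[pair] += 1
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]
```

Each protein sequence then maps to one fixed-length vector regardless of its length, which is what allows proteins of different sizes to be fed to the same classifier.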



Introduction

Alternative splicing (AS) in eukaryotes is one of the mechanisms of post-transcriptional regulation that generate multiple transcripts from the same gene. These transcripts are translated into multiple proteins with diverse biological functions. Comparative alignment of EST sequences and high-throughput techniques such as exon/exon-junction arrays and RNA-Seq have revealed that most human genes (more than 90%) undergo alternative splicing [1,2,3,4]. Alternative splicing is regulated by splicing factors (SFs) that recognize and associate with specific RNA sequence elements in order to enhance or repress the ability of the spliceosome to recognize nearby splice sites [5,6]. Cancer cells often take advantage of this flexibility to produce proteins that promote growth and survival [13].

