Abstract
RNA-binding proteins (RBPs) are established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer because of the technical challenges associated with existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains, and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. Compared with other existing tools for the sequence-based prediction of RBPs, APRICOT performed better on various datasets, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot.
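To illustrate what scoring a sequence against a position-specific scoring matrix involves, the following is a minimal sketch and not APRICOT's implementation. The matrix values, the default penalty for residues absent from a column, and the names `TOY_PSSM`, `score_window` and `best_match` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A PSSM is modeled here as one dict per motif position, mapping a residue
# to its log-odds-like score. All values below are invented for illustration.
Pssm = List[Dict[str, float]]

TOY_PSSM: Pssm = [
    {"K": 2.0, "R": 1.5},  # position 1 favors basic residues
    {"G": 2.0},            # position 2 favors glycine
    {"F": 2.0, "Y": 1.0},  # position 3 favors aromatic residues
]


def score_window(window: str, pssm: Pssm, default: float = -1.0) -> float:
    """Sum the per-position scores of a window as wide as the matrix.

    Residues missing from a column receive the (assumed) default penalty.
    """
    return sum(column.get(residue, default) for residue, column in zip(window, pssm))


def best_match(sequence: str, pssm: Pssm, default: float = -1.0) -> Tuple[int, float]:
    """Slide the matrix along the sequence; return (start, score) of the best window."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the scoring matrix")
    scored = [
        (score_window(sequence[i:i + width], pssm, default), i)
        for i in range(len(sequence) - width + 1)
    ]
    # max() keeps the first window in case of ties.
    best_score, best_start = max(scored, key=lambda pair: pair[0])
    return best_start, best_score


start, score = best_match("MAKGFAA", TOY_PSSM)
print(start, score)  # the window "KGF" starts at index 2
```

Real domain databases store much wider matrices over the full amino-acid alphabet, and search tools add statistical significance estimates on top of the raw score; the sketch only shows the sliding-window summation.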
Highlights
Ribonucleoproteins and RNA-binding proteins (RBPs) are important post-transcriptional regulators in several processes such as RNA splicing, transport, localization, translation and stabilization
By analyzing the complete proteomes of human and Escherichia coli, we demonstrate the ability of the pipeline to process large datasets, including bacterial proteomes
The training sets, SwissProt-positive (4779 proteins) and SwissProt-negative (5834 proteins), were analyzed in order to evaluate the ability of the method to accurately differentiate RBPs from non-RBPs
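As a hedged illustration of how sensitivity and specificity are computed when a classifier is evaluated on a positive and a negative set, the sketch below uses a small invented set of labels. The function name `sensitivity_specificity` and the toy data are assumptions for this example and do not reproduce the SwissProt sets or the reported results.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    y_true: Iterable[bool], y_pred: Iterable[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): fraction of true RBPs predicted as RBPs
    specificity = TN / (TN + FP): fraction of non-RBPs predicted as non-RBPs
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: 4 known RBPs followed by 5 known non-RBPs (invented labels).
known = [True] * 4 + [False] * 5
predicted = [True, True, True, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(known, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case three of the four positives and four of the five negatives are classified correctly, giving a sensitivity of 0.75 and a specificity of 0.80.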
Summary
Ribonucleoproteins and RNA-binding proteins (RBPs) are important post-transcriptional regulators in several processes such as RNA splicing, transport, localization, translation and stabilization. Such regulatory mechanisms involve brief interactions or stable binding of regulatory RNAs with RBPs, which are structurally and functionally important for various cellular processes. More than 1000 eukaryotic RBPs have been described to contain conserved amino-acid motifs or RNA-binding domains (RBDs), which serve as RNA binding sites [1,7]. A small number of RBPs lacking known RNA-binding motifs have been identified, which in most cases rely on intrinsically disordered domains for their interaction with RNAs [1]. Numerous structures of protein–RNA complexes have been solved experimentally, providing biophysical information on the interaction between nucleic acids and amino acids.