Computational methods for signal peptide prediction: From statistical models to deep learning.

Similar Papers
  • Research Article
  • Cited by 21
  • 10.1186/1748-7188-3-3
A scoring matrix approach to detecting miRNA target sites
  • Mar 31, 2008
  • Algorithms for Molecular Biology : AMB
  • Simon Moxon + 2 more

Background: Experimental identification of microRNA (miRNA) targets is a difficult and time-consuming process. As a consequence, several computational methods have been devised to predict targets for follow-up experimental validation. Current computational target prediction methods use only the miRNA sequence as input. With an increasing number of experimentally validated targets becoming available, utilising this additional information in the search for further targets may help to improve the specificity of computational methods for target site prediction. Results: We introduce a generic target prediction method, the Stacking Binding Matrix (SBM), that uses information about both the miRNA and experimentally validated target sequences in the search for candidate target sequences. We demonstrate the utility of our method by applying it to both animal and plant data sets and compare it with miRanda, a commonly used target prediction method. Conclusion: We show that SBM can be applied to target prediction in both plants and animals and performs well in terms of sensitivity and specificity. Open source code implementing the SBM method, together with documentation and examples, is freely available for download from the address in the Availability and Requirements section.
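The SBM abstract describes scoring candidate sites with a matrix informed by experimentally validated target sequences, but it does not spell out the SBM scoring scheme. The sketch below therefore uses a generic, swapped-in technique rather than SBM itself: a log-odds position-specific scoring matrix (PSSM) built from aligned validated sites and slid along a candidate sequence. The function names, the pseudocount of 0.5, and the uniform background are illustrative assumptions.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def build_pssm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PSSM from equal-length, aligned RNA sites (generic PSSM, not SBM)."""
    if not sites:
        raise ValueError("need at least one aligned site")
    sites = [s.upper().replace("T", "U") for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    # Uniform background frequencies unless the caller supplies their own (assumption).
    background = background or {n: 0.25 for n in NUCLEOTIDES}
    total = len(sites) + pseudocount * len(NUCLEOTIDES)
    pssm = []
    for pos in range(width):
        counts = Counter(s[pos] for s in sites)
        # Pseudocounts keep unseen nucleotides from scoring minus infinity.
        pssm.append({
            n: math.log2((counts.get(n, 0) + pseudocount) / total / background[n])
            for n in NUCLEOTIDES
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return (start, score) pairs, best first."""
    sequence = sequence.upper().replace("T", "U")
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, sum(col[base] for col, base in zip(pssm, window))))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, building the matrix from the toy sites `ACGUA`, `ACGUU`, and `ACGAA` and scanning `GGACGUAGG` ranks the window starting at index 2 highest, since that window matches the most frequent nucleotide at every position.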

  • Conference Article
  • Cited by 11
  • 10.1109/icmla.2005.24
Discriminating Transmembrane Proteins From Signal Peptides Using SVM-Fisher Approach
  • Dec 15, 2005
  • R.Y. Kahsay + 2 more

Most computational methods for transmembrane protein topology prediction rely on the compositional bias of amino acids to locate the hydrophobic domains of transmembrane proteins. Because signal peptides also contain hydrophobic segments, these methods often misidentify signal-peptide-containing proteins as transmembrane proteins. Here, we present a new approach that combines the SVM-Fisher discrimination method with TMMOD, a hidden Markov model-based predictor for transmembrane proteins. While TMMOD alone already outperforms most existing methods in both identification and topology prediction, the new approach further improves its ability to discriminate between transmembrane proteins and signal-peptide-containing proteins, reducing mispredictions of signal peptides by more than 30% on our test data.
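The misidentification problem described above arises because hydrophobicity-based methods see the hydrophobic core of a signal peptide much as they see a transmembrane helix. A minimal sketch of that compositional signal is a Kyte-Doolittle hydropathy profile averaged over a sliding window; the 19-residue window and 1.6 threshold are the values commonly quoted for flagging candidate transmembrane segments. This only illustrates the underlying signal and is not TMMOD or the SVM-Fisher discriminator.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle hydropathy of each full window along the sequence."""
    seq = seq.upper()
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h >= threshold]
```

Both a transmembrane helix and a sufficiently hydrophobic signal-peptide core can cross such a threshold, which is why an additional discriminator is needed on top of hydropathy alone.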

  • Research Article
  • Cited by 96
  • 10.1038/msb.2009.64
Global analysis of the glycoproteome in Saccharomyces cerevisiae reveals new roles for protein glycosylation in eukaryotes
  • Jan 1, 2009
  • Molecular Systems Biology
  • Li A Kung + 5 more

To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by a mobility shift upon treatment with EndoH and PNGase F, extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, we examined the localization of four mitochondrial proteins in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two of these proteins, localization to the mitochondria was diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes by cataloguing glycoproteins, and they describe a novel role for protein glycosylation in mitochondrial protein function and localization.
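Tunicamycin, EndoH, and PNGase F all concern N-linked glycans, which are attached to asparagine within the N-X-S/T sequon (X being any residue except proline). A common first computational pass, not one used in the study above, is to scan protein sequences for sequons. A minimal sketch, assuming 0-based positions and ignoring structural context (a sequon is necessary but not sufficient for N-linked glycosylation):

```python
import re

# Lookahead so overlapping sequons (e.g. N-N-S-T) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sequons(seq):
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [match.start() for match in SEQUON.finditer(seq.upper())]
```

For example, `n_glycosylation_sequons("ANASNPTNNS")` reports positions 1 and 7; the Asn at position 4 is skipped because it is followed by proline.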

  • Research Article
  • Cited by 135
  • 10.1074/jbc.m610507200
Export Pathway Selectivity of Escherichia coli Twin Arginine Translocation Signal Peptides
  • Mar 1, 2007
  • The Journal of biological chemistry
  • Danielle Tullman-Ercek + 6 more

The Escherichia coli genome encodes at least 29 putative signal peptides containing the twin-arginine motif characteristic of proteins exported via the twin-arginine translocation (Tat) pathway. The ability of these 29 signal peptides to direct export through the Tat pathway, through the general secretory (Sec) pathway, or through both was determined using fusions of each putative Tat signal peptide, plus six to eight amino acids of the mature protein, to three reporter proteins (short-lived green fluorescent protein (GFP), maltose-binding protein (MBP), and alkaline phosphatase (PhoA)), together with data on the cellular localization of epitope-tagged full-length proteins. Of the 29 putative signal peptides, 27 could export one or more reporter proteins through Tat. Of these, 11 displayed Tat specificity, in that they could not direct the export of Sec-only reporter proteins. The remaining 16 were promiscuous and could direct export of the appropriate reporter either via Tat (GFP, MBP) or via Sec (PhoA, MBP). Mutations that conferred a charge of ≥ +1 on the N terminus of the mature protein abolished or drastically reduced routing through the Sec pathway without affecting export via the Tat pathway. These experiments demonstrate that the charge of the mature protein's N terminus affects export promiscuity, independently of the effect of the folding state of the mature protein.
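Two sequence features from this abstract lend themselves to a quick computational check: the twin-arginine motif in the signal peptide and the net charge at the N terminus of the mature protein. The sketch below is a crude, hypothetical heuristic, not the reporter-fusion assay used in the study. The regular expression loosely follows the S/T-R-R-x-F-L-K consensus by requiring R-R, any residue, then a hydrophobic residue; the 35-residue search region and 6-residue mature window are assumptions; and the charge count ignores histidine and the free N-terminal amine.

```python
import re

# Hypothetical loose pattern: R-R, any residue, then a hydrophobic residue,
# loosely echoing the S/T-R-R-x-F-L-K twin-arginine consensus.
TWIN_ARGININE = re.compile(r"RR.[FLIVM]")
CHARGES = {"K": 1, "R": 1, "D": -1, "E": -1}  # His and termini ignored


def has_twin_arginine(signal_peptide, search_len=35):
    """True if the loose twin-arginine pattern occurs near the N terminus."""
    return TWIN_ARGININE.search(signal_peptide[:search_len].upper()) is not None


def net_charge(segment):
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu."""
    return sum(CHARGES.get(aa, 0) for aa in segment.upper())


def mature_n_terminus_positive(mature, window=6):
    """True if the first residues of the mature protein carry a net charge of at least +1."""
    return net_charge(mature[:window]) >= 1
```

Per the abstract, a positive result from the second check corresponds to the condition under which Sec routing was abolished or drastically reduced in the tested fusions; the heuristic itself has not been validated against that data.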

  • Research Article
  • Cited by 19
  • 10.1021/jacs.8b10806
Amphiphile-Mediated Depalmitoylation of Proteins in Living Cells.
  • Dec 5, 2018
  • Journal of the American Chemical Society
  • Andrew K Rudd + 2 more

Post-translational S-palmitoylation plays a central role in protein localization, trafficking, stability, aggregation, and cell signaling. Dysregulation of palmitoylation pathways in cells can alter protein function and is the cause of several diseases. Considering the biological and clinical importance of S-palmitoylation, tools for direct, in vivo modulation of this lipid modification would be extremely valuable. Here, we describe a method for the cleavage of native S-palmitoyl groups from proteins in living cells. Using a cell permeable, cysteine-functionalized amphiphile, we demonstrate the direct depalmitoylation of cellular proteins. We show that amphiphile-mediated depalmitoylation (AMD) can effectively cleave S-palmitoyl groups from the native GTPase HRas and successfully depalmitoylate mislocalized proteins in an infantile neuronal ceroid lipofuscinosis (INCL) disease model. AMD enables direct and facile depalmitoylation of proteins in live cells and has potential therapeutic applications for diseases such as INCL, where native protein thioesterase activity is deficient.

  • Research Article
  • Cited by 6
  • 10.1002/cppb.20119
Selective Enrichment Coupled with Proteomics to Identify S-Acylated Plasma Membrane Proteins in Arabidopsis.
  • Sep 25, 2020
  • Current Protocols in Plant Biology
  • Lijuan Zhou + 3 more

Protein S-acylation, predominantly in the form of palmitoylation, is a reversible lipid post-translational modification of cysteines that plays important roles in protein localization, trafficking, activity, and complex assembly. The functions and regulatory mechanisms of S-acylation have been studied extensively in mammals, owing to remarkable developments in high-resolution proteomics and the discovery of S-acylation-related enzymes. However, S-acylation research in plants lags behind that in mammals, mainly because little is known about the proteins responsible for this process, such as protein acyltransferases and their substrates. This article describes a set of systematic protocols for studying global S-acylation in Arabidopsis seedlings. The procedures are presented in detail, including preparation of Arabidopsis seedlings, enrichment of plasma membrane (PM) proteins, subsequent enrichment of S-acylated proteins/peptides by the acyl-biotin exchange method, and large-scale identification of S-acylated proteins/peptides by mass spectrometry. This approach enables researchers to study S-acylation of PM proteins in plants in a systematic and straightforward way. © 2020 Wiley Periodicals LLC.
Basic Protocol 1: Preparation of Arabidopsis seedling materials
Basic Protocol 2: Isolation and enrichment of plasma membrane proteins
Support Protocol 1: Determination of protein concentration using BCA assay
Basic Protocol 3: Enrichment of S-acylated proteins by acyl-biotin exchange method
Support Protocol 2: Protein precipitation by methanol/chloroform method
Basic Protocol 4: Trypsin digestion and proteomic analysis
Alternate Protocol: Pre-resin digestion and peptide-level enrichment

  • Research Article
  • Cited by 1
  • 10.1016/j.compbiomed.2025.110788
Precision oncology informatics for anticancer drug combination responses: A systematic review.
  • Sep 1, 2025
  • Computers in biology and medicine
  • Saifur Rahaman + 7 more

  • Research Article
  • 10.1016/j.bbrep.2025.101980
The aggregation tendencies of the signal peptide regions of prone and not prone to aggregate proteins.
  • Jun 1, 2025
  • Biochemistry and biophysics reports
  • Natalie G Horgan + 3 more

  • Research Article
  • 10.1007/978-1-0716-4662-5_13
Comprehensive Prediction of Protein Localization and Signal Peptides Using MULocDeep.
  • Jan 1, 2025
  • Methods in molecular biology (Clifton, N.J.)
  • Lei Jiang + 4 more

Localization of a protein within a cell involves various processes and signaling events, including guidance by signal peptides. Accurate prediction of subcellular and suborganellar protein localization, as well as of signal peptides, is crucial for understanding protein function and provides valuable insights into cellular mechanisms. Although many computational methods can predict general protein localization, suborganellar localization, or signal peptides, common limitations include a lack of comprehensive and intuitive interpretation, insufficient coverage of localization types, and poor ease of use. In this chapter, we introduce MULocDeep, an advanced web server for predicting protein localization at both the subcellular and suborganellar levels and for identifying signal peptides and their cleavage sites. The web server integrates a protein large language model, enabling highly accurate predictions and facilitating interpretation of the results. It also includes multiple interactive interfaces that make predictions clearer and more accessible, particularly regarding motif patterns within protein sequences. Furthermore, we demonstrate the practical functionality of the MULocDeep web server, with detailed instructions on using the server and interpreting its results for both localization and signal peptide prediction. The MULocDeep web server is publicly available at https://www.mu-loc.org/.

  • Dissertation
  • 10.33612/diss.150510580
Bacterial protein sorting: experimental and computational approaches
  • Dec 16, 2020
  • Stefano Grasso

All living cells are subdivided into different compartments that are separated by membranes, which are essentially impermeable to water-soluble molecules. Since proteins are predominantly synthesized in the cytoplasm, specific sorting mechanisms and signals known as signal peptides are required to target them to other cellular compartments or the extracellular milieu. The research presented in this PhD thesis was focused on principles of protein sorting and secretion in bacteria, which were investigated with novel approaches that combined experimental analyses and computational tools. In particular, the studies addressed the bacterial cell factory Bacillus subtilis and the pathogens Porphyromonas gingivalis and Staphylococcus aureus. The results show how computational approaches can greatly enhance the experimental studies. In particular, this concerned predictions of subcellular protein localization with tailored tools that were developed for the different bacteria. These tools can enhance industrial and domestic applications of proteins produced with bacteria, or foster the identification of novel drugs or drug targets. In addition, the relationship between secretion efficiency and different features of signal peptides was investigated using a designed signal peptide library and an innovative high-throughput assay. The outcomes were used to generate a machine learning model that predicts signal peptide efficiency in directing protein secretion, and explains the relevant physico-chemical features of signal peptides. Importantly, the model allows de novo design of signal peptides that can be exploited in high-performing protein secretion systems. Altogether, the studies highlight the advantages of combined computational-experimental approaches and how they are best exploited in future biotechnological, pharmaceutical and biomedical applications.
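The thesis abstract does not list which physico-chemical features its secretion-efficiency model uses. As a hedged sketch of the kind of features commonly derived from signal peptides, the hypothetical function below computes the length, the net charge of the n-region, the maximum windowed Kyte-Doolittle hydropathy (a proxy for the h-region), and whether the -3 and -1 positions before the cleavage site carry small neutral residues (the von Heijne -3/-1 rule). The window sizes and residue set are illustrative assumptions, and the input is assumed to end exactly at the cleavage site.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGES = {"K": 1, "R": 1, "D": -1, "E": -1}  # His and termini ignored
SMALL_NEUTRAL = set("AGSTC")  # residues usually tolerated at -3 and -1


def signal_peptide_features(signal_peptide, n_len=5, h_window=8):
    """Hypothetical feature dict for a signal peptide that ends at its cleavage site."""
    sp = signal_peptide.upper()
    # Net charge of the first n_len residues (n-region proxy).
    n_charge = sum(CHARGES.get(aa, 0) for aa in sp[:n_len])
    # Mean hydropathy of every full window; the maximum approximates the h-region.
    windows = [
        sum(KYTE_DOOLITTLE[aa] for aa in sp[i:i + h_window]) / h_window
        for i in range(len(sp) - h_window + 1)
    ]
    return {
        "length": len(sp),
        "n_region_charge": n_charge,
        "max_hydropathy": max(windows) if windows else float("nan"),
        # -3/-1 rule: small, neutral residues just before the cleavage site.
        "minus3_minus1_small": len(sp) >= 3 and sp[-3] in SMALL_NEUTRAL and sp[-1] in SMALL_NEUTRAL,
    }
```

Such feature dictionaries could be fed to any regression or classification model; which model and features the thesis actually used is not stated in the abstract.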

  • Research Article
  • Cited by 28
  • 10.1074/jbc.m113.526426
Determinants of RNA Binding and Translational Repression by the Bicaudal-C Regulatory Protein
  • Mar 1, 2014
  • Journal of Biological Chemistry
  • Yan Zhang + 3 more

Bicaudal-C (Bic-C) RNA binding proteins function as important translational repressors in multiple biological contexts within metazoans. However, their RNA binding sites are unknown. We recently demonstrated that Bic-C functions in spatially regulated translational repression of the xCR1 mRNA during Xenopus development. This repression contributes to normal development by confining the xCR1 protein, a regulator of key signaling pathways, to specific cells of the embryo. In this report, we combined biochemical approaches with in vivo mRNA reporter assays to define the minimal Bic-C target site within the xCR1 mRNA. This 32-nucleotide Bic-C target site is predicted to fold into a stem-loop secondary structure. Mutational analyses provided evidence that this stem-loop structure is important for Bic-C binding. The Bic-C target site was sufficient for Bic-C mediated repression in vivo. Thus, we describe the first RNA binding site for a Bic-C protein. This identification provides an important step toward understanding the mechanisms by which evolutionarily conserved Bic-C proteins control cellular function in metazoans.

  • Research Article
  • Cited by 33
  • 10.1186/s12859-016-0959-z
Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences.
  • Mar 3, 2016
  • BMC Bioinformatics
  • Binghuang Cai + 1 more

Background: Ubiquitination is a very important post-translational modification of proteins and has been widely investigated by biologists. Different experimental and computational methods have been developed to identify ubiquitination sites in protein sequences. This paper explores computational machine learning methods for predicting ubiquitination sites using the physicochemical properties (PCPs) of amino acids in protein sequences. Results: We first establish six different ubiquitination data sets, whose records contain both ubiquitination and non-ubiquitination sites in varying numbers of protein sequence segments. To establish these data sets, protein sequence segments are extracted from the original protein sequences used in four published papers on ubiquitination, and 531 PCP features are calculated for each extracted segment by averaging the AAindex (Amino Acid index database) PCP values of all amino acids in the segment. Various machine learning methods, including four Bayesian network methods (Naïve Bayes (NB), Feature Selection NB (FSNB), Model Averaged NB (MANB), and Efficient Bayesian Multivariate Classifier (EBMC)) and three regression methods (Support Vector Machine (SVM), Logistic Regression (LR), and Least Absolute Shrinkage and Selection Operator (LASSO)), are then applied to the six segment-PCP data sets. Five-fold cross-validation and the Area Under the Receiver Operating Characteristic Curve (AUROC) are used to evaluate the ubiquitination prediction performance of each method. The results demonstrate that the PCP data of protein sequences contain information that machine learning methods can mine for ubiquitination site prediction. The comparative results show that EBMC, SVM, and LR perform better than the other methods, and that EBMC is the only method that achieves AUCs greater than or equal to 0.6 on all six data sets. The results also show that EBMC tends to perform better on larger data sets. Conclusions: Machine learning methods have been employed for ubiquitination site prediction based on the physicochemical properties of amino acids in protein sequences. The results demonstrate the effectiveness of using machine learning to mine information from PCP data of protein sequences, as well as the superiority of EBMC, SVM, and LR (especially EBMC) over the other methods for ubiquitination prediction. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0959-z) contains supplementary material, which is available to authorized users.
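The feature construction in this abstract (averaging a physicochemical scale over every residue of a segment) and the AUROC used for evaluation can both be sketched in a few lines. The lysine-centred segments, the half-width of 10, the "X" padding at sequence ends, and the toy scale in the example below are assumptions; real features would use the 531 AAindex scales the abstract mentions. AUROC is computed here as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def lysine_segments(seq, half_width=10, pad="X"):
    """Return (position, segment) for every Lys, centred on it and padded at the ends."""
    seq = seq.upper()
    padded = pad * half_width + seq + pad * half_width
    return [
        (i, padded[i:i + 2 * half_width + 1])
        for i, aa in enumerate(seq) if aa == "K"
    ]


def pcp_features(segment, scales):
    """One feature per scale: the mean scale value over residues the scale defines."""
    features = []
    for scale in scales:
        values = [scale[aa] for aa in segment if aa in scale]  # padding is skipped
        features.append(sum(values) / len(values) if values else 0.0)
    return features


def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `pcp_features("AKA", [{"A": 1.0, "K": 3.0}])` returns a single feature of about 1.667, the mean of 1.0, 3.0, and 1.0. The pairwise AUROC is quadratic in the number of examples, which is fine for a sketch but slow for large data sets.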

  • Research Article
  • Cited by 146
  • 10.1091/mbc.10.9.3015
A role for the GSG domain in localizing Sam68 to novel nuclear structures in cancer cell lines.
  • Sep 1, 1999
  • Molecular Biology of the Cell
  • Taiping Chen + 3 more

The GSG (GRP33, Sam68, GLD-1) domain is a protein module found in an expanding family of RNA-binding proteins. The numerous missense mutations identified genetically in the GSG domain support its physiological role. Although the exact function of the GSG domain is not known, it has been shown to be required for RNA binding and oligomerization. Here it is shown that the Sam68 GSG domain plays a role in protein localization. We show that Sam68 concentrates into novel nuclear structures that are predominantly found in transformed cells. These Sam68 nuclear bodies (SNBs) are distinct from coiled bodies, gems, and promyelocytic nuclear bodies. Electron microscopic studies show that SNBs are distinct structures enriched in phosphorus and nitrogen, indicating the presence of nucleic acids. In HeLa cells, a GFP-Sam68 fusion protein localized like endogenous Sam68: diffusely nuclear, with two to five SNBs. Two other GSG proteins, the Sam68-like mammalian proteins SLM-1 and SLM-2, colocalized with endogenous Sam68 in SNBs. The effects of different GSG domain missense mutations on Sam68 localization were investigated, yielding six separate classes of cellular patterns, including exclusive SNB localization and association with microtubules. These findings demonstrate that the GSG domain is involved in protein localization and define a new compartment for Sam68, SLM-1, and SLM-2 in cancer cell lines.

  • Research Article
  • Cited by 21
  • 10.1074/jbc.274.50.35749
Functional Analysis of a Mutation Occurring between the Two In-frame AUG Codons of Human Angiotensinogen
  • Dec 1, 1999
  • Journal of Biological Chemistry
  • Toshiaki Nakajima + 6 more

Angiotensinogen (ANG) is the specific substrate of the renin-angiotensin system, a major participant in blood pressure control. We have identified a natural mutation at the -30 amino acid position of the angiotensinogen signal peptide, in which an arginine is replaced by a proline (R-30P). Heterozygous individuals with R-30P showed a tendency toward lower plasma angiotensinogen levels (1563 ng of ANG I/ml (range 1129-1941)) than normal individuals in the family (1892 ng of ANG I/ml (range 1603-2072)). Human angiotensinogen mRNA has two in-phase translation initiation codons (AUG), located 39 and 66 nucleotides from the cap site. R-30P occurs in a cluster of basic residues adjacent to the first AUG codon, which may affect intracellular sorting of the nascent protein. Pulse-chase experiments in transiently transfected cultured cells revealed that the R-30P mutation was associated with reduced amounts of both intracellular and extracellular protein. In a cell-free system, we found that two forms of native angiotensinogen were generated by alternative initiation of translation at either AUG codon. Altering the first or the second AUG codon abolished synthesis of the longer or the shorter form of native angiotensinogen, respectively. Furthermore, the shorter form was secreted more slowly than the longer form. However, by transplanting the angiotensinogen signal peptide onto green fluorescent protein, which is normally localized in the cytoplasm, we found that both forms of the signal peptide could target it to the secretory pathway. Although the R-30P mutation may not affect intracellular sorting of angiotensinogen qualitatively, it leads to a quantitative reduction in the net secretion of mature angiotensinogen through decreased translocation or increased residence time in the endoplasmic reticulum.
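The abstract states that the two AUG codons are in phase. If they begin at nucleotides 39 and 66, they are 66 - 39 = 27 nucleotides apart, which is exactly 9 codons, so the longer form initiated at the first AUG would carry nine extra N-terminal residues. The two helper functions below are a hypothetical sketch for checking this kind of relationship on any mRNA: one lists AUG codons in the same frame as a given start up to the first in-frame stop codon, and the other converts the spacing between two starts into a number of extra residues.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def in_frame_augs(mrna, first_start):
    """0-based AUG positions in the frame of first_start, up to the first in-frame stop."""
    mrna = mrna.upper().replace("T", "U")
    starts = []
    for pos in range(first_start, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "AUG":
            starts.append(pos)
    return starts


def extra_residues(upstream_start, downstream_start):
    """Residues added to the N terminus when translation begins at the upstream AUG."""
    spacing = downstream_start - upstream_start
    if spacing < 0 or spacing % 3:
        raise ValueError("starts must be in frame, with the upstream one first")
    return spacing // 3
```

With the positions quoted in the abstract, `extra_residues(39, 66)` returns 9. The helper only checks frame arithmetic; it says nothing about which start the ribosome actually uses.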

  • Research Article
  • Cited by 19
  • 10.7717/peerj.3261
Prediction of post-translational modification sites using multiple kernel support vector machine.
  • Apr 27, 2017
  • PeerJ
  • Binghua Wang + 2 more

Protein post-translational modification (PTM) is an important mechanism involved in the regulation of protein function. Because experimental identification is costly and labor-intensive, many computational methods are currently available for predicting PTM sites from local protein sequence information in the context of conserved motifs. Here we propose a novel computational method that uses a combination of multiple-kernel support vector machines (SVMs) to predict PTM sites, including phosphorylation, O-linked glycosylation, acetylation, sulfation, and nitration sites. To make full use of local sequence information and site-modification relationships, we developed a local sequence kernel and a Gaussian interaction profile kernel, respectively. The multiple kernels were then combined to train an SVM, efficiently leveraging the information in each kernel to boost predictive performance. We compared the proposed method with existing PTM prediction methods. The experimental results revealed that the proposed method performed comparably to or better than the existing prediction methods, suggesting that the developed kernels are feasible and that the proposed method is useful for PTM site prediction.
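The kernel construction described above can be sketched with plain Python lists. The Gaussian interaction profile (GIP) kernel follows its usual form, K(i, j) = exp(-gamma * ||y_i - y_j||^2), with gamma set to gamma_prime divided by the mean squared profile norm. The position-wise identity kernel is a hypothetical stand-in for the paper's local sequence kernel, whose exact definition is not given here, and the fixed convex weights are an assumption (the paper may learn them). A non-negative weighted sum of valid kernels is itself a valid kernel, so the combined Gram matrix can be passed to any SVM that accepts precomputed kernels.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over equal-length numeric profiles."""
    n = len(profiles)
    # Bandwidth normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in profiles]
        for p in profiles
    ]


def identity_kernel(windows):
    """Hypothetical local sequence kernel: fraction of positions where two windows agree."""
    width = len(windows[0])
    return [
        [sum(a == b for a, b in zip(u, v)) / width for v in windows]
        for u in windows
    ]


def combine_kernels(kernels, weights):
    """Convex combination of Gram matrices; non-negative weights keep it a valid kernel."""
    if len(kernels) != len(weights) or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need one non-negative weight per kernel, not all zero")
    total = sum(weights)
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) / total for j in range(n)]
        for i in range(n)
    ]
```

For instance, scikit-learn's `SVC(kernel="precomputed")` accepts such a matrix through `fit(K, y)`, where `K` is the combined training Gram matrix.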
