Discovering nuclear targeting signal sequence through protein language learning and multivariate analysis

Yun Guo, Yang Yang, Yan Huang, Hong-Bin Shen

doi:10.1016/j.ab.2019.113565

Abstract

Nuclear localization signals (NLSs) are peptides that target proteins to the nucleus by binding to carrier proteins in the cytoplasm, which transport their cargo across the nuclear membrane. Accurate identification of NLSs can help elucidate the functions of nuclear protein complexes. Existing NLS predictors are usually specific to certain species or depend heavily on prior knowledge of the basic residues in NLSs. A more general predictor is therefore highly desirable to reduce the potentially high rates of false positives and false negatives when discovering new NLSs. Here, we report a new method, INSP (Identification Nucleus Signal Peptide), which identifies NLSs mainly on the basis of statistical knowledge and machine learning algorithms. In our machine learning model, we treat the query protein sequence as text and extract sequence context features using a natural language model. These word-vector features encode discriminative knowledge of NLS motif frequency and are thus useful for model recognition. The output of the machine learning model is then fused with statistical knowledge of the query sequence to build a final multivariate regression model for NLS peptide identification. The experimental results demonstrate promising performance for INSP. INSP is freely available for academic use at www.csbio.sjtu.edu.cn/bioinf/INSP/.
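As a rough illustration of the two ideas in the abstract (treating a protein sequence as text, and fusing a model output with a statistical score), the sketch below may help. It is not the INSP implementation: the abstract does not say how sequences are split into words or how the regression is parameterized. The overlapping k-mer tokenization, the function names `kmer_words` and `fuse_scores`, and the weights are all assumptions made for this example.

```python
from typing import List


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mer "words".

    Assumption: overlapping k-mers are one common way to turn a sequence
    into text for a word-vector model; the abstract does not specify the
    tokenization INSP uses.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.strip().upper()
    # A sequence shorter than k yields no words.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fuse_scores(model_score: float, stat_score: float,
                w_model: float = 0.5, w_stat: float = 0.5,
                bias: float = 0.0) -> float:
    """Combine two scores with a linear (multivariate regression) form.

    Assumption: the weights and bias here are placeholders; in practice
    they would be fitted on labeled data.
    """
    return bias + w_model * model_score + w_stat * stat_score


if __name__ == "__main__":
    # PKKKRKV is the classical SV40 large T antigen NLS.
    print(kmer_words("PKKKRKV", k=3))
    print(fuse_scores(model_score=0.8, stat_score=0.4))
```

The resulting word lists could be passed to any word-embedding model to obtain per-word vectors, and the fused score is simply a weighted sum of the two inputs.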

Journal: Analytical Biochemistry
Publication Date: Dec 26, 2019
Citations: 29
