Abstract
Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP–CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP–CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP–CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP–CRF will facilitate the development of protein sequence analysis.
Highlights
Disordered proteins/regions (IDPs/IDRs) refer to the proteins/regions without a stable three-dimensional structure in their native state [1]
Three classification-based predictors are constructed as well, which are based on support vector machine (SVM), artificial neural network (ANN) and random forest (RF) models
The reason is that IDP–conditional random fields (CRFs) can capture the interdependency between labels of sequence-adjacent residues, and the global sequence patterns of disordered regions can be incorporated into IDP–CRF
Summary
Disordered proteins/regions (IDPs/IDRs) refer to the proteins/regions without a stable three-dimensional structure in their native state [1]. IDPs/IDRs are widely distributed in nature, and are correlated with many biological functions [2,3] and a broad range of human diseases, such as genetic diseases [4], cancer [3] and neurodegenerative diseases [5,6]. Accurately identifying IDPs/IDRs is crucial for understanding the mechanism of biological functions and exploring the relationship between IDPs/IDRs and diseases. DisProt [2] archives experimentally certified IDPs/IDRs by different techniques, such as X-ray crystallography, nuclear magnetic resonance (NMR) and circular dichroism (CD) spectroscopy. Identifying IDPs/IDRs by using experimental methods is time consuming and expensive.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.