Abstract

Background: Algorithms designed to predict protein disorder play an important role in structural and functional genomics, as disordered regions have been reported to participate in important cellular processes. Consequently, several methods with different underlying principles for disorder prediction have been independently developed by various groups. To assess their usability in automated workflows, we are interested in identifying parameter settings and threshold selections under which the performance of these predictors becomes directly comparable.

Results: First, we derived a new benchmark set that accounts for different flavours of disorder, complemented with a similar amount of order annotation derived for the same protein set. We show that, with the recommended default parameters, the programs tested produce a wide range of predictions at different levels of specificity and sensitivity. We identify settings in which the different predictors have the same false positive rate. We assess the conditions under which sets of predictors can be run together to derive consensus or complementary predictions. This is useful in proteome-wide applications where high specificity is required, such as our in-house sequence analysis pipeline and the ANNIE webserver.

Conclusions: This work identifies parameter settings and thresholds for a selection of disorder predictors to produce comparable results at a desired level of specificity over a newly derived benchmark dataset that accounts equally for ordered and disordered regions of different lengths.
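The idea of bringing different predictors to the same false positive rate and then combining their calls can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the strict "score above cutoff" rule, the voting scheme and the toy scores are all assumptions made for illustration, and real predictors may report scores on very different scales.

```python
from typing import List, Sequence


def threshold_for_fpr(scores: Sequence[float],
                      is_disordered: Sequence[bool],
                      target_fpr: float) -> float:
    """Return a cutoff such that at most target_fpr of the ordered
    residues score strictly above it (hypothetical helper)."""
    ordered = sorted(
        (s for s, d in zip(scores, is_disordered) if not d), reverse=True
    )
    if not ordered:
        raise ValueError("need at least one ordered residue")
    allowed = int(target_fpr * len(ordered))  # false positives tolerated
    if allowed >= len(ordered):
        return float("-inf")
    # Only scores strictly above the (allowed + 1)-th highest ordered score
    # are called disordered, so at most `allowed` ordered residues pass.
    return ordered[allowed]


def call_disorder(scores: Sequence[float], threshold: float) -> List[bool]:
    """Call a residue disordered when its score exceeds the cutoff."""
    return [s > threshold for s in scores]


def false_positive_rate(calls: Sequence[bool],
                        is_disordered: Sequence[bool]) -> float:
    """Fraction of ordered residues that were called disordered."""
    ordered_calls = [c for c, d in zip(calls, is_disordered) if not d]
    return sum(ordered_calls) / len(ordered_calls)


def consensus(calls: List[List[bool]], min_votes: int) -> List[bool]:
    """Per-residue vote: disordered if at least min_votes predictors agree."""
    return [sum(column) >= min_votes for column in zip(*calls)]


# Toy benchmark (made-up values): four disordered, then four ordered residues.
labels = [True, True, True, True, False, False, False, False]
predictor_a = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]  # scores in 0..1
predictor_b = [5.0, 1.0, 4.0, 3.5, 0.5, 3.0, 0.2, 0.1]   # arbitrary scale

cut_a = threshold_for_fpr(predictor_a, labels, target_fpr=0.25)
cut_b = threshold_for_fpr(predictor_b, labels, target_fpr=0.25)
calls_a = call_disorder(predictor_a, cut_a)
calls_b = call_disorder(predictor_b, cut_b)
combined = consensus([calls_a, calls_b], min_votes=2)
```

In this toy case each predictor misclassifies a different ordered residue at its own cutoff, so requiring both to agree removes both false positives. Whether a consensus helps on real data depends on how correlated the predictors' errors are.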

Highlights

  • Definition of disorder: Over the last decades, structural biology has become increasingly aware of the importance of disordered regions, and even fully unstructured proteins, that participate in biological processes [1,2,3], culminating in a boom in the development of protein disorder predictors during the last few years [4]

  • The most complete database of disordered protein segments is provided by DisProt [19,20]; release 4.5 was available at the start of this work

Summary

Introduction

Algorithms designed to predict protein disorder play an important role in structural and functional genomics, as disordered regions have been reported to participate in important cellular processes. Several methods with different underlying principles for disorder prediction have been independently developed by various groups. The nomenclature for similar observations of disorder is diverse: intrinsically disordered proteins (IDPs) are also known as natively disordered, natively unfolded or intrinsically unstructured proteins (IUPs) [5], to name just a few terms. Whether these terms apply to full-length sequences is another issue because, owing to technical limitations, structural evidence is frequently available only for individual domains. Some disordered regions may participate in processes that involve transitions between different conformational states, as described in the trinity [8] and quartet [9] models.
