Abstract

Background: Protein aggregation is a significant problem in the biopharmaceutical industry (protein drug stability) and is associated medically with over 40 human diseases. Although a number of computational models have been developed for predicting aggregation propensity and identifying aggregation-prone regions in proteins, little systematic research has been done to determine which physicochemical properties are relevant to aggregation and how important each is to the process. Such studies may not only improve the prediction of peptide aggregation propensities and the identification of aggregation-prone regions in proteins, but also aid in discovering additional underlying mechanisms governing this process.

Results: We use two feature selection algorithms to identify 16 features, out of a total of 560 physicochemical properties, presumably important to protein aggregation. Two predictors (ProA-SVM and ProA-RF) using the selected features are built for predicting peptide aggregation propensity and identifying aggregation-prone regions in proteins. Both methods compare favourably to other state-of-the-art algorithms in cross validation. The identified important properties are fairly consistent with previous studies and bring some new insights into protein and peptide aggregation. One interesting new finding is that aggregation-prone peptide sequences have similar properties to signal peptide and signal anchor sequences.

Conclusions: Both predictors are implemented in a freely available web application (http://www.abl.ku.edu/ProA/). We suggest that the quaternary structure of protein aggregates, especially soluble oligomers, may allow the formation of new molecular recognition signals that guide aggregate targeting to specific cellular sites.

Highlights

  • Protein aggregation is a significant problem in the biopharmaceutical industry and is associated medically with over 40 human diseases

  • We use two feature selection methods, SVM-RFE and Random Forest (RF)-IS, to select features that are important to protein aggregation

  • The feature selection procedure of both approaches starts with the full set of features and iteratively eliminates a number or a fraction of the least important features, as determined by the support vector machine (SVM)-RFE and RF-IS algorithms
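The elimination loop described in the last highlight can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`backward_eliminate`, `abs_correlation_ranker`), the toy feature names and values, and the `drop_fraction` parameter are all hypothetical, and the importance score here is a simple absolute Pearson correlation standing in for the SVM weights or Random Forest importance scores that SVM-RFE and RF-IS would supply.

```python
import math
from typing import Callable, Dict, List, Sequence

# A ranker takes the surviving feature names and returns an importance per name.
Ranker = Callable[[List[str]], Dict[str, float]]


def backward_eliminate(features: Sequence[str], rank: Ranker,
                       n_keep: int, drop_fraction: float = 0.5) -> List[str]:
    """Start from all features and repeatedly drop the least important ones."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = list(features)
    while len(remaining) > n_keep:
        # Re-score on the surviving set each round (a real SVM-RFE or RF-based
        # ranker would retrain its model here).
        scores = rank(remaining)
        # Drop a fraction of the features: at least one, never below n_keep.
        n_drop = max(1, int(len(remaining) * drop_fraction))
        n_drop = min(n_drop, len(remaining) - n_keep)
        remaining.sort(key=lambda name: scores[name], reverse=True)
        remaining = remaining[:len(remaining) - n_drop]
    return remaining


def abs_correlation_ranker(columns: Dict[str, List[float]],
                           labels: List[float]) -> Ranker:
    """Stand-in importance (an assumption): |Pearson r| between feature and label."""
    def pearson(x: List[float], y: List[float]) -> float:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        denom = math.sqrt(var_x * var_y)
        return cov / denom if denom else 0.0

    def rank(names: List[str]) -> Dict[str, float]:
        return {name: abs(pearson(columns[name], labels)) for name in names}

    return rank


# Hypothetical toy data: 1 = aggregating peptide, 0 = non-aggregating peptide.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "hydrophobicity": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    "charge": [0.1, 0.3, 0.2, 0.8, 0.9, 0.7],
    "noise_a": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],
    "noise_b": [0.4, 0.6, 0.5, 0.5, 0.6, 0.4],
}

selected = backward_eliminate(list(columns),
                              abs_correlation_ranker(columns, labels),
                              n_keep=2)
print(sorted(selected))  # ['charge', 'hydrophobicity']
```

Passing the ranker in as a callback keeps the elimination loop independent of how importance is computed, so the same loop could in principle be driven by either of the two ranking methods named above.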


Summary

Introduction

Protein aggregation is a significant problem in the biopharmaceutical industry (protein drug stability) and is associated medically with over 40 human diseases. Protein aggregation has been intensely studied experimentally and computationally because the aggregation of protein drugs is of significant concern. It is encountered routinely during the protein refolding, purification, formulation, storage and shipping processes [1,2]. Despite extensive research [...] from protein drugs, as the former are well-ordered entities containing cross-beta structure fibers while the latter are frequently amorphous entities, current prevailing theories hold that both amyloid fibers and amorphous aggregates are formed from partially folded intermediates [12]. Both amorphous aggregates and fibers may contain similar aggregation-prone motifs [13].
