Abstract

Protein solubility prediction is essential for understanding diverse biological processes and for exploring the impact of different factors (ionic strength, temperature, pH of the medium and electrostatic repulsion) on protein productivity. It also plays an important role in disease analysis and drug development. Determining protein solubility through experimental approaches is time-consuming, labour-intensive and error-prone. To support protein solubility prediction and facilitate large-scale analysis, 16 different computational predictors have been proposed. However, these predictors have low predictive performance, mainly because they extract features from raw protein sequences that are neither semantically rich nor discriminative. Existing predictors extract either sequence order information or positional information, although both types of information are important for discriminating soluble from insoluble proteins. This paper presents a novel encoder, CTAPAAC, that generates statistical representations of protein sequences by extracting 4 different types of information: correlation, distribution, composition and transition. A comprehensive intrinsic and extrinsic performance analysis of the proposed encoder and the 14 most widely used existing protein sequence encoders over 4 benchmark datasets reveals that the proposed encoder is better at transforming soluble and insoluble protein sequences into statistical vectors with patterns that discriminate between the two classes. The proposed encoder combined with a random forest classifier outperforms the best-performing existing protein solubility predictors by significant margins of 6%, 7%, 25% and 10% in terms of accuracy over the PSI:Biology, E. coli, Price and Esol datasets, respectively. Source code of the proposed predictor is publicly available at https://github.com/Faiza-Mehmood/RPPSP .
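
To make two of the four information types named above more concrete, the following is a minimal sketch of composition and transition features over a coarse amino-acid grouping. It is not the CTAPAAC encoder itself: the three-group hydrophobicity grouping, the function names and the normalisation are assumptions chosen for illustration only, and the distribution and correlation components are not shown.

```python
from collections import Counter

# Assumed three-class grouping of residues by hydrophobicity (illustrative only;
# the abstract does not say which groupings the proposed encoder uses).
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}
RESIDUE_TO_GROUP = {
    aa: index for index, members in enumerate(GROUPS.values()) for aa in members
}
GROUP_PAIRS = [(0, 1), (0, 2), (1, 2)]


def encode_groups(seq):
    """Map a protein sequence to group indices, skipping unknown residues."""
    return [RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP]


def composition(seq):
    """Fraction of residues that fall into each of the three groups."""
    labels = encode_groups(seq)
    counts = Counter(labels)
    total = len(labels)
    return [counts[g] / total if total else 0.0 for g in range(len(GROUPS))]


def transition(seq):
    """Fraction of adjacent residue pairs that switch between two given groups."""
    labels = encode_groups(seq)
    counts = Counter()
    for left, right in zip(labels, labels[1:]):
        if left != right:
            counts[tuple(sorted((left, right)))] += 1
    total = len(labels) - 1
    return [counts[pair] / total if total > 0 else 0.0 for pair in GROUP_PAIRS]


def encode(seq):
    """Concatenate both feature groups into one fixed-length vector."""
    return composition(seq) + transition(seq)
```

Under these assumptions, `encode` returns a six-dimensional vector for any sequence, which could then be passed to a classifier such as a random forest.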
