Abstract

Amino acids are the essential building blocks of proteins, which are fundamental to all living organisms. The pKa values of amino acid residues in proteins can affect protein structure and function, and they can also be used to optimize experimental conditions for protein purification. In this work, a data set containing the pKa of the carboxylic acid group for 52 amino acids from various families in aqueous solution was used. A quantitative structure-property relationship (QSPR) approach was applied to predict the pKa using different machine learning methods. First, the amino acid molecules were drawn and optimized, and 3224 molecular descriptors were calculated for each molecule. Descriptor selection by genetic algorithm-multiple linear regression (GA-MLR) showed that the model with 8 descriptors is the most economical (parsimonious) choice. A comparison of the results shows that the feed-forward neural network (FFNN) performs better than the particle swarm optimization-support vector machine (PSO-SVM) model. The coefficients of determination of the models follow the order R2 (FFNN) = 0.9987 > R2 (PSO-SVM) = 0.9971 > R2 (GA-MLR) = 0.9951. For FFNN and PSO-SVM, the test-set R2 is close to the training-set R2, which supports the validity of the models. Consequently, using the FFNN, the pKa of the carboxylic acid group of amino acids in aqueous solution can be predicted without experimental cost, with a low average absolute relative error (AARE%) of 2.82.
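
The abstract does not spell out the two figures of merit it quotes, R2 and AARE%. The following is a minimal sketch of how they are conventionally computed, assuming the usual definitions (AARE% as the mean of |predicted - experimental| / |experimental| times 100, and R2 as 1 - SS_res / SS_tot). The function names and the sample pKa values are hypothetical illustrations, not data or code from the study.

```python
def aare_percent(y_true, y_pred):
    """Average absolute relative error, in percent (assumed conventional definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical experimental and predicted pKa values, for illustration only.
experimental = [2.0, 2.5, 4.0]
predicted = [2.1, 2.4, 4.0]

print(f"AARE% = {aare_percent(experimental, predicted):.2f}")
print(f"R2    = {r_squared(experimental, predicted):.4f}")
```

Applying the same two functions to the training and test subsets separately gives the train/test comparison that the abstract uses as its validation argument.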
