Abstract

It is widely accepted that features such as pI, length, molecular mass and amino acid (AA) sequence have a significant influence on protein solubility. Here, we focused mainly on AA composition and identified the AAs that most affected the soluble expression level of human serum albumin (HSA) domain antibody (dAb). The soluble expression levels and sequences of 65 dAb variants were analysed using clustering and linear modelling. Certain AAs significantly affected the soluble expression level of dAb. The specific AA combinations were (S, R, N, D, Q), (G, R, C, N, S) and (R, S, G), which affected the dAb expression level in the broth supernatant, the level in the pellet lysate and the total soluble dAb, respectively. Among the 20 AAs, R had a negative influence on the soluble expression level, whereas G and S had positive effects. A linear model was built to predict the soluble expression level from the sequence; this model had a prediction accuracy of 80%. In summary, increasing the content of polar AAs, especially G and S, and decreasing the content of R helped improve the soluble expression level of HSA dAb.
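As a rough illustration of the kind of sequence feature the abstract refers to, the sketch below computes the AA composition of a sequence and feeds it to a linear score. It is a minimal sketch and not the authors' model: the function names, the coefficient values and the intercept are hypothetical placeholders, and only the signs (negative for R, positive for G and S) follow the abstract.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


# ASSUMPTION: placeholder weights for illustration only. The magnitudes are
# invented; only the signs mirror the abstract (R negative, G and S positive).
PLACEHOLDER_COEFFS = {"R": -1.0, "G": 1.0, "S": 1.0}
PLACEHOLDER_INTERCEPT = 0.0


def linear_score(sequence, coeffs=PLACEHOLDER_COEFFS,
                 intercept=PLACEHOLDER_INTERCEPT):
    """Linear combination of AA fractions: intercept + sum(w_aa * fraction_aa)."""
    comp = aa_composition(sequence)
    return intercept + sum(w * comp[aa] for aa, w in coeffs.items())
```

In a real analysis the coefficients would be fitted to measured expression levels rather than set by hand. Characters outside the 20 standard codes count toward the sequence length but are not reported in the composition.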

Highlights

  • Given the outstanding advantages of Escherichia coli, including fast growth, inexpensive culturing, high-density cultivation, and simple genetic manipulation, it has been suggested that E. coli should be the first host tried for expression of any protein [1]

  • Amino acid (AA) composition significantly affects the soluble expression of domain antibody (dAb)

  • It is widely accepted that AA sequence is significantly correlated with protein production, which was shown in this study through analysis of the consistency of cluster results based on AA sequences and the corresponding soluble expression levels of dAbs (Table 4)



Introduction

Given the outstanding advantages of Escherichia coli, including fast growth, inexpensive culturing, high-density cultivation, and simple genetic manipulation, it has been suggested that E. coli should be the first host tried for expression of any protein [1]. Several strategies have been used to increase protein production and solubility, for example altering expression system elements [3,4] and optimizing culture conditions [5]. Several prediction models have been established [6,9], such as the Harrison prediction model [10], multiple linear regression (MLR) model [11], solubility index-based model [12], support vector machine-based model [13,14], PROSO model [15], SOLpro model [16], ccSOL model [17] and PROSO II model [18]. These bioinformatics models can significantly reduce the trial-and-error procedures involved in optimizing expression systems to increase the soluble expression level of heterologous proteins. However, these prediction models have seen limited application, partly because of the significant differences among the proteins chosen for building them and because of the inconsistent culture conditions adopted for protein expression [6,8,9].

