Proteins are prone to aggregate when expressed above their solubility limits. Aggregation may occur rapidly, potentially as early as proteins emerge from the ribosome, or slowly, following synthesis. However, in vivo data on aggregation rates are scarce. Here, we classified the Escherichia coli proteome into rapidly and slowly aggregating proteins using an in vivo image-based screen coupled with machine learning. We find that the majority (70%) of cytosolic proteins that become insoluble upon overexpression have relatively low rates of aggregation and are unlikely to aggregate co-translationally. Remarkably, such proteins exhibit higher folding rates compared to rapidly aggregating proteins, potentially implying that they aggregate after reaching their folded states. Furthermore, we find that a substantial fraction (~ 35%) of the proteome remain soluble at concentrations much higher than those found naturally, indicating a large margin of safety to tolerate gene expression changes. We show that high disorder content and low surface stickiness are major determinants of high solubility and are favored in abundant bacterial proteins. Overall, our study provides a global view of aggregation rates and hence solubility limits of proteins in a bacterial cell.
Read full abstract