Intrinsically disordered regions are found in most eukaryotic proteins and are enriched in positively and negatively charged residues. While it is often convenient to assume these residues follow their model-compound pKa values, recent work has shown that local charge effects (charge regulation) can upshift or downshift sidechain pKa values with major consequences for molecular function. Despite this, charge regulation is rarely considered when investigating disordered regions. The number of potential charge microstates that can be populated through acid/base regulation of a given number of ionizable residues in a sequence, N, scales as ∼2^N. This exponential scaling makes the assessment of the full charge landscape of most proteins computationally intractable. To address this problem, we developed MEDOC (Multisite Extent of Deprotonation Originating from Context) to determine the degree of protonation of a protein based on the local sequence context of each ionizable residue. We show that we can drastically reduce the number of parameters necessary to determine the full, analytic, Boltzmann partition function of the charge landscape at both global and site-specific levels. Our algorithm applies the structure of the q-canonical ensemble, combined with novel strategies to rapidly obtain the minimal set of parameters, thereby circumventing the combinatorial explosion of the number of charge microstates even for proteins containing a large number of ionizable amino acids. We apply MEDOC to several sequences, including a global analysis of the distribution of pKa values across the entire DisProt database. Our results show differences in the distribution of predicted pKa values for different amino acids, in agreement with NMR-measured distributions in proteins.
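To make the 2^N scaling concrete, the following minimal sketch enumerates every protonation microstate of a short sequence by brute force and accumulates a Boltzmann partition function and mean net charge. It is not the MEDOC algorithm: it assumes fully independent sites that follow fixed model-compound pKa values (the very assumption that charge regulation breaks), it ignores the termini, and the pKa table, the `PROTONATED_CHARGE` table, and the function name `enumerate_microstates` are hypothetical choices made only for illustration.

```python
from itertools import product

# Approximate model-compound pKa values, for illustration only (assumption).
MODEL_PKA = {"D": 4.0, "E": 4.4, "H": 6.5, "K": 10.4, "R": 12.3}
# Charge of each sidechain in its protonated form; deprotonation lowers it by 1.
PROTONATED_CHARGE = {"D": 0, "E": 0, "H": 1, "K": 1, "R": 1}


def enumerate_microstates(seq, pH):
    """Brute-force sum over all 2^N protonation microstates.

    Assumes independent sites with fixed pKa values (no charge regulation).
    Returns (number of microstates, partition function, mean net charge).
    Weights are relative to the fully protonated state.
    """
    sites = [aa for aa in seq if aa in MODEL_PKA]
    n_states = 0
    partition = 0.0
    charge_sum = 0.0
    # Each site is either protonated (0) or deprotonated (1).
    for state in product((0, 1), repeat=len(sites)):
        weight = 1.0
        charge = 0
        for aa, deprotonated in zip(sites, state):
            charge += PROTONATED_CHARGE[aa] - deprotonated
            if deprotonated:
                # Statistical weight of deprotonating one site at this pH.
                weight *= 10.0 ** (pH - MODEL_PKA[aa])
        n_states += 1
        partition += weight
        charge_sum += charge * weight
    return n_states, partition, charge_sum / partition


if __name__ == "__main__":
    for seq in ("DEK", "DEKHRDE", "DEKHRDEKKEEDDRR"):
        n_states, partition, mean_q = enumerate_microstates(seq, pH=7.0)
        print(seq, n_states, round(mean_q, 3))
```

Under the independence assumption the sum factorizes into a product over sites, so the enumeration is unnecessary; once site-site coupling is introduced, that factorization no longer holds and the loop above becomes infeasible for sequences with many ionizable residues, which is the regime the abstract describes.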