How does changing MCCE's parameters change accuracy? (under construction)
MCCE exposes many parameters, including whether waters are retained, the assumed dielectric constant, and which solver is used for the Poisson-Boltzmann equation. How do these choices affect the accuracy of pKa calculations? Here, we present findings on MCCE's accuracy, obtained by systematically testing these parameters.
We used a set of 36 PDB files sourced from RCSB.org, listed HERE. These PDB files were chosen for their varying sizes, as well as the amount of experimental data available for their residue pKa values. You can find the sources for the experimental pKas in this file: pkdbv1_WT_pkas.csv (change this later to be a citation probably)
Accuracy Criteria
Our 36 PDB files contain 1,587 residues with calculated pKa values. We have experimentally verified "true" pKa values for 425 of these residues. To measure the accuracy of MCCE's pKa calculations, we use an in-house program that takes a finished batch of these 36 PDBs and compares the MCCE-projected pKa with the "true" pKa for each of the 425 residues. Our metric for judging an MCCE run is the percentage of projected pKas that fall within ±1 pKa unit of the "true" pKa.
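The metric above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house program itself: the function name `percent_within`, the residue identifiers, and all pKa values below are made up for the example, and residues without an experimental value are simply skipped.

```python
def percent_within(calculated, experimental, tolerance=1.0):
    """Percentage of residues whose calculated pKa lies within
    `tolerance` pKa units of the experimental value.

    Both arguments map a residue identifier to a pKa. Only residues
    present in both mappings are scored.
    """
    shared = calculated.keys() & experimental.keys()
    if not shared:
        raise ValueError("no residues in common")
    hits = sum(
        1 for res in shared
        if abs(calculated[res] - experimental[res]) <= tolerance
    )
    return 100.0 * hits / len(shared)


# Hypothetical data: identifiers and values are illustrative only.
experimental = {"ASP-A0052": 3.7, "GLU-A0035": 6.2,
                "HIS-A0015": 5.4, "LYS-A0033": 10.4}
calculated = {"ASP-A0052": 3.1, "GLU-A0035": 4.5,
              "HIS-A0015": 6.0, "LYS-A0033": 10.9,
              "TYR-A0020": 11.0}  # no experimental value, so not scored

score = percent_within(calculated, experimental)
print(f"{score:.1f}% of pKas within +/-1 of experiment")
```

In this toy set, three of the four scored residues fall inside the tolerance, so the run would score 75%.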
For example, this graph depicts the experimental pKas on the X axis and MCCE's projected pKas on the Y axis. This run also includes extra values: after the initial pKa calculations are done, all pKas can be shifted up or down by the extra value for their residue type, to reduce bias. (Of course, changing the extra value for a residue type will not fix the variance of that residue group relative to its experimental values.) For comparison's sake, here is another graph of the same dataset, without the extra values:
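A small Python sketch can show why a per-residue-type shift removes bias but leaves the spread untouched. It assumes, purely for illustration, that each shift is the negative mean signed error of its residue type; how MCCE's extra values are actually chosen is not covered here. The helper names and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def signed_errors(records):
    """Group signed errors (calculated - experimental) by residue type."""
    groups = defaultdict(list)
    for res_type, calc, exp in records:
        groups[res_type].append(calc - exp)
    return dict(groups)


def apply_shifts(records, shifts):
    """Add a constant per-residue-type shift to every calculated pKa."""
    return [(t, calc + shifts.get(t, 0.0), exp) for t, calc, exp in records]


# Hypothetical (residue type, calculated pKa, experimental pKa) records.
records = [
    ("ASP", 4.5, 3.5), ("ASP", 5.0, 3.0), ("ASP", 3.0, 3.0),
    ("GLU", 4.0, 4.5), ("GLU", 3.5, 4.5),
]

before = signed_errors(records)
# Assumption for this sketch: shift each type by minus its mean error.
shifts = {t: -mean(errs) for t, errs in before.items()}
after = signed_errors(apply_shifts(records, shifts))

for t in before:
    print(f"{t}: bias {mean(before[t]):+.2f} -> {mean(after[t]):+.2f}, "
          f"spread {pstdev(before[t]):.2f} -> {pstdev(after[t]):.2f}")
```

The mean error of each residue type drops to zero after the shift, while the standard deviation of the errors is unchanged, because adding a constant to every value in a group cannot change how far apart those values are.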
Poisson-Boltzmann Solvers
Step 3 of MCCE involves solving the Poisson-Boltzmann (PB) equation. MCCE4 can currently switch between three PB solvers: ZAP and Delphi, both sourced from OpenEye, and NGPB, created in-house by partners in conjunction with the Gunner Lab.
(include RMSD w and w/o outliers)
Dielectric Constant
MCCE's output varies noticeably with the assumed dielectric constant D. Below, we look at the optimized versions of two NGPB runs: the left assumes D = 4, and the right assumes D = 8. Both runs are dry and use unscaled VDWs in their calculations.
We see similar shapes for the different residue groups in both images, but the of both are
Wet/Dry Runs
We observed little change in accuracy resulting from the presence of waters in a PDB file.