Computational free energy-based methods have the potential to significantly improve the throughput and decrease the cost of protein design efforts. To be deployed effectively in practical industrial settings, and to have a real impact on protein design projects, such methods must reach a high level of reliability, accuracy, and automation. Here, we present a benchmark study for the calculation of relative changes in protein–protein binding affinity for single point mutations across a variety of systems from the literature, using free energy perturbation (FEP+) calculations. We describe a method for robust treatment of alternate protonation states for titratable amino acids, which yields improved correlation with, and reduced error relative to, experimental binding free energies. Following careful analysis of the largest outlier cases in our dataset, we assess limitations of the default FEP+ protocols and introduce an automated script that identifies probable outlier cases that may require additional scrutiny and calculates an empirical correction for a subset of charge-related outliers. Through a series of three additional case study systems, we discuss how Protein FEP+ can be applied to real-world protein design projects and suggest areas for further study.