ABSTRACT A novel methodology for the evaluation of two electron integrals up to f functions using Graphics Processing Units (GPUs) is presented. The Head-Gordon-Pople recursion relationships are solved via a simple heuristic methodology to minimize the number of evaluated intermediates in the recursion trees. Automatic code generation is used to generate highly optimized CUDA kernels. A novel approach for f functions is presented in which integral classes are split into smaller subclasses to minimize register pressure and exploit additional parallelism at the cost of recomputing a small number of intermediates. Alongside optimized kernels, the ERI evaluation scheme works in conjunction with an efficient work distribution scheme which guarantees load-balancing during computation. The new HGP scheme shows excellent speedups of 2 × to above 60 × against existing GPU code. Additionally, when coupled with digestion into the Fock matrix, the scaling is excellent on up to 7 GPUs with an 85% parallel efficiency for the 6-31G(d) basis set.