This communication investigates high-performance computation schemes for local Lagrange interpolation and anterpolation in the parallel, graphics processing unit (GPU)-accelerated, distributed-memory multilevel fast multipole algorithm (MLFMA). Two ELLPACK-based storage schemes are proposed: block ELLPACK (ELL-B) for interpolation and hybrid compressed sparse column (CSC)-block ELLPACK (CSC-ELL-B) for anterpolation. Both ensure high computational throughput on the GPU. Optimizations that exploit the GPU's hierarchical memory architecture, the stream mechanism, and an asynchronous CPU/GPU computation pattern are employed to further improve overall performance. The proposed schemes are shown to be an order of magnitude faster than conventional schemes for the aggregation/disaggregation operations. For an aircraft model involving over 10 billion unknowns, the iteration time is reduced by more than half, a significant advance in the GPU-accelerated parallelization of the MLFMA.
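As a rough illustration of the storage idea, the sketch below builds a plain (non-block) ELLPACK layout for a small local interpolation matrix and applies it in both directions. It is a CPU-side Python sketch under stated assumptions, not the ELL-B or CSC-ELL-B kernels of this work: the function names, the two-point (linear) weights, and the 4-by-3 matrix size are all hypothetical and chosen only for readability.

```python
# Plain ELLPACK sketch: every row is padded to the same number of stored
# entries, so each row (one GPU thread, in a real kernel) does identical work.

def to_ellpack(dense):
    """Convert a dense row-major matrix to padded (values, columns) arrays."""
    width = max(sum(1 for v in row if v != 0.0) for row in dense)
    vals, cols = [], []
    for row in dense:
        nz = [(j, v) for j, v in enumerate(row) if v != 0.0]
        pad = width - len(nz)
        # Padding entries carry weight 0.0 and a valid dummy column index 0.
        cols.append([j for j, _ in nz] + [0] * pad)
        vals.append([v for _, v in nz] + [0.0] * pad)
    return vals, cols

def ell_interpolate(vals, cols, x):
    """y = W x: each fine sample gathers from a few coarse samples."""
    return [sum(v * x[j] for v, j in zip(rv, rc)) for rv, rc in zip(vals, cols)]

def ell_anterpolate(vals, cols, y, n_coarse):
    """x = W^T y: the transpose operation, written as a scatter-add."""
    x = [0.0] * n_coarse
    for i, (rv, rc) in enumerate(zip(vals, cols)):
        for v, j in zip(rv, rc):
            x[j] += v * y[i]
    return x

# Hypothetical linear interpolation from 3 coarse samples to 4 fine samples.
W = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
vals, cols = to_ellpack(W)
fine = ell_interpolate(vals, cols, [2.0, 4.0, 6.0])
coarse = ell_anterpolate(vals, cols, [1.0, 1.0, 1.0, 1.0], 3)
```

Because local Lagrange interpolation uses the same number of source points for every target sample, the padded rows waste little storage, which is what makes an ELLPACK-style layout attractive here. The transpose pass scatters into shared output entries, which on a GPU would require atomic updates; this is one plausible motivation for a column-oriented (CSC-style) layout in anterpolation, although the sketch itself does not implement one.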