We present a parallel implementation of a direct solver for Poisson’s equation on extreme-scale supercomputers with accelerators. We introduce a chunked-pencil decomposition as the domain-decomposition strategy to distribute work among processing elements to achieve improved scalability at high counts of accelerators. Chunked-pencil decomposition enables overlapping MPI communication and data transfer between the central processing units (CPUs) and the graphics processing units (GPUs). It enables contiguous message transfer among the nodes and improves data locality by keeping neighboring elements in adjacent memory locations while permitting the use of shared memory for certain segments of the algorithm when possible. We study two different communication patterns within the chunked-pencil decomposition. The first pattern fully overlaps the communication with data transfer and aims to speedup the overall turnaround time. The second pattern concentrates on low memory usage and is more network friendly than the first pattern for computations at extreme scale. In our parallel implementation, we interleave OpenACC with MPI to support computations on the GPU or the CPU. The numerical solution and its formal second order of accuracy is verified using the method of manufactured solutions for various combinations of boundary conditions. Additionally, we used PittPack within an incompressible flow solver to further validate its accuracy and as well as demonstrate its versatility as a software package. We performed weak scaling analysis with up to 1.1 trillion Cartesian mesh points distributed over 16384 GPUs on a petascale leadership class supercomputer. Program SummaryProgram Title: PittPackProgram Files doi:http://dx.doi.org/10.17632/59ktwdby4r.1Licensing provisions: GNU General Public License 3Programming language: C++Nature of problem: An elliptic Poisson’s equation for pressure arises in the numerical solution of incompressible fluid flow. This equation needs to be solved accurately to enforce conservation of mass principle.Solution method: The Poisson’s equation is discretized with a second order accurate finite difference scheme. The resulting system of linear equations is solved with a direct method that employs fast Fourier transforms in the horizontal plane and a tridiagonal matrix solver in the third direction.Additional comments including restrictions and unusual features: The mesh has to be directionally uniform. Due to adoption of FFT in the horizontal plane, the boundary conditions in each direction must be of the same type. Each direction in the horizontal plane has to be partitioned by an equal integer number such that n2 parallel processes are deployed at run-time.