We report on the design, implementation, optimization, and performance of the CADISHI software package, which calculates histograms of pair-distances of ensembles of particles on CPUs and GPUs. These histograms represent 2-point spatial correlation functions and are routinely calculated from simulations of soft and condensed matter, where they are referred to as radial distribution functions, and in the analysis of the spatial distributions of galaxies and galaxy clusters. Although conceptually simple, the calculation of radial distribution functions via distance binning requires the evaluation of O(N^2) particle-pair distances, where N is the number of particles under consideration. CADISHI provides fast parallel implementations of the distance histogram algorithm for the CPU and the GPU, written in templated C++ and CUDA. Orthorhombic and general triclinic periodic boxes are supported, in addition to the non-periodic case. The CPU kernels feature cache blocking, vectorization, and thread-parallelization to obtain high performance. The GPU kernels are tuned to exploit the memory and processor features of current GPUs, demonstrating histogramming rates up to a factor of 40 higher than on a high-end multi-core CPU. To enable high-throughput analyses of molecular dynamics trajectories, the compute kernels are driven by the Python-based CADISHI engine. It implements a producer–consumer data processing pattern and thereby enables the complete utilization of all the CPU and GPU resources available on a specific computer, independent of special libraries such as MPI, covering commodity systems up to high-end high-performance computing nodes. Data input and output are performed efficiently via HDF5. In addition, our CPU and GPU kernels can be compiled into a standard C library and used with any application, independent of the CADISHI engine or Python. The CADISHI software is freely available under the MIT license.
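The quadratic cost of distance binning can be illustrated with a minimal brute-force sketch in plain Python. This is not the CADISHI kernel, which is written in templated C++ and CUDA with cache blocking, vectorization, and threading; the function name `distance_histogram` and the parameters `r_max` and `n_bins` are hypothetical and chosen only for this illustration.

```python
import math


def distance_histogram(coords, r_max, n_bins):
    """Brute-force histogram of all pair distances below r_max (non-periodic case)."""
    bin_width = r_max / n_bins
    histogram = [0] * n_bins
    n = len(coords)
    # Visit every unordered pair (i, j) once: N * (N - 1) / 2 distance evaluations.
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d < r_max:
                # Clamp the index to guard against floating-point round-off at the upper edge.
                index = min(int(d / bin_width), n_bins - 1)
                histogram[index] += 1
    return histogram


# Four particles give 4 * 3 / 2 = 6 pair distances.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
hist = distance_histogram(points, r_max=4.0, n_bins=4)
```

The double loop makes the O(N^2) scaling explicit: doubling the particle number roughly quadruples the number of distance evaluations, which is why optimized CPU and GPU kernels matter for large data sets.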
Program summary
Program Title: CADISHI
Program Files doi: http://dx.doi.org/10.17632/82b8sdft79.1
Licensing provisions: MIT
Programming language: C++, CUDA, Python
Nature of problem: Radial distribution functions are of fundamental importance in soft and condensed matter physics and astrophysics. However, the calculation of distance histograms scales quadratically with the particle number. To be able to analyze large data sets, fast and efficient implementations of distance histogramming are crucial.
Solution method: CADISHI provides parallel, highly optimized implementations of distance histogramming. On the CPU, high performance is achieved via an advanced cache blocking scheme in combination with vectorization and threading. On the GPU, the problem is decomposed via a tiling scheme to exploit the GPU’s massively parallel architecture and hierarchy of global, constant, and shared memory efficiently, resulting in significant speedups compared to the CPU. Moreover, CADISHI exploits all the resources (GPUs, CPUs) available on a compute node in parallel.
Additional comments including restrictions and unusual features: In addition to the non-periodic case, CADISHI implements the minimum image convention for orthorhombic and general triclinic periodic boxes. We provide Python interfaces and the option to compile the kernels into a plain C library.
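The minimum image convention mentioned in the program summary can be sketched for the orthorhombic case as follows. This is an illustrative sketch only, assuming a box described by its three edge lengths; the function name `minimum_image_distance` is hypothetical and not part of the CADISHI interface, and the general triclinic case, which CADISHI also supports, is not covered here.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    box holds the edge lengths along each axis (an assumption of this sketch).
    """
    squared = 0.0
    for xa, xb, length in zip(a, b, box):
        delta = xa - xb
        # Shift the separation by a whole number of box lengths so that
        # it refers to the nearest periodic image along this axis.
        delta -= length * round(delta / length)
        squared += delta * delta
    return math.sqrt(squared)


# Two particles near opposite faces of a cubic box with edge length 10:
# the direct separation is 9, but the nearest periodic image is 1 away.
d_wrapped = minimum_image_distance((0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (10.0, 10.0, 10.0))
d_direct = minimum_image_distance((1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (10.0, 10.0, 10.0))
```

In a periodic distance histogram, such a wrapped distance would replace the plain Euclidean distance for every particle pair before binning.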