While iterative reconstruction algorithms for tomography have several advantages compared to standard backprojection methods, the adoption of such algorithms in large-scale imaging facilities is still limited, one of the key obstacles being their high computational load. Although GPU-enabled computing clusters are, in principle, powerful enough to carry out iterative reconstructions on large datasets in reasonable time, creating efficient distributed algorithms has so far remained a complex task, requiring low-level programming to deal with memory management and network communication. The ASTRA toolbox is a software toolbox that enables rapid development of GPU accelerated tomography algorithms. It contains GPU implementations of forward and backprojection operations for many scanning geometries, as well as a set of algorithms for iterative reconstruction. These algorithms are currently limited to using GPUs in a single workstation. In this paper, we present an extension of the ASTRA toolbox and its Python interface with implementations of forward projection, backprojection and the SIRT algorithm that can be distributed over multiple GPUs and multiple workstations, as well as the tools to write distributed versions of custom reconstruction algorithms, to make processing larger datasets with ASTRA feasible. As a result, algorithms that are implemented in a high-level conceptual script can run seamlessly on GPU-enabled computing clusters, up to 32 GPUs or more. Our approach is not limited to slice-based reconstruction, facilitating a direct portability of algorithms coded for parallel-beam synchrotron tomography to cone-beam laboratory tomography setups without making changes to the reconstruction algorithm.