Among single-component inversions, vertical gravity gradient data have been proven to be sufficient to obtain reasonable density distribution. However, the increasing numbers of observations and model parameters inevitably produce a large-scale sensitivity matrix, causing problems such as large memory occupation and slow computations in the inversion. Aiming at solving these long-standing problems, we have introduced a matrix compression technique to reduce memory occupation and constructed a hybrid parallel algorithm for accelerating focusing inversion. We develop the source codes and the design pattern for the proposed algorithm using C language with open multiprocessing and compute unified device architecture for further improving algorithm feasibility and reusability. A graphical user interface and visualization tools also are developed using Python. In the tests with synthetic data produced by numerical simulation and real data from Vinton Dome, it is verified that the inversion can produce an accurate result of density values and object location. The performance analysis is conducted using different scales of data. The speedup increases with the increasing number of graphics processing units (GPUs) and model parameters, proving that the proposed parallel algorithm is scalable. We compare this algorithm with others without matrix compression to demonstrate how the compressed sensitivity matrix storing and accessing method influences inversion performance. More specifically, for an inversion with 100 × 100 × 20 cells, the average speed is approximately 6 s per iteration, with a maximum speedup of 36.94. Our algorithm has a short running time and high multiple-GPU parallel efficiency. Hence, the hybrid parallel algorithm is more powerful at implementing large-scale data calculations and could be used for solving computing problems in mass data inversions.