In this paper, we present a parallel 2-approximation algorithm for the Steiner minimal tree problem and its MPI-based distributed implementation. Instead of computing expensive distances between all pairs of seed vertices, our solution relies on a cheaper Voronoi cell computation. Our design leverages asynchronous processing and message prioritization to accelerate the convergence of distance computations, and harnesses vertex- and edge-centric processing to offer fast time-to-solution. We demonstrate scalability and performance using real-world graphs with up to 128 billion edges on up to 512 compute nodes, and show the ability to find Steiner trees with up to one million seed vertices. Using 12 data instances, we compare our solution with the state-of-the-art exact solver, SCIP-Jack, and with two sequential 2-approximation algorithms. We empirically show that, on average, the total distance of the Steiner tree identified by our solution is 1.1290 times that of the Steiner minimal tree, well within the theoretical approximation bound of 2.
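To make the Voronoi-cell idea concrete, here is a minimal sequential sketch of one well-known form of it (Mehlhorn's algorithm). It is not the distributed MPI implementation described above: asynchronous processing, message prioritization, and vertex/edge-centric partitioning are not modeled. The function name `voronoi_steiner_tree`, the `(u, v, w)` edge-list format, and the assumption of non-negative weights are all illustrative choices. The sketch assigns each vertex to its nearest seed with a multi-source Dijkstra, turns each edge that crosses two cells into a candidate seed-to-seed distance, builds a minimum spanning tree over the seeds, and expands the chosen edges back into graph paths.

```python
import heapq


def voronoi_steiner_tree(n, edges, seeds):
    """2-approximate Steiner tree via Voronoi cells (hypothetical sequential sketch).

    n     -- number of vertices, labeled 0..n-1
    edges -- list of undirected (u, v, w) tuples, assuming w >= 0
    seeds -- list of distinct seed (terminal) vertices
    Returns (tree_edges, total_weight); tree_edges is a set of (min, max, w).
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Step 1: multi-source Dijkstra. Each vertex joins the Voronoi cell of
    # its nearest seed (src) and records its shortest-path parent (pred).
    inf = float("inf")
    dist = [inf] * n
    src = [-1] * n
    pred = [-1] * n
    pred_w = [0] * n
    heap = [(0, s) for s in seeds]
    for s in seeds:
        dist[s], src[s] = 0, s
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v], src[v], pred[v], pred_w[v] = d + w, src[u], u, w
                heapq.heappush(heap, (d + w, v))

    # Step 2: every edge crossing two cells yields a candidate seed-to-seed
    # path of length dist[u] + w + dist[v]; keep the shortest per seed pair.
    best = {}
    for u, v, w in edges:
        a, b = src[u], src[v]
        if a == -1 or b == -1 or a == b:
            continue  # unreachable vertex or edge inside one cell
        key = (min(a, b), max(a, b))
        cand = dist[u] + w + dist[v]
        if key not in best or cand < best[key][0]:
            best[key] = (cand, u, v, w)

    # Step 3: Kruskal's MST over the seed graph, with union-find on seeds.
    parent = {s: s for s in seeds}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    for (a, b), (_, u, v, w) in sorted(best.items(), key=lambda kv: kv[1][0]):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        parent[ra] = rb
        # Step 4: expand the MST edge into graph edges: the crossing edge
        # plus the shortest-path-tree paths from u and v back to their seeds.
        tree.add((min(u, v), max(u, v), w))
        for x in (u, v):
            while pred[x] != -1:
                p = pred[x]
                tree.add((min(p, x), max(p, x), pred_w[x]))
                x = p
    return tree, sum(w for _, _, w in tree)


# Usage: three seeds joined through a non-seed hub vertex 0.
example = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 3), (2, 3, 3)]
tree, total = voronoi_steiner_tree(4, example, seeds=[1, 2, 3])
print(total)  # 3: the three unit edges through vertex 0
```

The point of this structure is that only one shortest-path computation is needed, seeded from all seed vertices at once, rather than one per seed. The returned tree weighs no more than the seed-graph MST, and that MST is what the factor-2 guarantee is stated against.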