This article presents <i>Stark</i>, a fast and highly scalable distributed matrix multiplication algorithm on Apache Spark, based on Strassen’s matrix multiplication algorithm. Stark preserves Strassen’s seven-multiplication scheme in a distributed environment and thus achieves asymptotically faster execution time. It creates a distributed recursion tree of computation in which each level corresponds to the division and combination of distributed matrix blocks stored as Resilient Distributed Datasets (RDDs). It processes each divide and combine step in parallel and keeps track of the sub-matrices by tagging the matrix blocks that belong to them. To the best of our knowledge, Stark is the first implementation of a distributed Strassen’s algorithm on the Spark platform. We also report a detailed complexity analysis of the proposed algorithm that takes both computation and communication costs into account. Experimental results suggest that Stark outperforms the existing distributed matrix multiplication implementations on Spark, <i>Marlin</i> and <i>MLlib</i>, for large matrix sizes (<inline-formula><tex-math notation="LaTeX">$\geq 16384\times 16384$</tex-math></inline-formula>). Our experiments reveal an optimal block size for each matrix size, which the theoretical analysis also predicts. We also show that the experimental and theoretical running times for Stark match closely. Finally, experiments show that Stark exhibits strong scalability as the number of executors increases.
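As a point of reference for the seven-multiplication scheme mentioned above, the following is a minimal single-machine sketch of Strassen's recursion in plain Python. It is not Stark's distributed implementation: it uses nested lists rather than RDDs, assumes square matrices whose size is a power of two, and the names `strassen`, `split`, `naive` and the `leaf` cutoff parameter are illustrative choices rather than identifiers from the article.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Divide step: cut a square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def naive(A, B):
    """Classical row-by-column product, used at the leaves."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def strassen(A, B, leaf=1):
    """Multiply two n x n matrices (n a power of two) with Strassen's scheme.

    `leaf` is a hypothetical cutoff: sub-problems of size <= leaf are
    multiplied with the classical algorithm.
    """
    n = len(A)
    if n <= leaf:
        return naive(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive multiplications instead of the classical eight.
    M1 = strassen(add(A11, A22), add(B11, B22), leaf)
    M2 = strassen(add(A21, A22), B11, leaf)
    M3 = strassen(A11, sub(B12, B22), leaf)
    M4 = strassen(A22, sub(B21, B11), leaf)
    M5 = strassen(add(A11, A12), B22, leaf)
    M6 = strassen(sub(A21, A11), add(B11, B12), leaf)
    M7 = strassen(sub(A12, A22), add(B21, B22), leaf)

    # Combine step: assemble the four quadrants of the product.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields `[[19, 22], [43, 50]]`, the same result as the classical product. Each level of this recursion performs one divide and one combine step, which is the structure that the abstract describes as a recursion tree over distributed matrix blocks.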