High-order, high-dimension, and large-scale sparse tensors (HHLST) arise in a wide range of real industrial applications, such as social networks, recommender systems, bioinformatics, and traffic data. To handle these complex tensors, sparse tensor decomposition techniques are employed to project an HHLST into a low-rank space. In this paper, we propose a novel sparse tensor decomposition model called Sparse FastTucker Decomposition (SFTD), a variant of Sparse Tucker Decomposition (STD). SFTD approximates the core tensor with a Kruskal structure, and we present a theorem showing that this reduces the exponential space and computational overhead of STD to polynomial. Additionally, we reduce the space overhead of intermediate parameters in the algorithm by sampling the intermediate matrices, and we prove that the resulting method converges. To further accelerate SFTD, we exploit compact matrix multiplication and parallelized memory access via a stochastic strategy, yielding the GPU-accelerated cuFastTucker. Moreover, we propose a data partitioning and communication strategy that allows cuFastTucker to accommodate data on multi-GPU platforms. Our proposed cuFastTucker achieves faster computation and convergence, as well as significantly lower space and computational overhead, than state-of-the-art (SOTA) algorithms such as P-Tucker, Vest, GTA, Bigtensor, and SGD_Tucker.
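To make the exponential-to-polynomial reduction concrete, the following NumPy sketch illustrates the core idea in its simplest form; it is not the paper's implementation, and the names `N`, `R`, and `r` (tensor order, Tucker rank, and Kruskal rank) are illustrative assumptions. A dense order-`N` Tucker core has `R**N` entries, while its Kruskal approximation stores only `N*R*r` factor parameters.

```python
import numpy as np

# Minimal sketch (not the authors' implementation): represent the Tucker
# core tensor G of an order-N decomposition by a Kruskal (CP) structure,
# so its R**N entries never need to be stored explicitly.
N, R, r = 3, 8, 4  # illustrative tensor order, Tucker rank, Kruskal rank

rng = np.random.default_rng(0)
# One R x r factor matrix per mode: N*R*r parameters in total,
# versus the R**N parameters of a dense core tensor.
B = [rng.standard_normal((R, r)) for _ in range(N)]

def kruskal_core(B):
    """Materialize the implied core G = sum_k b1_k (outer) ... (outer) bN_k.
    Done here only to check shapes; the point of the Kruskal
    approximation is to avoid ever forming G explicitly."""
    G = np.zeros([Bn.shape[0] for Bn in B])
    for k in range(B[0].shape[1]):
        outer = B[0][:, k]
        for Bn in B[1:]:
            outer = np.multiply.outer(outer, Bn[:, k])
        G += outer
    return G

G = kruskal_core(B)
print(G.shape)                      # (8, 8, 8): R**N = 512 core entries
print(sum(Bn.size for Bn in B))     # 96 stored parameters: N*R*r
```

Under these toy sizes the dense core already needs 512 entries against 96 Kruskal parameters; the gap widens exponentially with the tensor order `N`, which is the overhead reduction the abstract refers to.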