Massive ultra-reliable and low-latency communications (mURLLC) is an emerging and dominant traffic service in 6G. To reduce signaling overhead and access delay, grant-free random access (GFRA) is widely used in mURLLC. As the first step in GFRA, active user detection (AUD) aims to identify the set of active users accurately and promptly. Conventional AUD schemes rely on iterative computations over a massive number of users, which incur redundant computational overhead and processing delay and seriously limit system scalability in the mURLLC scenario. Considering the near-real-time requirement of mURLLC, we propose a scalable deep learning-based AUD approach that exploits the similar channel sparsity in cell-free (CF) massive multiple-input multiple-output (mMIMO) systems. By introducing a distributed computing unit, i.e., the space expansion unit (SEU), we design an SEU-assisted CF mMIMO system to improve the scalability of the traditional centralized CF computing architecture. In the proposed system, all access points (APs) are divided into several clusters, and the SEU in each cluster provides a reliable distributed AUD scheme through a one-dimensional convolutional neural network (1D CNN). In addition, a transfer learning-based ensemble model is established at the central processing unit (CPU) to achieve a better global detection decision. Simulation results demonstrate the superiority of our scalable deep learning-based approach and reveal that, through the transfer learning-based model fusion at the CPU, the proposed SEU-assisted approach achieves a success probability close to that of the centralized CF computing scheme with less access delay. Moreover, our scheme requires fewer pilots than compressed sensing (CS)-based schemes.