This paper develops and optimizes a non-orthogonal, noncoherent multi-user massive single-input multiple-output (SIMO) framework to enable scalable ultra-reliable low-latency communications (sURLLC) in Beyond-5G (B5G)/6G wireless communication systems. In this framework, the substantial diversity gain offered by the large-scale antenna array of the massive SIMO system is leveraged to ensure ultra-high reliability. To reduce the overhead and latency incurred by channel estimation, we advocate noncoherent communication, which does not require knowledge of the instantaneous channel state information (CSI) and relies only on the large-scale fading coefficients for message decoding. To improve the scalability of noncoherent massive SIMO systems, we enable non-orthogonal channel access for multiple users by devising a new differential modulation scheme, which ensures that each transmitted signal matrix can be uniquely determined in the noise-free case and reliably estimated in the noisy case as the antenna array size is scaled up. The key idea is to let the signals transmitted by multiple geographically separated users superimpose properly over the air, such that once the sum signal is correctly detected, the signal sent by each individual user can be uniquely determined. To further improve the average error performance when the number of array antennas is large, we propose a max-min Kullback-Leibler (KL) divergence-based design that jointly optimizes the transmit powers of all users and the sub-constellation assignment among them. Simulation results show that the proposed design significantly outperforms the existing max-min Euclidean distance-based counterpart in terms of error performance. Moreover, the proposed approach achieves better error performance than the conventional coherent zero-forcing (ZF) receiver with orthogonal channel training, particularly for cell-edge users.
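
The unique-decodability property behind the superposition idea can be illustrated with a toy sketch. The scalar integer sub-constellations below are hypothetical and chosen only for illustration; they are not the differential modulation scheme of this paper, and the function names are assumptions. The sketch checks that every achievable sum symbol corresponds to exactly one tuple of per-user symbols, so that detecting the sum determines each user's symbol.

```python
from itertools import product


def sum_constellation(sub_constellations):
    """Map each achievable sum to the per-user symbol tuples that produce it."""
    table = {}
    for combo in product(*sub_constellations):
        table.setdefault(sum(combo), []).append(combo)
    return table


def is_uniquely_decomposable(sub_constellations):
    """True if every sum symbol arises from exactly one per-user tuple."""
    table = sum_constellation(sub_constellations)
    return all(len(combos) == 1 for combos in table.values())


def decompose(sum_symbol, sub_constellations):
    """Recover the per-user symbols from a correctly detected sum symbol."""
    candidates = sum_constellation(sub_constellations).get(sum_symbol, [])
    if len(candidates) != 1:
        raise ValueError("sum symbol is invalid or ambiguous")
    return candidates[0]


# Hypothetical toy sub-constellations for three users (illustration only).
good = [(0, 1), (0, 2), (0, 4)]  # the eight sums 0..7 are all distinct
bad = [(0, 1), (0, 1)]           # sum 1 arises from both (0, 1) and (1, 0)

print(is_uniquely_decomposable(good))
print(is_uniquely_decomposable(bad))
print(decompose(5, good))
```

In this toy setting the per-user scaling plays the role that power allocation and sub-constellation assignment play in a real design: it is what keeps the superimposed symbols distinguishable.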