This paper addresses optimal non-orthogonal multiple access (NOMA) design with practical finite-alphabet inputs rather than ideal Gaussian inputs. Aided by tight mutual information neural estimation, we propose a capacity-driven end-to-end learning communication framework that acquires optimal superposition coding and thereby approaches the capacity region of NOMA with finite-alphabet inputs. To this end, we first give a closed-form expression for the conditional mutual information that defines this capacity region. We then derive a naturally tight bound on the mutual information that simplifies the closed-form expression and is needed to establish a consistent, tight mutual information neural estimator. Finally, in the end-to-end learning framework, an encoder and a self-attention module, responsible for constellation learning and power allocation, respectively, are guided by the capacity-approaching method based on mutual information estimation, while successive interference cancellation (SIC) is designed as a neural unit embedded in the decoder to guarantee decoding performance. Numerical results show that the proposed capacity-driven end-to-end learning scheme approaches the achievable rate region of NOMA, with symbol error rate (SER) performance comparable to state-of-the-art decoding schemes.
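Mutual information neural estimators of the kind the abstract relies on typically maximize a variational lower bound such as the Donsker–Varadhan representation, I(X;Y) ≥ E_p(x,y)[T(x,y)] − log E_p(x)p(y)[e^T(x,y)], over a parametric critic T. The minimal numpy sketch below illustrates only this underlying bound, not the paper's estimator: it uses an assumed single-parameter critic T(x,y) = a·x·y optimized by grid search in place of a trained neural network, on a jointly Gaussian pair whose true mutual information is known in closed form. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
rho = 0.8  # correlation of the jointly Gaussian pair (X, Y)

# Joint samples, plus a shuffled copy approximating the product of marginals.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
y_shuffled = rng.permutation(y)

def dv_bound(a):
    """Donsker-Varadhan lower bound on I(X;Y) with critic T(x, y) = a*x*y."""
    joint_term = np.mean(a * x * y)
    marginal_term = np.log(np.mean(np.exp(a * x * y_shuffled)))
    return joint_term - marginal_term

# A neural estimator would maximize the bound over critic parameters by SGD;
# a grid search over the single scalar parameter stands in for that step here.
est = max(dv_bound(a) for a in np.linspace(0.05, 0.7, 60))

true_mi = -0.5 * np.log(1.0 - rho**2)  # closed form for Gaussians, in nats
```

Because this one-parameter critic family is so restrictive, the resulting estimate is a loose lower bound on the true mutual information; the expressive neural-network critics used by MINE-style estimators are what tighten the bound in practice.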