This paper investigates a parallel machine scheduling problem with uncertain job processing times, in which job tardiness and optional machines are considered. To save energy, only a subset of the available machines is turned on, a setting referred to as not-all-machine (NAM). To describe the uncertain processing times, a mean–mean absolute deviation (MAD) ambiguity set is used, and the cost of job tardiness is minimized under the worst-case distribution in the ambiguity set. After building a distributionally robust optimization (DRO) model, theoretical bounds on the optimal number of machines are obtained. Since the model is not computationally scalable, an upper bound on its inner minimization problem is employed, and a mixed-integer linear programming (MILP) approximation is obtained based on McCormick inequalities. Tailored speedup techniques are employed for the DRO model, significantly enhancing its computational performance. To evaluate the validity of the proposed DRO model, we compare it with its stochastic programming (SP) counterpart under various parameter settings. Numerical experiments demonstrate that the DRO model performs strongly in worst-case scenarios. As the problem size increases, the DRO model shows clear advantages over the SP model in terms of computational efficiency and reliability. The performance of the DRO model is also observed to be more stable than that of the nominal sequence, especially with loose due dates. Furthermore, the out-of-sample performance under various decision-making preferences sheds new light on the trade-off between energy saving and production efficiency.
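As a brief illustration of the McCormick linearization mentioned above, the sketch below evaluates the four standard McCormick inequalities for a product w = x * y. It assumes x lies in [0, 1] (for instance, a binary assignment variable) and y lies in a known interval [lo, hi]. The abstract does not state which products the model linearizes, so the function name `mccormick_bounds` and the variables are hypothetical and chosen only for illustration.

```python
def mccormick_bounds(x, y, lo, hi):
    """Bounds on w implied by the McCormick inequalities for w = x * y,
    assuming x in [0, 1] and y in [lo, hi] (illustrative assumption)."""
    # Lower bounds: w >= lo * x  and  w >= y - hi * (1 - x)
    lower = max(lo * x, y - hi * (1 - x))
    # Upper bounds: w <= hi * x  and  w <= y - lo * (1 - x)
    upper = min(hi * x, y - lo * (1 - x))
    return lower, upper


if __name__ == "__main__":
    # With binary x the envelope collapses to the exact product.
    print(mccormick_bounds(0, 3.0, 1.0, 5.0))
    print(mccormick_bounds(1, 3.0, 1.0, 5.0))
    # With fractional x the envelope only brackets the product.
    print(mccormick_bounds(0.5, 3.0, 1.0, 5.0))
```

When x is binary, the lower and upper bounds coincide with x * y, which is why this relaxation can be exact inside a MILP. For fractional x, it only yields an interval containing the product.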