In this article, a new deep Reinforcement Learning (RL) model is proposed to solve the Single Container Loading Problem (SCLP), both in its basic form and with the full-support constraint. The model is built on a multilayer neural network architecture. Computational experiments on benchmark instances with homogeneous boxes showed that the proposed model yields fairly good results compared with state-of-the-art heuristics. The experiments also showed that the proposed deep RL model generalizes well to both homogeneous and heterogeneous classes of instances. Nevertheless, this learning-based optimization approach still leaves an optimality gap relative to the well-designed heuristics of the operations research literature. Finally, the benefit of training the model under different levels of variability was analysed. The results suggest that, unless a company faces high demand volatility, training the model under a low level of variability yields better performance.