Deep reinforcement learning (DRL) has been actively studied alongside recent advances in deep learning, and researchers continue to improve its performance and expand its range of applications. However, recent literature reports that DRL performance is sensitive to various design choices, e.g., the neural network initialization. This sensitivity makes it difficult for DRL to achieve stable performance, which in turn degrades reproducibility. We therefore propose a supervised pre-training method for both the policy and value networks to improve stability. The policy network is pre-trained to maximize the initial entropy of its action distribution, and the value network is pre-trained to bias its output distribution toward a specific value. Experiments are conducted on tasks with a discrete action space, where the initial entropy is hard to control. Through these experiments, the effectiveness of the proposed method in terms of stability and performance is validated.
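To make the idea concrete, here is a minimal sketch of the pre-training scheme described above, not the authors' exact implementation. For a discrete action space, the maximum-entropy policy is the uniform distribution, so the policy head can be pre-trained with a supervised loss toward uniform action probabilities, while the value head is regressed toward a chosen constant target. The network sizes, the constant `VALUE_TARGET`, and the use of random states for pre-training are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 8, 4     # hypothetical task dimensions
VALUE_TARGET = 0.0            # assumed constant target for the value head

# Simple policy and value networks (architecture is illustrative)
policy_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
value_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(
    list(policy_net.parameters()) + list(value_net.parameters()), lr=1e-3
)

uniform = torch.full((N_ACTIONS,), 1.0 / N_ACTIONS)

for step in range(500):
    obs = torch.randn(128, OBS_DIM)  # random states used only for pre-training
    log_probs = F.log_softmax(policy_net(obs), dim=-1)
    # KL divergence to the uniform distribution: zero exactly when the
    # policy is uniform, i.e., when its entropy is maximal
    policy_loss = F.kl_div(log_probs, uniform.expand_as(log_probs),
                           reduction="batchmean")
    # Supervised regression of the value head toward the constant target
    value_loss = F.mse_loss(value_net(obs),
                            torch.full((obs.size(0), 1), VALUE_TARGET))
    opt.zero_grad()
    (policy_loss + value_loss).backward()
    opt.step()

# After pre-training, the initial action distribution should be near-uniform
probs = F.softmax(policy_net(torch.randn(1, OBS_DIM)), dim=-1)
```

After this supervised phase, the pre-trained weights would serve as the initialization for the actual DRL training, so that every run starts from a high-entropy policy and a biased value estimate regardless of the random seed.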