Abstract
With recent advances in artificial intelligence, its application to the shipbuilding industry is being actively studied. Traditionally, hull form variation has relied on the intuition and expertise of designers, so results are inconsistent and the ship's main dimensions can change unintentionally, depending on the designer's competence. Moreover, the iterative cycle of design variation and analysis needed to derive an optimal hull form is both costly and time-consuming. To address these issues, this study proposes an optimal hull design technique based on reinforcement learning, a branch of machine learning in which an agent learns by interacting with an environment. In reinforcement learning, the agent improves its policy by accumulating the rewards associated with the actions it takes in a given environment. In this study, after the main parameters of the ship are calculated, a state representing the hull information is defined and the agent performs local transformations of the bow and stern. The reward is defined as the change in total resistance caused by the hull deformation, subject to tolerance limits on the ship's prismatic coefficient (CP) and longitudinal center of buoyancy (LCB). The proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms are compared to find the best deep reinforcement learning (DRL) model for the hull optimization problem. The results were compared with those of a genetic algorithm and speed-constrained multi-objective particle swarm optimization (SMPSO): the optimal hull resistance values differed little, but the reinforcement learning model required one-fifth of the time.
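As an illustration of the reward described above, the following is a minimal sketch and not the study's implementation. The names `HullState` and `reward`, the absolute tolerance values, the fixed penalty for a constraint violation, and the normalization by the baseline resistance are all assumptions introduced here for clarity; the abstract states only that the reward is the change in total resistance, constrained by tolerances on CP and LCB.

```python
from dataclasses import dataclass


@dataclass
class HullState:
    """Hypothetical summary of a hull: resistance plus the two constrained parameters."""
    total_resistance: float  # total resistance, any consistent unit
    cp: float                # prismatic coefficient (dimensionless)
    lcb: float               # longitudinal center of buoyancy, any consistent unit


def reward(before: HullState, after: HullState, baseline: HullState,
           cp_tol: float = 0.005, lcb_tol: float = 0.5,
           penalty: float = -1.0) -> float:
    """Reward for one local deformation step (assumed form).

    Returns the resistance reduction achieved by the step, normalized by the
    baseline resistance, or a fixed penalty when CP or LCB drifts outside the
    assumed absolute tolerance around the baseline hull.
    """
    cp_violation = abs(after.cp - baseline.cp) > cp_tol
    lcb_violation = abs(after.lcb - baseline.lcb) > lcb_tol
    if cp_violation or lcb_violation:
        return penalty
    # Positive when the deformation lowers total resistance.
    return (before.total_resistance - after.total_resistance) / baseline.total_resistance
```

Under these assumptions, a deformation that lowers resistance while keeping CP and LCB within tolerance receives a positive reward, whereas any deformation that violates either tolerance receives the penalty regardless of its resistance. Other formulations, such as a penalty proportional to the size of the violation, would be equally consistent with the description above.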