A central challenge in generative design is the exploration of vast number of solutions. In this work, we extend two major density-based structural topology optimization (STO) methods based on four classes of exploration algorithms of reinforcement learning (RL) to STO problems, which approaches generative design in a new way. The four methods are: first, using ε -greedy policy to disturb the search direction; second, using upper confidence bound (UCB) to add a bonus on sensitivity; last, using Thompson sampling (TS) as well as information-directed sampling (IDS) to direct the search, where the posterior function of reward is fitted by Beta distribution or Gaussian distribution. Those combined methods are evaluated on some structure compliance minimization tasks from 2D to 3D, including the variable thickness design problem of an atmospheric diving suit (ADS). We show that all methods can generate various acceptable design options by varying one or two parameters simply, except that IDS fails to reach the convergence for complex structures due to the limitation of computation ability. We also show that both Beta distribution and Gaussian distribution work well to describe the posterior probability.