This paper proposes a reinforcement learning-based topology optimization (TO) and generative design approach that uses the Upper Confidence Bound (UCB) method to produce a diverse set of optimal part designs for additive manufacturing (AM). The UCB method is integrated into two density-based TO problems, a compliance minimization problem and a thermal conduction problem, together with Design for Additive Manufacturing (DfAM) filters that improve the additive manufacturability of the resulting topologies across different AM processes. The DfAM constraints are enforced by applying a support minimization filter and a thin feature minimization constraint in the TO model. The Solid Isotropic Material with Penalization (SIMP)-based TO model is perturbed to various levels of exploration and exploitation through the UCB exploration parameter, yielding different optimal designs. Unlike data-reliant deep learning methods used in TO, the non-data-driven learning method proposed here relies on historical TO iterations integrated into a DfAM-constrained TO model, which improves its scalability to real-world, often computationally expensive, part design applications. This paper fills a gap in computationally efficient methods for exploring generative designs of structurally and thermally loaded parts with improved functional performance and additive manufacturability.
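To illustrate the role of the UCB exploration parameter, the following is a minimal sketch of the generic UCB1 selection rule, in which each option is scored by its mean observed reward plus an exploration bonus scaled by a constant. It is not the coupling to the SIMP model used in this work, which the abstract does not specify. The names `ucb_select`, `run_bandit`, `reward_fn`, and `c` are hypothetical, and treating the "arms" as candidate design perturbations is an assumption made only for illustration.

```python
import math
import random


def ucb_select(counts, mean_rewards, c):
    """Return the index of the arm with the highest UCB1 score.

    counts[i]       -- number of times arm i has been tried so far
    mean_rewards[i] -- running mean reward observed for arm i
    c               -- exploration parameter (hypothetical name); larger
                       values favor rarely tried arms
    """
    # Try every arm once first, which also avoids division by zero below.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        mean + c * math.sqrt(math.log(total) / n)
        for mean, n in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, n_rounds, c, seed=0):
    """Run UCB1 for n_rounds and return (counts, mean_rewards).

    reward_fn(arm, rng) is a user-supplied, hypothetical stand-in for
    whatever performance measure is rewarded (assumed higher is better).
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(n_rounds):
        arm = ucb_select(counts, means, c)
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

With `c = 0` the rule is purely exploitative and keeps selecting the best arm seen so far. Increasing `c` spreads selections across more arms, which is the mechanism by which an exploration parameter can trade performance for design diversity.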