Mobile communications have experienced exponential growth both in connectivity and multimedia traffic in recent years. To support this tremendous growth, device-to-device (D2D) communications play a significant role in 5G and beyond 5G networks. However, enabling D2D communications in an underlay, heterogeneous cellular network poses two major challenges. First, interference management between D2D and cellular users directly affects a system’s performance. Second, achieving an acceptable level of link quality for both D2D and cellular networks is necessary. An optimum resource allocation is required to mitigate the interference and improve a system’s performance. In this paper, we provide a solution to interference management with an acceptable quality of services (QoS). To this end, we propose a machine learning-based resource allocation method to maximize throughput and achieve minimum QoS requirements for all active D2D pairs and cellular users. We first solve a resource optimization problem by allocating spectrum resources and controlling power transmission on demand. As resource optimization is an integer nonlinear programming problem, we address this problem by proposing a deep Q-network-based reinforcement learning algorithm (DRL) to optimize the resource allocation issue. The proposed DRL algorithm is trained with a decision-making policy to obtain the best solution in terms of spectrum efficiency, computational time, and throughput. The system performance is validated by simulation. The results show that the proposed method outperforms the existing ones.