Device-to-device (D2D) communication allows short-range devices to reuse licensed cellular spectrum and establish direct local connections, supporting a very large number of terminal connections and higher system throughput. However, spectrum sharing also introduces severe interference into the network. A reliable and efficient resource allocation strategy is therefore essential to mitigate this interference and improve the system's spectral efficiency. In this paper, we investigate spectrum access and power allocation for D2D communications underlaying cellular networks using deep reinforcement learning, with the aim of finding a feasible resource allocation strategy that maximizes data rate and system fairness. We propose a resource allocation scheme for D2D communication networks based on a value decomposition network. Through centralized training, the proposed scheme avoids frequent information exchange among D2D users while still allowing them to make joint resource allocation decisions in a distributed manner. Simulation results show that the proposed scheme converges stably, scales well, and effectively improves system capacity.
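To illustrate why a value decomposition network lets D2D users decide locally after centralized training, the following minimal sketch assumes the standard additive factorization, in which the joint action value is the sum of per-agent values. It is not the paper's implementation: the tabular Q-values, the two-channel and two-power-level action space, and all names (`ACTIONS`, `per_agent_q`, `joint_q`, `greedy_local`) are hypothetical and chosen only for illustration, and a real scheme would replace the tables with trained neural networks.

```python
from itertools import product

# Hypothetical action space: each D2D pair (agent) jointly picks
# a channel index and a power-level index.
ACTIONS = [(ch, p) for ch in range(2) for p in range(2)]

# Hypothetical per-agent Q-values for one fixed set of local observations.
# The numbers are illustrative only and are not results from the paper.
per_agent_q = [
    {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.4, (1, 1): 0.1},  # agent 0
    {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.7, (1, 1): 0.6},  # agent 1
]


def joint_q(joint_action):
    """Assumed additive decomposition: Q_tot = sum_i Q_i(o_i, a_i)."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_local(q):
    """Each agent maximizes its own Q_i using only local information."""
    return max(q, key=q.get)


# Distributed execution: every agent acts greedily on its own values.
decentralized = tuple(greedy_local(q) for q in per_agent_q)

# Centralized reference: exhaustive search over all joint actions.
centralized = max(product(ACTIONS, repeat=len(per_agent_q)), key=joint_q)

print(decentralized == centralized)
```

Because the joint value is a sum of terms that each depend on a single agent's action, maximizing each term separately also maximizes the sum. Under this assumption, the agents need no information exchange at decision time.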