Multi-hop D2D communication has been proposed to improve the coverage, quality of service (QoS), flexibility, and adaptability of single-hop D2D communication. However, multi-hop D2D links often suffer from interference on the shared channel, which makes spectrum efficiency in multi-hop D2D networks an important issue. In this paper, we study the optimization of spectrum efficiency in multi-hop D2D communication underlaying cellular networks. First, we use iteration-based optimization techniques, namely exhaustive search (ES) and gradient search (GS) with a barrier function, to find the globally and locally optimal solutions, respectively. More importantly, we propose two machine learning (ML) techniques, an unsupervised deep neural network (DNN) and a deep Q-learning (DQL) algorithm, and evaluate their performance against the iteration-based optimization methods. The simulation results verify that both algorithms achieve near-global optima compared to GS. Moreover, the DQL algorithm outperforms the unsupervised DNN in terms of optimal spectrum efficiency, at the cost of higher time complexity.
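
To make the two iteration-based baselines concrete, the following Python sketch applies an exhaustive grid search and a gradient search with a logarithmic barrier to a toy power-allocation problem. Everything in it is an assumption made for illustration and is not taken from the paper's system model: the two-link interference channel, the gain matrix `G`, the values of `NOISE` and `P_MAX`, the sum of log2(1 + SINR) as the spectrum-efficiency objective, and all function names and step-size parameters.

```python
import itertools
import numpy as np

# All numbers below are illustrative assumptions, not values from the paper.
P_MAX = 1.0   # per-link transmit power limit
NOISE = 0.1   # receiver noise power
# G[i, j]: channel gain from transmitter j to receiver i (toy values).
G = np.array([[1.0, 0.2],
              [0.3, 0.8]])
G_OFF = G - np.diag(np.diag(G))  # cross-link (interference) gains only


def spectrum_efficiency(p):
    """Sum over links of log2(1 + SINR) for the power vector p."""
    p = np.asarray(p, dtype=float)
    signal = np.diag(G) * p
    interference = NOISE + G_OFF @ p  # noise plus cross-link interference
    return float(np.sum(np.log2(1.0 + signal / interference)))


def exhaustive_search(levels=21):
    """Evaluate every point of a uniform power grid and keep the best one."""
    grid = np.linspace(0.0, P_MAX, levels)
    best = max(itertools.product(grid, repeat=G.shape[0]), key=spectrum_efficiency)
    return np.array(best), spectrum_efficiency(best)


def barrier_objective(p, t):
    """Spectrum efficiency plus a log barrier that enforces 0 < p < P_MAX."""
    barrier = np.sum(np.log(p)) + np.sum(np.log(P_MAX - p))
    return spectrum_efficiency(p) + barrier / t


def barrier_gradient(p, t):
    """Analytic gradient of barrier_objective with respect to p."""
    total = NOISE + G @ p             # signal + interference + noise
    interference = NOISE + G_OFF @ p
    grad_se = (G.T @ (1.0 / total) - G_OFF.T @ (1.0 / interference)) / np.log(2.0)
    return grad_se + (1.0 / p - 1.0 / (P_MAX - p)) / t


def gradient_search(p0, t=1.0, t_growth=10.0, outer=5, inner=200, lr=0.05):
    """Gradient ascent on the barrier objective; t grows so the barrier fades."""
    p = np.array(p0, dtype=float)
    for _ in range(outer):
        for _ in range(inner):
            g = barrier_gradient(p, t)
            step = lr
            # Backtrack until the candidate is strictly feasible and not worse.
            while step > 1e-12:
                cand = p + step * g
                feasible = np.all(cand > 0.0) and np.all(cand < P_MAX)
                if feasible and barrier_objective(cand, t) >= barrier_objective(p, t):
                    p = cand
                    break
                step *= 0.5
        t *= t_growth
    return p, spectrum_efficiency(p)


if __name__ == "__main__":
    print("ES:", exhaustive_search())
    print("GS:", gradient_search([0.5, 0.5]))
```

In this toy setting the grid search is global only up to the grid resolution and its cost grows exponentially with the number of links, while the gradient search is cheap but can only guarantee a local optimum that depends on the starting point `p0`. That trade-off is what motivates comparing learned methods against both baselines.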