Recognizing the cutting state of coal and rock is the key technology for realizing “unmanned” mining at the coal face. To achieve real-time perception and accurate judgment of coal-rock cutting state information, this paper combines field test sampling, complex coal seam construction technology, virtual prototype technology, bidirectional coupling technology, data processing theory, image fusion methods, and deep learning theory to carry out experimental research on the multi-domain deep fusion of multi-source heterogeneous data on the coal and rock cutting state. A typical complex coal seam containing gangue, inclusions, and minor faults in the Yangcun mine of the Yanzhou mining area was taken as the engineering object. A high-precision three-dimensional simulation model of the complex coal seam, in which particles can be updated and replaced, was constructed. Based on the simulation results of Discrete Element Method-Multi Flexible Body Dynamics (DEM-MFBD), the one-dimensional original vibration acceleration signals of the key components of the shearer cutting part, namely the spiral drum, rocker arm shell, and square head, were determined. After the one-dimensional original signals were transformed into two-dimensional time–frequency images by the Short-time Fourier Transform, morphological wavelet image fusion was used to fuse the characteristic information of the spiral drum, rocker arm shell, and square head under different working conditions. Based on deep learning theory, the DCGAN-RFCNN (Deep Convolutional Generative Adversarial Networks-Random Forest Convolutional Neural Networks) coal and rock cutting state recognition network model was constructed. The RFCNN recognition and classification model was built by combining a convolutional neural network with a random forest classifier, and the recognition network was trained to obtain the recognition results.
The effectiveness of the recognition network model was verified through comparative experiments that compared the RFCNN model with other recognition network models and varied the number of synthetic samples used in the recognition network. The results show that when no synthetic samples are added to any working condition, the average recognition rate of the RFCNN model is 90.641%. As the number of synthetic samples increases, the recognition rate of the coal and rock cutting state increases. When the number of synthetic samples added to each working condition reaches 5000, the recognition effect is best and the average recognition rate reaches 98.344%, which verifies the benefit of enriching the data set with the improved DCGAN network. The RFCNN also outperformed the other models, achieving recognition accuracy 25.085%, 21.925%, and 19.337% higher than SVM, CNN, and AlexNet, respectively. In addition, an experimental platform for shearer cutting of coal and rock was built, on which the coal and rock cutting state recognition network was trained and tested based on transfer learning theory. According to the statistical test results, the accuracy of coal and rock cutting state recognition is 98.64%, which realizes accurate recognition of the coal and rock cutting state.
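The step that turns a one-dimensional vibration signal into a two-dimensional time–frequency image can be illustrated with a minimal Short-time Fourier Transform sketch. It is not the implementation used in this work: the function name `stft_magnitude`, the frame length, the hop size, the Hann window, the sampling rate, and the synthetic sine signal are all assumptions chosen for illustration, and a naive DFT is used only to keep the example free of external libraries (an FFT routine would normally be used instead).

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=16):
    """Return a time-frequency magnitude image as a list of rows.

    Rows index frequency bins (0 .. frame_len // 2); columns index frames.
    frame_len and hop are illustrative values, not taken from the paper.
    """
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT of the windowed frame; keep non-negative frequencies only.
        column = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            column.append(abs(acc))
        columns.append(column)
    # Transpose so that image[k][t] is the magnitude of bin k in frame t.
    return [list(row) for row in zip(*columns)]


# Synthetic stand-in for a vibration acceleration signal (hypothetical data):
# a 128 Hz sine sampled at an assumed rate of 1024 Hz.
fs = 1024
test_signal = [math.sin(2 * math.pi * 128 * t / fs) for t in range(512)]
image = stft_magnitude(test_signal)
```

With these assumed settings each frequency bin spans `fs / frame_len = 16` Hz, so the energy of the 128 Hz test tone concentrates in bin 8 of every frame. Images of this kind, one per measured component, are what an image fusion stage would then combine.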