Coal-gangue recognition technology plays an important role in making fully mechanized caving faces intelligent and in improving coal quality. Although coal-gangue recognition has progressed considerably in recent years, most existing methods do not account for the impact of the complex top coal caving environment on recognition performance. Herein, a hybrid multi-branch convolutional neural network (HMBCNN) is proposed for coal-gangue recognition, based on an improved Mel Frequency Cepstral Coefficient (MFCC), the Mel spectrogram, and attention mechanisms. Firstly, the MFCC and its smoothed feature matrix are each input into a branch of a one-dimensional multi-branch convolutional neural network, and the concatenated features are extracted adaptively through a multi-head attention mechanism. Secondly, the Mel spectrogram and its first-order derivative are each input into a branch of a two-dimensional multi-branch convolutional neural network, and a soft attention mechanism focuses the network on the effective time-frequency information. Finally, the two networks are fused at the decision level to establish a model for feature fusion and classification, yielding optimal fusion strategies for the different features and networks. A database of sound pressure signals under different signal-to-noise ratios and equipment operating conditions is constructed from a large amount of data collected in the laboratory and on site. Comparative experiments against advanced algorithms and different neural network structures are conducted and discussed on this database. The results show that the proposed method achieves higher recognition accuracy and better robustness in noisy environments.
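
To illustrate the decision-level fusion step, the following is a minimal sketch. It assumes that each of the two networks outputs class scores and that fusion is a weighted average of their softmax probabilities. The abstract does not state the fusion rule, so the function names, the weight `weight_1d`, and the example logits are all hypothetical and are not taken from the HMBCNN implementation.

```python
import math

CLASSES = ("coal", "gangue")  # assumed two-class labelling


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(probs_1d, probs_2d, weight_1d=0.5):
    """Weighted average of two probability vectors (an assumed fusion rule).

    probs_1d: output of the 1-D branch network (MFCC features)
    probs_2d: output of the 2-D branch network (Mel spectrogram features)
    weight_1d: hypothetical weight in [0, 1] given to the 1-D network
    """
    if len(probs_1d) != len(probs_2d):
        raise ValueError("both networks must score the same classes")
    if not 0.0 <= weight_1d <= 1.0:
        raise ValueError("weight_1d must lie in [0, 1]")
    fused = [
        weight_1d * p + (1.0 - weight_1d) * q
        for p, q in zip(probs_1d, probs_2d)
    ]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


# Made-up logits standing in for the two networks' outputs on one sample.
p_mfcc = softmax([2.0, 0.5])  # 1-D network leans towards "coal"
p_mel = softmax([0.2, 1.0])   # 2-D network leans towards "gangue"

fused, label = fuse_decisions(p_mfcc, p_mel, weight_1d=0.6)
print(CLASSES[label], [round(p, 3) for p in fused])
```

In this sketch the weight decides which network prevails when the two disagree. Searching over such a weight on validation data is one simple way to look for a fusion strategy, though the strategy actually used by the proposed method may differ.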