The efficient separation of coal and gangue during mining is of great significance for improving coal mining efficiency and reducing environmental pollution. Automatic detection of coal and gangue is the key to, and the foundation of, this separation. In this paper, we propose a hierarchical framework for coal and gangue detection based on deep learning models. In this framework, the Gaussian pyramid principle is first used to construct multi-level training data, yielding sets of coal and gangue image features at multiple scales. Then, a coal and gangue region proposal network (CG-RPN) is designed to determine candidate regions of the target objects in the image. Next, convolutional neural networks (CNNs) are constructed to recognize coal and gangue objects within the extracted candidate regions. We evaluated our method on three different datasets. Experimental results show that the proposed method improves the detection accuracy of coal and gangue objects by 0.8% over previous methods, reaching up to 98.33%. In addition, our method can detect multiple coal and gangue objects in a single image, which removes the queuing requirement of traditional methods.
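As an illustration of the Gaussian pyramid principle mentioned above, the following is a minimal sketch of a generic multi-scale pyramid for a grayscale image. Each level is produced by blurring the previous level and then halving it. The abstract does not specify the framework's smoothing kernel, the number of levels, or how the levels are turned into training data. The 5-tap binomial kernel, the reflected borders, the stopping size, and the function names `gaussian_blur` and `gaussian_pyramid` are all assumptions made for illustration.

```python
import numpy as np

# Assumed 5-tap binomial approximation of a Gaussian; it sums to 1.
KERNEL_1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def gaussian_blur(image: np.ndarray) -> np.ndarray:
    """Separable 5x5 Gaussian-like blur of a 2-D image with reflected borders."""
    h, w = image.shape
    padded = np.pad(image, 2, mode="reflect")
    # Filter along rows, then along columns (symmetric kernel, so
    # correlation and convolution give the same result).
    rows = sum(KERNEL_1D[k] * padded[:, k:k + w] for k in range(5))
    return sum(KERNEL_1D[k] * rows[k:k + h, :] for k in range(5))


def gaussian_pyramid(image: np.ndarray, levels: int) -> list[np.ndarray]:
    """Return up to `levels` images, each half the size of the previous one.

    Hypothetical helper: stops early once a level's shorter side drops
    below 8 pixels (an arbitrary assumed limit).
    """
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if min(current.shape) < 8:
            break
        # Blur first to suppress aliasing, then keep every second row/column.
        pyramid.append(gaussian_blur(current)[::2, ::2])
    return pyramid
```

For example, `gaussian_pyramid(img, 4)` on a 64×48 image produces levels of size 64×48, 32×24, 16×12, and 8×6. Blurring before subsampling is what distinguishes a Gaussian pyramid from plain decimation: it removes high frequencies that would otherwise alias into the coarser scales.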