Accurate localization of gangue plays an essential role in vision-based gangue sorting systems. However, gangue is often hard to distinguish from coal by appearance and is stacked with coal in cluttered mining environments, which makes efficient gangue identification and volume measurement from images challenging. In this article, we propose a novel inspection framework for fully automated gangue localization and volume measurement based on deep learning and point cloud processing techniques. First, a 3-D scanning system is assembled to automatically capture clear images and 3-D point clouds of gangue. Second, instead of relying on handcrafted features, we propose a data-driven streamlined gangue detection network (SGDNet) to detect gangue in images. In particular, we design an adaptive fusion unit (AFU) that fuses multiple hierarchical features and, through an attention mechanism in the convolutional neural network, assigns adaptive weights to feature maps at different scales, thereby retaining more location details of gangue. Furthermore, we project the gangue detection results into the 3-D scanning space and introduce a simplified 3-D object segmentation algorithm based on a surface curvature filter to extract the points belonging to gangue. The extracted points are then used to calculate gangue volume. Several experiments demonstrate the superiority of the proposed SGDNet and the effectiveness of the presented gangue localization and volume measurement framework.
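The abstract does not say how volume is computed from the extracted gangue points. The sketch below shows one common approach for objects resting on a conveyor belt. It is an assumption on our part, not the paper's method. The approach rasterizes the points onto an XY grid, keeps the highest point in each occupied cell, and sums height times cell area. The helpers `crop_to_box` and `height_map_volume` are hypothetical. So are the assumptions that the projected detection box is already expressed in scanner XY coordinates and that the belt is the flat plane `z = belt_z`. The curvature-based segmentation step is not reproduced here.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def crop_to_box(points: Iterable[Point], box: Tuple[float, float, float, float],
                belt_z: float = 0.0, min_height: float = 1e-3) -> List[Point]:
    """Hypothetical stand-in for segmentation: keep points whose (x, y) lies in an
    axis-aligned box (xmin, ymin, xmax, ymax), assumed to be the detection result
    already projected into scanner coordinates, and that rise more than
    min_height above the assumed flat belt plane z = belt_z."""
    xmin, ymin, xmax, ymax = box
    return [(x, y, z) for x, y, z in points
            if xmin <= x <= xmax and ymin <= y <= ymax and z - belt_z > min_height]


def height_map_volume(points: Iterable[Point], cell: float = 0.01,
                      belt_z: float = 0.0) -> float:
    """Approximate volume by height-map integration: bin points into square XY
    cells of side `cell`, take the maximum height above the belt in each cell,
    and sum height * cell area."""
    top: Dict[Tuple[int, int], float] = {}
    for x, y, z in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        h = z - belt_z
        if h > top.get(key, 0.0):  # points at or below the belt contribute nothing
            top[key] = h
    return sum(top.values()) * cell * cell


if __name__ == "__main__":
    # Synthetic scene: a flat belt plus a 1.0 x 2.0 x 0.5 block sampled at cell centers.
    belt = [(i * 0.1, j * 0.1, 0.0) for i in range(30) for j in range(30)]
    block = [((i + 0.5) * 0.1, (j + 0.5) * 0.1, 0.5) for i in range(10) for j in range(20)]
    pts = crop_to_box(belt + block, (0.0, 0.0, 1.0, 2.0))
    print(len(pts), round(height_map_volume(pts, cell=0.1), 6))
```

Height-map integration treats everything under the visible top surface as solid, so it suits objects lying on a belt without overhangs. The cell size trades spatial resolution against empty cells in sparse scans.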