Articles published on Image Information
- Research Article
- 10.1038/s41598-025-22649-0
- Nov 6, 2025
- Scientific reports
- Benyamin Mirab Golkhatmi + 2 more
Medical image segmentation is crucial for accurately diagnosing diseases and directing physicians to the relevant areas, so there is a pressing need for artificial-intelligence models that streamline the diagnostic process and reduce errors. Existing networks often suffer from large parameter counts, high computational cost (GFLOPs), and limited accuracy. This research addresses that gap with a transformer-based architecture: a Swin transformer encoder that improves segmentation accuracy in medical images through deep feature extraction, using shifted-window attention to capture key image features more precisely. The overarching goal is an optimized medical image segmentation architecture that maintains high accuracy while reducing the number of network parameters and minimizing computational cost. In the decoder, we designed the dynamic feature fusion block (DFFB) to enhance the extracted features and produce multi-scale representations, allowing the model to analyze the structural information of medical images at multiple levels and improving performance on complex regions. A dynamic attention enhancement block then refines the DFFB output, using spatial and channel attention to emphasize key areas in the images and raise the model's overall accuracy. The proposed model achieved strong segmentation performance across three medical datasets: a mean intersection over union (mIoU) of 0.9125 and a Dice score of 0.9542 on GlaS, an mIoU of 0.9174 and a Dice score of 0.9569 on PH2, and an mIoU of 0.9085 and a Dice score of 0.9521 on Kvasir-SEG. The experiments show that the proposed model outperforms previous methods, demonstrating its potential as an effective tool for medical image segmentation.
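The abstract does not detail the dynamic attention enhancement block, but combining channel and spatial attention is a standard pattern (CBAM-style); a minimal PyTorch sketch, in which the module and parameter names are hypothetical rather than taken from the paper, looks like this:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Hypothetical sketch of a block combining channel and spatial attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: summarize channels, weight each pixel.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)                 # channel reweighting
        avg_map = x.mean(dim=1, keepdim=True)       # per-pixel channel mean
        max_map = x.amax(dim=1, keepdim=True)       # per-pixel channel max
        attn = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * attn                             # spatial reweighting

# Example: refine a batch of decoder feature maps.
feats = torch.randn(2, 64, 32, 32)
refined = ChannelSpatialAttention(64)(feats)
print(refined.shape)  # torch.Size([2, 64, 32, 32])
```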
- Research Article
- 10.1038/s41598-025-22867-6
- Nov 6, 2025
- Scientific reports
- Xiaoyan Mu + 1 more
Due to differences in physiological characteristics and drug metabolism between children and adults, drug efficacy evaluation and safety monitoring in pediatric drug development present significant challenges. This paper proposes a data-driven incentive mechanism for pediatric drug development based on medical imaging data, optimizing drug market pricing through precise imaging data to promote both accessibility and R&D efficiency for pediatric drugs. The study first collects multi-source computed tomography (CT), magnetic resonance imaging (MRI), and X-ray data, focusing on images of common pediatric diseases. After preprocessing, a convolutional neural network (CNN) extracts key image features. Image-difference methods and a U-Net segmentation network are then used to evaluate drug efficacy and safety, quantify efficacy changes, and analyze side effects. Next, a drug efficacy-safety evaluation model is developed, and game theory is employed to design an R&D incentive mechanism. In the pricing-optimization phase, Monte Carlo simulation is combined with risk assessment to account for factors such as cost, R&D investment, and market demand, and a dynamic pricing strategy is implemented to balance the drug's economic returns against its social accessibility. Experiments show promising development results, with an average tumor volume reduction of 32.7% (95% CI: 28.4%-36.9%); the drug's impact on organ volume stays within ±2 cm³, and the pricing strategy selects a near-optimal price point.
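As a rough illustration of what the Monte Carlo pricing step involves, a toy sketch follows; every distribution and number in it is an invented assumption for illustration, not a value from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_profit(price: float, n_sim: int = 100_000) -> float:
    """Toy Monte Carlo estimate of expected profit at a given price.

    All distributions and parameters below are illustrative assumptions,
    not values from the paper.
    """
    unit_cost = rng.normal(12.0, 1.5, n_sim)       # uncertain production cost
    rnd_investment = 5e6                           # fixed R&D outlay
    # Demand falls with price (toy elasticity) plus noise.
    demand = np.maximum(rng.normal(1e6 - 8_000 * price, 5e4, n_sim), 0.0)
    return float(np.mean((price - unit_cost) * demand) - rnd_investment)

# Scan a price grid and pick the profit-maximizing point.
prices = np.linspace(15, 80, 66)
best = max(prices, key=expected_profit)
print(f"toy optimal price: {best:.2f}")
```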
- Research Article
- 10.1364/oe.573079
- Nov 5, 2025
- Optics Express
- Jiaxin Wu + 7 more
Single-pixel imaging (SPI) technology has demonstrated unique advantages in weak-light detection and special spectral sensing, but its reconstruction performance at low measurement rates remains a key challenge. Mainstream reconstruction pipelines suffer from decoupled and underutilized measurement information at low rates, as well as a loss of sample diversity caused by deterministic mappings, which restricts their ability to capture fine details in complex scenes. To address these issues, this study proposes WFD-SPI, a dual-stage reconstruction framework with multi-feature prior fusion that achieves deep analysis of the measurement information and optimized feature reconstruction through its architectural design. Specifically, in the conditional generative pre-training phase, the Moore-Penrose generalized inverse operator is introduced to construct the pseudoinverse space of the measurement matrix, establishing a bidirectional mapping between measurement information and image information and effectively extracting latent feature representations. In the inverse diffusion reconstruction process, Fourier frequency-domain features and wavelet multi-scale spatial features are fused through a novel non-linear scheme, and a differentiable operator constraint shapes the joint probability distribution of the generated image and the real data, significantly enhancing the representation of high-frequency textures and edge information. Experimental results show that the method outperforms existing SPI techniques in recovery quality at low measurement rates, and ablation studies further validate the effectiveness of the approach.
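The Moore-Penrose step can be shown concretely: given measurements y = A·x from a measurement matrix A, the pseudoinverse estimate A⁺y gives the coarse reconstruction that a learned prior then refines. A minimal NumPy sketch with a random matrix and a stand-in scene:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 32 * 32                              # flattened image size
m = int(0.10 * n)                        # 10% measurement rate
A = rng.standard_normal((m, n))          # random sampling patterns
x_true = rng.random(n)                   # stand-in for the scene
y = A @ x_true                           # single-pixel measurements

# Moore-Penrose generalized inverse: minimum-norm least-squares estimate.
x0 = np.linalg.pinv(A) @ y               # coarse pseudoinverse reconstruction

# Relative error of the coarse estimate (a learned prior would refine this).
print(np.linalg.norm(x0 - x_true) / np.linalg.norm(x_true))
```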
- Research Article
- 10.1002/advs.202516379
- Nov 5, 2025
- Advanced science (Weinheim, Baden-Wurttemberg, Germany)
- Jiaxin Liu + 4 more
Artificial neuromorphic vision systems emulate the biological visual pathway by integrating sensing, storage, and information processing within a unified architecture. Featuring high speed, low power consumption, and superior temporal resolution, they demonstrate significant potential in fields such as autonomous driving, facial recognition, and intelligent perception. As the core building block, the optoelectronic synapse plays a decisive role in determining system performance, which is closely related to its material composition, structural design, and functional characteristics. This review systematically summarizes recent progress in optoelectronic synaptic materials, device architectures, and performance evaluation methodologies. Furthermore, it explores the working mechanisms and network architectures of optoelectronic synapse-based neuromorphic vision systems, highlighting their capability in image perception, information storage, and target recognition. Current challenges, including environmental stability, large-scale array fabrication, chip-level integration, and adaptability of visual functions to real-world scenarios, are discussed in depth. Finally, the review provides an outlook on future development trends toward stable, scalable, and highly integrated optoelectronic neural vision systems, underscoring their key importance in next-generation intelligent sensing and information-processing technologies.
- Research Article
- 10.1109/tmi.2025.3575801
- Nov 1, 2025
- IEEE transactions on medical imaging
- Yingli Zuo + 7 more
Multi-task learning (MTL) has become a research hotspot for the analysis of whole-slide histopathological images (WSIs), since it can capture shared representations across tasks to improve each individual task. However, the shared representations learned by MTL are dominated by the tasks present in the training set, making them difficult to apply directly to unseen (new) tasks, especially when those tasks differ significantly from the known ones. To address this issue, we develop a Task-Agnostic Feature-Learner (TAFL) for efficient adaptation to unseen clinical tasks, which leverages useful image information from existing tasks for new clinical trials with minimal task-specific modification. Specifically, we first develop a neural architecture search (NAS) module that automatically designs the network architecture of TAFL. Then, a novel task-level meta-learning algorithm extracts efficient and universal information from the known tasks to improve prediction performance on unseen tasks. We evaluate our method on three publicly available datasets derived from The Cancer Genome Atlas (TCGA) across various clinical prediction tasks (i.e., staging, cancer subtyping, and survival prediction), and the experimental results indicate that TAFL adapts effectively to unseen tasks with better prediction performance.
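The paper's task-level meta-learning algorithm is not spelled out in the abstract; as an illustration of the general idea, a first-order MAML-style loop over synthetic tasks is sketched below. The architecture, task sampler, and hyperparameters are all stand-ins, not the paper's NAS-found design:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

# Generic first-order MAML-style loop; everything here is illustrative.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
inner_lr = 0.01

def sample_task(n=32):
    """Stand-in for drawing a clinical task (staging, subtyping, ...)."""
    x, y = torch.randn(2 * n, 16), torch.randint(0, 2, (2 * n,))
    return (x[:n], y[:n]), (x[n:], y[n:])          # support / query split

for step in range(100):
    (xs, ys), (xq, yq) = sample_task()
    params = dict(model.named_parameters())
    # Inner step: adapt the weights on the task's support set.
    support_loss = loss_fn(functional_call(model, params, (xs,)), ys)
    grads = torch.autograd.grad(support_loss, list(params.values()))
    adapted = {k: v - inner_lr * g
               for (k, v), g in zip(params.items(), grads)}
    # Outer step: evaluate the adapted weights on the query set and
    # update the shared (task-agnostic) initialization.
    query_loss = loss_fn(functional_call(model, adapted, (xq,)), yq)
    meta_opt.zero_grad()
    query_loss.backward()
    meta_opt.step()
```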
- Research Article
- 10.1109/tmi.2025.3572511
- Nov 1, 2025
- IEEE transactions on medical imaging
- Lu Tang + 7 more
Medical image fusion (MIF) plays an important role in precision diagnostics and treatment planning, and medical image fusion quality assessment (MIFQA) plays an active role in improving MIF performance. However, obtaining medical reference images is difficult, and the heavy demand for medical prior knowledge and reference images is a major challenge in MIFQA. To address this issue, this paper proposes a two-stage model for MIFQA. In the first stage, we design a GAN-based quality-aware network called QANet: by fusing the radiologists' mean opinion scores (MOS) with the source images, the model is guided to generate one reference image for each quality level. In the second stage, the reference images are fed into our proposed class attention siamese network (CASNet), which is based on class activation mapping (CAM) under few-shot learning, to fully exploit the information in the limited reference images. This forces the model to focus on key lesion areas and effectively reduces MIFQA's dependence on labeled medical fused images. Finally, the quality score of an unlabeled fused image is predicted by computing its distance to the reference images. Experiments on a self-built MIFQA dataset show that our method outperforms state-of-the-art methods.
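The final scoring step, predicting quality from the distance to quality-graded references, can be sketched as follows; the encoder and reference set here are simple stand-ins for CASNet and QANet's outputs, not the paper's models:

```python
import torch
import torch.nn as nn

# Stand-in encoder; the paper uses a CAM-based siamese network (CASNet).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))

def quality_score(fused, references, ref_scores):
    """Assign the MOS of the nearest reference in embedding space."""
    with torch.no_grad():
        f = encoder(fused)                       # (1, 128)
        r = encoder(references)                  # (K, 128)
        dist = torch.cdist(f, r).squeeze(0)      # distance to each reference
    return ref_scores[dist.argmin()]

references = torch.randn(5, 1, 64, 64)           # one per quality level
ref_scores = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
print(quality_score(torch.randn(1, 1, 64, 64), references, ref_scores))
```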
- Research Article
- 10.1109/tpami.2025.3592089
- Nov 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
- Chengjie Wang + 7 more
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning from pristine RGB images. Yet both RGB and 3D data are crucial for anomaly detection, and datasets are seldom completely clean in practical scenarios. To address these challenges, this paper delves into RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR framework that leverages the strong multi-modal discriminative capabilities of CLIP. M3DM-NR consists of three stages. Stage I introduces a Suspected References Selection module that filters a few normal samples from the training dataset using the multimodal features extracted by the Initial Feature Extraction module, and a Suspected Anomaly Map Computation module that generates a suspected anomaly map to focus on abnormal regions. Stage II uses the suspected anomaly maps of the reference samples and takes image, point cloud, and text information as input to denoise the training samples through intra-modal comparison and multi-scale aggregation. Finally, Stage III proposes Point Feature Alignment, Unsupervised Feature Fusion, Noise Discriminative Coreset Selection, and Decision Layer Fusion modules to learn the patterns of the training dataset, enabling anomaly detection and segmentation while filtering out noise. Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multi-modal noisy anomaly detection.
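Stage III's coreset-based detection follows the familiar memory-bank pattern: normal features are stored, and a test patch is scored by its distance to the nearest stored feature. A toy sketch, with random features and naive subsampling in place of the paper's fusion and noise-discriminative coreset selection:

```python
import torch

# Random stand-ins for fused RGB-3D patch features of normal samples.
normal_feats = torch.randn(10_000, 256)
bank_idx = torch.randperm(10_000)[:1_000]        # naive subsampled "coreset"
memory_bank = normal_feats[bank_idx]

def anomaly_score(patch_feats: torch.Tensor) -> torch.Tensor:
    """Per-patch score = distance to nearest normal feature in the bank."""
    d = torch.cdist(patch_feats, memory_bank)    # (P, bank size)
    return d.min(dim=1).values                   # small = normal, large = anomalous

test_patches = torch.randn(64, 256)
print(anomaly_score(test_patches).max())         # image-level score
```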
- Research Article
- 10.3390/s25216688
- Nov 1, 2025
- Sensors
- Xiaohui Li + 4 more
Cross-scene classification of hyperspectral images poses significant challenges due to the lack of a priori knowledge and differences in data distribution across scenes. While traditional studies have made limited use of a priori knowledge from other modalities, recent pre-trained large-scale language-vision models have shown strong performance on various downstream tasks, highlighting the potential of cross-modal assisted learning. In this paper, we propose a Semantic-aware Collaborative Parallel Network (SCPNet) that mitigates the impact of data-distribution differences by incorporating the linguistic modality to help learn cross-domain invariant representations of hyperspectral images. SCPNet uses a parallel architecture consisting of a spatial-spectral feature extraction module and a multiscale feature extraction module, designed to capture rich image information during the feature-extraction phase. The extracted features are then mapped into an optimized semantic space, where an improved supervised contrastive learning clusters image features from the same category together while separating those from different categories. The semantic space bridges the gap between the visual and linguistic modalities, enabling the model to mine cross-domain invariant representations from the linguistic modality. Experimental results demonstrate that SCPNet significantly outperforms existing methods on three publicly available datasets, confirming its effectiveness for cross-scene hyperspectral image classification.
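The supervised contrastive objective used in the semantic space is well documented (Khosla et al., 2020); since the paper's "improved" variant is not specified in the abstract, a minimal implementation of the standard form is:

```python
import torch
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Standard supervised contrastive loss: same-class embeddings are
    pulled together, different-class embeddings pushed apart."""
    z = F.normalize(features, dim=1)                 # unit-norm embeddings
    sim = z @ z.t() / tau                            # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # log-softmax over all other samples, then average over positives.
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    per_sample = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_count
    return per_sample.mean()

feats = torch.randn(8, 128)                          # e.g. projected HSI features
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supcon_loss(feats, labels))
```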
- Research Article
- 10.1109/tmi.2025.3564894
- Nov 1, 2025
- IEEE transactions on medical imaging
- Ling Yang + 3 more
For chest X-ray (CXR) image analysis, effective bone-structure suppression is essential for uncovering lung abnormalities and facilitating accurate clinical diagnoses. While recent deep generative models improve the reconstruction quality of bone-suppressed CXRs to some extent, they often fall short of delivering substantial improvements in downstream diagnosis tasks. This limitation stems from a narrow focus on instance-specific details that neglects broader domain-level knowledge, hampering bone-suppression effectiveness. In response, our proposed framework integrates both instance-level and domain-level information. To capture instance information, the model employs a hybrid approach: cross-covariance attention blocks (CABs) underscore relevant image information, and a subsequent Vision Transformer (ViT) encoder embeds image features. To capture domain information, we introduce multi-head codebook attention (MCA), which couples a codebook structure with a multi-head attention mechanism to capture global, domain-level information specific to the bone-suppressed CXR domain, thereby refining the synthesis process. During optimization, our two-stage training scheme involves an MCA learning stage, which encapsulates the bone-suppressed CXR domain in the MCA through a ViT-based GAN model, and a synthesis stage, which employs the learned codebook to generate bone-suppressed CXRs from the originals, enhancing instance synthesis with domain insights. The incorporation of CABs further refines pixel-level instance information. Extensive experiments demonstrate the superior performance of our approach, improving PSNR by 8.36% and SSIM by 2.7% for bone suppression while boosting lung disease classification by 2.8% and 4.2% on two datasets and segmentation by 1.5%.
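Cross-covariance attention is known from the XCiT literature: attention is computed between feature channels rather than between tokens, so its cost is linear in token count. A compact sketch of the generic form follows; the paper's exact CAB design is not public, so this is an illustration, not its implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossCovarianceAttention(nn.Module):
    """XCiT-style cross-covariance attention: channels attend to channels."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temp = nn.Parameter(torch.ones(heads, 1, 1))  # learned temperature

    def forward(self, x):                            # x: (B, N tokens, dim)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, N, self.heads, D // self.heads)
        # Reshape to (B, heads, head_dim, N); normalize along the token axis.
        q = F.normalize(q.view(shape).permute(0, 2, 3, 1), dim=-1)
        k = F.normalize(k.view(shape).permute(0, 2, 3, 1), dim=-1)
        v = v.view(shape).permute(0, 2, 3, 1)
        attn = (q @ k.transpose(-2, -1) * self.temp).softmax(dim=-1)  # (B,h,d,d)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, D)
        return self.proj(out)

x = torch.randn(2, 196, 64)                          # 14x14 patch tokens
print(CrossCovarianceAttention(64)(x).shape)         # torch.Size([2, 196, 64])
```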
- Research Article
- 10.1038/s41598-025-21550-0
- Oct 29, 2025
- Scientific reports
- Fan Yang + 5 more
Object detection networks based on YOLO have been widely used in intelligent transportation and public safety. Compared with visible-light target detection, infrared target detection works even in low light or harsh environments. In visible-light scenes, YOLOv7-tiny offers both speed and accuracy, but when applied directly to infrared scenes it shows shortcomings such as weak extraction of detailed features, serious loss of semantic information, and high computational resource demands. This paper therefore proposes LIWL-YOLO, a lightweight infrared target detection network suitable for both water and land targets. First, a lightweight backbone called SPFNet is designed by integrating space-to-depth convolution (SPDConv) into FasterNet, improving YOLOv7-tiny's feature-extraction ability and speed on low-resolution images. Second, an attention module called SAF-CA is designed and added to the neck layer so the model attends more to weak texture features in the image. Furthermore, to improve the extraction of low-contrast information, an exponential spatial pyramid pooling module is designed to replace the SPPCSPC module in YOLOv7-tiny. Finally, MGD knowledge distillation, with YOLOv7 as the teacher model, compresses knowledge into the improved model to further raise accuracy on infrared targets. We construct a hybrid dataset named FLIR-WSL as the experimental benchmark, combining the FLIR-v2 dataset with infrared water-surface target images collected by our team. Experimental results on FLIR-WSL show that LIWL-YOLO reaches an mAP of 69%, 4.3% higher than YOLOv7-tiny, and runs at 93 FPS on an RTX 4060 GPU. LIWL-YOLO thus handles both land and water targets in infrared scenes while balancing accuracy and speed.
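Space-to-depth convolution itself is simple to illustrate: each 2×2 spatial block is folded into channels before a stride-1 convolution, so downsampling discards no pixels (unlike strided convolution). A short PyTorch sketch, with the surrounding SPFNet wiring omitted and the class name chosen for illustration:

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Sketch of space-to-depth convolution: rearrange 2x2 spatial blocks
    into channels, then mix with a stride-1 convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = nn.functional.pixel_unshuffle(x, 2)   # (B, 4C, H/2, W/2)
        return self.conv(x)

x = torch.randn(1, 32, 64, 64)
print(SPDConv(32, 64)(x).shape)                   # torch.Size([1, 64, 32, 32])
```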
- Research Article
- 10.1021/acsami.5c16558
- Oct 29, 2025
- ACS applied materials & interfaces
- Yixuan Shi + 7 more
DNA molecular circuits are pivotal to biological information processing, where precise temporal regulation is essential for programmable molecular computation and controllable biochemical networks. Nevertheless, incorporating time as a regulatory parameter for information access remains a major challenge. We present a programmable sliding time window method (STWM) based on exonuclease III (Exo III), enabling the construction of temporally regulated molecular circuits for information access and supporting the complete workflow of image data encoding and decoding. By tuning the position of apurinic/apyrimidinic (AP) sites, as well as the concentrations of Exo III and AP strands, we achieved multivariable regulation over the time window for molecular signal processing. Additionally, we designed a DNA molecular circuit incorporating time window functionality, which facilitated signal-guided directional shifting and continuous processing within defined temporal intervals. This approach was successfully applied to image decoding, enabling compilation and reconstruction of a 4 × 4 image matrix. The proposed strategy provides a reusable and tunable framework for molecular timing control, offering an avenue for future applications in molecular computing, bioinformation processing, and intelligent biosensing.
- Research Article
- 10.1038/s41598-025-21399-3
- Oct 28, 2025
- Scientific Reports
- Wenbin Liu + 4 more
Transformer networks excel at capturing long-range dependencies between different locations in an input sequence and are strong at modeling global information. Recently, several neural architecture search (NAS) algorithms have been proposed for hyperspectral image (HSI) classification, raising HSI classification accuracy to a new level with greater attention to local information. However, current two-channel network methods cannot attend to both the local and the global information of hyperspectral images, which limits classification accuracy on this type of data. In this paper, a two-channel network named CTmixer-NAS is proposed for hyperspectral image classification. Combining the advantages of NAS and Transformer networks, it captures local and global information simultaneously. In the Transformer branch, information from different self-encoding layers is fused; in the CNN branch, a high-performance network structure is designed automatically via NAS, improving hyperspectral classification accuracy. The approach reduces manual design cost, and the most suitable network structure can be searched for each dataset, giving the searched structures better generalization. CTmixer-NAS achieves the best performance on five hyperspectral datasets in comparative experiments.
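A toy version of the two-branch idea, with fixed stand-in layers in place of the NAS-searched CNN cells and all sizes invented for illustration, could be wired as follows:

```python
import torch
import torch.nn as nn

class TwoBranchHSI(nn.Module):
    """Toy two-branch classifier for a hyperspectral patch: a CNN branch for
    local spatial-spectral features, a Transformer branch for global context,
    fused by concatenation."""
    def __init__(self, bands: int = 30, classes: int = 9, dim: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(                        # local branch
            nn.Conv2d(bands, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Linear(bands, dim)               # per-pixel tokens
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Linear(2 * dim, classes)

    def forward(self, patch):                            # (B, bands, H, W)
        local = self.cnn(patch)                          # (B, dim)
        tokens = patch.flatten(2).transpose(1, 2)        # (B, H*W, bands)
        glob = self.transformer(self.embed(tokens)).mean(dim=1)  # (B, dim)
        return self.head(torch.cat([local, glob], dim=1))

x = torch.randn(4, 30, 9, 9)                             # 9x9 spatial patch
print(TwoBranchHSI()(x).shape)                           # torch.Size([4, 9])
```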
- Research Article
- 10.1038/s41597-025-05994-7
- Oct 28, 2025
- Scientific Data
- Junlin Ouyang + 6 more
We introduce Kust4K, a UAV-based dataset for RGB-TIR multimodal semantic segmentation. Kust4K is designed to overcome key limitations of existing UAV-based semantic segmentation datasets: low information density, limited data volume, and insufficient attention to robustness under non-ideal environments. The dataset comprises 4,024 pixel-aligned 640 × 512 RGB-thermal infrared image pairs captured across diverse urban road scenes under variable illumination. Extensive experiments with state-of-the-art models demonstrate Kust4K's effectiveness, with multimodal training significantly outperforming unimodal baselines. These results highlight that multimodal image information is critical for more reliable semantic segmentation. Overall, the Kust4K dataset advances robust urban traffic scene understanding, offering a valuable resource for intelligent transportation research.
- Research Article
- 10.3390/s25216598
- Oct 27, 2025
- Sensors
- Fang Wan + 5 more
Recent advances in 3D Gaussian Splatting (3DGS) have achieved remarkable performance in scene reconstruction and novel view synthesis on benchmark datasets. However, real-world images are frequently affected by degradations such as camera shake, object motion, and lens defocus, which not only compromise image quality but also severely hinder the accuracy of 3D reconstruction—particularly in fine details. While existing deblurring approaches have made progress, most are limited to addressing a single type of blur, rendering them inadequate for complex scenarios involving multiple blur sources and resolution degradation. To address these challenges, we propose Gaussian Splatting with Multi-Scale Deblurring and Resolution Enhancement (GS-MSDR), a novel framework that seamlessly integrates multi-scale deblurring and resolution enhancement. At its core, our Multi-scale Adaptive Attention Network (MAAN) fuses multi-scale features to enhance image information, while the Multi-modal Context Adapter (MCA) and adaptive spatial pooling modules further refine feature representation, facilitating the recovery of fine details in degraded regions. Additionally, our Hierarchical Progressive Kernel Optimization (HPKO) method mitigates ambiguity and ensures precise detail reconstruction through layer-wise optimization. Extensive experiments demonstrate that GS-MSDR consistently outperforms state-of-the-art methods under diverse degraded scenarios, achieving superior deblurring, accurate 3D reconstruction, and efficient rendering within the 3DGS framework.
- Research Article
- 10.13345/j.cjb.250355
- Oct 25, 2025
- Sheng wu gong cheng xue bao = Chinese journal of biotechnology
- Xiuhua Li + 4 more
An intelligent recognition method for crop density based on Faster R-CNN
- Research Article
- 10.1371/journal.pone.0333640
- Oct 22, 2025
- PLOS One
- Feixian Liu + 1 more
This paper proposes a structurally simplified 2D quadratic sine map (2D-SQSM). The map effectively addresses the insufficient chaotic performance of traditional chaotic maps while avoiding the overly complex structures of emerging ones. Evaluated with multiple chaos performance metrics, the 2D-SQSM demonstrates high Lyapunov exponents and sample entropy, with chaotic characteristics superior to several advanced chaotic maps proposed in recent years. Based on the 2D-SQSM, the paper further designs a highly robust color image encryption algorithm. First, hash functions are introduced repeatedly to strengthen the correlation between the key and the plaintext, significantly improving resistance to brute-force attacks; second, cyclic shifting and segmentation-recombination operations are applied separately to the three RGB channels to disrupt the pixel distribution and significantly reduce spatial correlation between pixels; finally, the chaotic sequence generated by the 2D-SQSM is used for XOR diffusion, further enhancing the randomness and diffusion capability of the ciphertext. Extensive simulation results demonstrate that the algorithm significantly increases image information entropy and effectively reduces pixel correlation, giving it good statistical properties. It is also robust against differential, noise, cropping, and chosen-plaintext attacks, among others, making it suitable for secure image transmission.
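The XOR-diffusion step is standard and easy to sketch. Since the 2D-SQSM equations are not given in the abstract, a classic logistic map stands in as the chaotic keystream generator below; the rest follows the usual pattern:

```python
import numpy as np

def keystream(x0: float, n: int) -> np.ndarray:
    """Quantize chaotic-map iterates to bytes (logistic map as a stand-in
    for the paper's 2D-SQSM)."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = 3.99 * x * (1.0 - x)            # logistic map iteration
        out[i] = int(x * 256) & 0xFF        # quantize state to a byte
    return out

def xor_diffuse(img: np.ndarray, x0: float) -> np.ndarray:
    flat = img.flatten()
    return (flat ^ keystream(x0, flat.size)).reshape(img.shape)

img = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)   # toy RGB image
cipher = xor_diffuse(img, x0=0.3141592653)
restored = xor_diffuse(cipher, x0=0.3141592653)   # XOR is its own inverse
assert np.array_equal(restored, img)
```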
- Research Article
- 10.1093/ndt/gfaf116.0816
- Oct 21, 2025
- Nephrology Dialysis Transplantation
- Israel Mateos-Aparicio-Ruiz + 11 more
Background and Aims: Antibody-mediated rejection (AMR) remains a significant cause of late allograft failure. However, considerable variability exists in its diagnosis, even among experts. Weakly supervised machine learning applied to renal biopsy whole slide images (WSIs) could offer cost-efficient and accurate diagnostics with perfect reproducibility. In this study, we build on our previous work developing diagnostic models for AMR using a multi-institutional dataset including adversarial samples such as accommodation, transmitted, recurrent, and de novo diseases. Method: A dataset of 1,183 periodic acid-Schiff WSIs from 348 patients from four different institutions was automatically segmented into tissue compartment crops. Graph neural networks (GNNs) were employed to classify AMR and non-AMR (including adversarial samples like accommodation, transmitted and de novo glomerulopathy). The WSIs were represented as fully connected graphs, with glomerular crops as nodes, capturing global spatial relationships. Feature vectors for individual glomerular crops were computed using both supervised (Swin Transformer) and self-supervised (MAE and SimCLR) architectures. Classification was performed using Graph-Transformer and three novel models: SimpleGCN, DenseGCN, and SimpleGAT. These WSI-level classifiers were compared to state-of-the-art patch-level classification methods (Swin and ConvNeXt). Performance was determined in 5-fold internal cross-validation experiments. Results: The GNN-based methods outperformed baseline patch-level classification models. The best-performing model, SimpleGCN with Swin-extracted features, achieved an accuracy of 71.00% and an AUC of 0.7858, significantly better than the Swin model (accuracy of 65.66% and an AUC of 0.7265). Conclusion: This study shows the potential of graph-based representations to model contextual information in nephropathology images. Our approach permits easy upscaling of training cohorts for cost-efficient and even more accurate diagnostic support systems. To this end, and for further validation, we are actively seeking collaborators. GB and JUB contributed equally.
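The WSI-as-graph construction can be illustrated in plain PyTorch; the sketch below uses random crop embeddings and a generic mean-aggregation graph convolution, not the actual SimpleGCN:

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """Simple graph convolution on a dense adjacency matrix:
    mean-aggregate neighbor features, then apply a linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(1, keepdim=True)                  # node degrees
        return torch.relu(self.lin((adj @ x) / deg))    # mean aggregation

n_glomeruli, feat_dim = 40, 768                 # e.g. Swin crop embeddings
x = torch.randn(n_glomeruli, feat_dim)          # one node per glomerular crop
adj = torch.ones(n_glomeruli, n_glomeruli)      # fully connected (incl. self)

gcn = DenseGCNLayer(feat_dim, 128)
node_feats = gcn(x, adj)
slide_logit = nn.Linear(128, 2)(node_feats.mean(0))   # AMR vs non-AMR
print(slide_logit.shape)                               # torch.Size([2])
```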
- Research Article
- 10.1002/lpor.202501944
- Oct 21, 2025
- Laser & Photonics Reviews
- Rui Liu + 3 more
To expand the dimensions of vortex-beam manipulation and improve the security of orbital angular momentum (OAM) holographic encryption, a novel vortex structured light field featuring space-variant polarization is presented. The space-variant polarization beam unites the modulation of the phase, amplitude, and polarization of vortex beams: while enabling arbitrary amplitude modulation and beam shaping for vortex light fields, it also allows arbitrary customization of the local polarization states within the vortex beam. More importantly, leveraging the multi-dimensional cooperative control of structured light fields, an OAM holographic encryption system with polarization-switching capability is developed. By dynamically switching the polarization states of the space-variant polarized structured light, the polarization channels of the holographic encryption system can be altered, ultimately enabling controllable reconstruction of the image information. Experimental results demonstrate that this work extends traditional vortex light-field control to three physical dimensions (amplitude, polarization, and topological phase) and establishes a polarized OAM multiplexing encryption channel. The system supports dynamic polarization reconstruction of images, significantly enhancing security and channel capacity.
- Research Article
- 10.1080/10739149.2025.2573229
- Oct 18, 2025
- Instrumentation Science & Technology
- Yuqian Dong + 8 more
Infrared and visible image fusion is a crucial image enhancement technique aimed at generating high-quality fused images with salient targets and rich textures under extreme environmental conditions. Current image fusion methods degrade significantly under challenging illumination, failing to simultaneously preserve the textural details of visible images and the thermal target saliency of infrared images. To address these limitations, an Adaptive Illumination Fusion Network (AIFNet) is proposed, incorporating three key components: the Illumination Discrimination Block (IDBlock) for illumination-type discrimination and parameter adjustment, the Multi-Scale Illumination Disentanglement Net (MSIDNet) for illumination-adaptive feature correction, and the Texture Contrast Enhancement Fusion Net (TCEFNet) for deep multimodal feature integration. To further enhance fusion quality, a hierarchical loss function is designed comprising a feature fidelity loss (preserving source image information), a multi-scale gradient alignment loss (enhancing structural consistency), and illumination-adaptive constraints (optimizing performance under extreme lighting). Experimental results demonstrate that AIFNet consistently outperforms existing methods under low-light, overexposed, and normal illumination conditions. The high-quality fused images it produces exhibit enhanced detail clarity and contrast across diverse illumination conditions, which is expected to benefit subsequent high-level vision tasks.
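As one plausible reading of the gradient alignment term, a single-scale Sobel-based sketch follows; the paper's exact multi-scale formulation and weighting are not specified, so this is an assumption-laden illustration:

```python
import torch
import torch.nn.functional as F

def sobel(img):
    """x- and y-gradients of a single-channel image batch."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    k = torch.stack([kx, ky]).unsqueeze(1)      # (2, 1, 3, 3)
    return F.conv2d(img, k, padding=1)

def grad_align_loss(fused, ir, vis):
    """Match the fused gradients to the stronger of the two source
    gradients: a common way to encode 'keep the sharpest texture'."""
    g_max = torch.maximum(sobel(ir).abs(), sobel(vis).abs())
    return F.l1_loss(sobel(fused).abs(), g_max)

ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
fused = (ir + vis) / 2                           # stand-in fusion output
print(grad_align_loss(fused, ir, vis))
```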
- Research Article
- 10.1080/19392699.2025.2575188
- Oct 18, 2025
- International Journal of Coal Preparation and Utilization
- Pengcheng Yan + 5 more
With the aim of deploying coal gangue detection models more effectively on resource-limited embedded devices, smaller networks with fewer parameters are needed. By leveraging the complementarity of spectra across different bands, multispectral technology can fuse image information from various bands, effectively overcoming the environmental limitations of a single spectrum. This paper combines multispectral imaging with knowledge distillation to propose a lightweight Faster-YOLO coal gangue detection model. First, multispectral imaging is used to gather the spectral data of coal and gangue across diverse wavebands; based on these data, principal component analysis reduces the dimensionality from 25 bands to 3, and image fusion is performed on the 3 principal-component bands. Second, the YOLOv10n (You Only Look Once) detection algorithm is improved: SPD-Conv is introduced to increase the network's feature-extraction ability on low-resolution images, and a neck structure based on CGRFPN is proposed; the resulting SCG-YOLO serves as the teacher model for knowledge distillation, reducing model size and increasing inference speed. Faster-YOLO is then chosen as the student model, with its network structure made lightweight through the FasterNet block. Through knowledge distillation, the network maintains good feature expression while reducing computation. Experimental results show that the distilled model reaches a Precision (P), Recall (R), and mean Average Precision (mAP) of 97.4%, 96.1%, and 99.1% for coal and gangue recognition, which are 6%, 6.9%, and 2.3% higher than the original model, while the parameter count and model size are 92.8% and 78.2% of the original. The distilled Faster-YOLO algorithm offers good detection accuracy, is lighter, and is easily deployable on mobile terminals, supporting the intelligent mining of underground coal mines.
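The PCA band-reduction step is conventional and easy to sketch; the cube below is random stand-in data in place of real multispectral captures:

```python
import numpy as np
from sklearn.decomposition import PCA

# Treat each pixel's 25-band spectrum as a sample and keep the 3 leading
# principal components as a pseudo-RGB image for the detector.
H, W, BANDS = 128, 128, 25
cube = np.random.rand(H, W, BANDS)                  # stand-in multispectral cube

pca = PCA(n_components=3)
pixels = cube.reshape(-1, BANDS)                    # (H*W, 25)
pc = pca.fit_transform(pixels).reshape(H, W, 3)     # (H, W, 3)

# Rescale each component to [0, 255] so it can be saved as an image.
pc = (pc - pc.min((0, 1))) / (np.ptp(pc, axis=(0, 1)) + 1e-8)
fused = (pc * 255).astype(np.uint8)
print(fused.shape, pca.explained_variance_ratio_.sum())
```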