Articles published on Hierarchical Features
3361 Search results
- New
- Research Article
- 10.1016/j.neunet.2026.108546
- Jun 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Deqian Mao + 4 more
HDFLStyler: Hierarchical domain-invariant feature learning for source-free domain generalization.
- New
- Research Article
- 10.1016/j.mex.2025.103774
- Jun 1, 2026
- MethodsX
- Akshay S + 4 more
Actor-critic guided CDBN with GAN augmentation for robust facial emotion recognition.
- New
- Research Article
- 10.1016/j.neunet.2026.108544
- Jun 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Le Liang + 2 more
MambaFPN: A SSM-based feature pyramid network for object detection.
- New
- Research Article
- 10.1016/j.compbiolchem.2026.108918
- Jun 1, 2026
- Computational biology and chemistry
- Assad Rasheed + 4 more
Cervical nuclei segmentation through synergic conditional generative adversarial network in cervical smear images.
- New
- Research Article
- 10.26599/tst.2025.9010119
- Jun 1, 2026
- Tsinghua Science and Technology
- Ding Zhou + 5 more
Current intrusion detection in industrial control systems (ICS) typically relies on network flows or traffic packets, often neglecting differences in payloads of functional fields and their heterogeneous responses under attacks. Moreover, most methods depend on manually crafted features, limiting the utilization of raw traffic byte streams and constraining detection performance. This paper proposes a multi-view correlation intrusion detection model that incorporates spatiotemporal features to enhance detection in ICS. By integrating byte streams with parsed field data, the model leverages traffic information through multi-view collaborative modeling. A fine-grained hierarchical feature framework is developed to extract behavior patterns from each field attribute, and cross-attention mechanisms capture inter-view relationships to construct a comprehensive representation of traffic content. A spatial feature extractor based on a convolutional neural network (CNN) and a temporal extractor using a Transformer architecture are employed to learn deep spatiotemporal features. A focal loss function is adopted to compute anomaly scores, which support the final intrusion detection decisions. Experiments on the water distribution testbed dataset show that the proposed model achieves superior performance compared to state-of-the-art methods, enabling accurate and efficient intrusion detection in ICS environments.
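The abstract above mentions a focal loss but does not give its form or parameters. As a rough illustration only, the standard binary focal loss can be sketched as follows; the function name and the values `gamma=2.0` and `alpha=0.25` are assumptions (commonly used defaults), not settings taken from the paper.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one sample.

    p: predicted probability of the positive (attack) class.
    y: true label, 0 or 1.
    gamma, alpha: assumed defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.95, 1)  # confident and correct: strongly down-weighted
hard = binary_focal_loss(0.30, 1)  # poorly classified attack: contributes far more
```

The `(1 - p_t) ** gamma` factor is what shrinks the contribution of well-classified samples, which is why this loss is often chosen when attack traffic is rare relative to normal traffic.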
- New
- Research Article
- 10.1016/j.neunet.2026.108692
- Jun 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Xingru Huang + 14 more
Ophthalmic diseases significantly impair the vision of numerous individuals globally. Accurate and real-time 3D reconstruction of macular edema and retinal tears is crucial for improving surgical efficiency and success rates. However, lesion areas often exhibit considerable noise and high heterogeneity, and the imaging devices employed may introduce electronic noise and artifacts. Current 2D medical image segmentation techniques fail to achieve optimal outcomes. To overcome these challenges, we propose the Tri-Path Fourier-Temporal Modulation Network (TriFTM-Net). TriFTM-Net synergistically integrates spatial, frequency, and spatiotemporal features. This design effectively augments both feature representation and extraction. TriFTM-Net comprises three critical modules: the Tri-Path Spectral Hierarchical Encoder (TPSHE), which amplifies feature representation by integrating tri-path features; the Feature Re-Modulation (FRM), which reduces noise interference and enhances feature extraction; and the Hierarchical Feature Reconstruction Module (HFRM), which improves detail preservation in upsampled images. Comparative analysis with thirteen baseline methods demonstrates that our approach achieves the highest Dice scores, IoU, and Kappa coefficient on the OIMHS dataset. Our code is publicly available at https://github.com/IMOP-lab/TriFTM-Net.
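For readers unfamiliar with the overlap metrics named in the abstract above, here is a minimal sketch of how Dice and IoU are computed for a single pair of binary masks. The function name, the flat-list mask representation, and the toy masks are illustrative assumptions; this is not code from the linked repository.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flat sequences of 0/1 labels of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treated here as a perfect match (a convention, assumed).
        return 1.0, 1.0
    dice = 2.0 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou

# Toy masks, hypothetical data: three predicted pixels, two of them correct.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
```

Both scores reward overlap, but Dice counts the intersection twice, so for the same pair of masks it is never lower than IoU.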
- New
- Research Article
- 10.1016/j.forsciint.2026.112897
- Jun 1, 2026
- Forensic science international
- Yong-Zhi Quan + 3 more
A computationally efficient hybrid Kolmogorov-Arnold network for hyperspectral classification of signatory pen inks.
- New
- Research Article
- 10.1016/j.neucom.2026.133233
- Jun 1, 2026
- Neurocomputing
- Xiaoyu Wei + 1 more
HFF-BiG: Hierarchical feature fusion with bi-directional gating for 3D indoor instance segmentation
- New
- Research Article
- 10.1016/j.foodchem.2026.148862
- May 30, 2026
- Food chemistry
- Ge Zhang + 8 more
Carbohydrates: How structural features influence digestion-fermentation kinetic parameters.
- New
- Research Article
- 10.1038/s41598-026-53115-0
- May 19, 2026
- Scientific reports
- Guokun Shi + 6 more
Multi-label chest X-ray (CXR) classification is challenging because thoracic abnormalities vary substantially in scale, visual saliency, and anatomical distribution, while disease labels often exhibit clinically meaningful dependencies. We propose a visual-semantic framework that integrates heterogeneous visual representations with graph-guided label reasoning for image-level multi-label CXR classification. The visual encoder consists of a Vision Transformer (ViT) branch and a DenseNet-121 branch with complementary inductive biases: the ViT branch provides self-attention-based content-adaptive token representations, whereas the DenseNet branch provides hierarchical convolutional feature maps with explicit spatial layouts. A multi-scale bidirectional dual cross-attention fusion (DCAF) module aligns these two representations and enables bidirectional cross-representation interaction at the [Formula: see text] and [Formula: see text] stages to construct a fused visual memory. To model label dependencies, we construct an ML-GCN-style label graph whose edges are derived from training-set conditional co-occurrence statistics and whose node features are initialized using GloVe label-name embeddings. The resulting GCN-refined label embeddings initialize the label queries of a Transformer decoder, which retrieves label-specific evidence from the fused visual memory and predicts a single logits matrix for multi-label classification. The proposed method achieves a Mean AUC of 0.849 on ChestX-ray14 following its official evaluation protocol and 0.815 on CheXpert using an internal 70%/10%/20% training/validation/testing partition. Qualitative Grad-CAM visualizations on selected cases further suggest that the proposed framework tends to produce activation patterns consistent with manually indicated visually suspicious regions; these visualizations are not intended as a formal localization evaluation. 
Overall, the results indicate that cross-representation visual fusion and graph-guided label-query decoding provide complementary benefits for multi-label CXR classification.
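The label graph described in the abstract above is built from conditional co-occurrence statistics. A minimal sketch of that idea follows, assuming edges are kept when the conditional probability P(label j | label i) reaches a threshold; the function name, the threshold value, and the toy label matrix are all assumptions for illustration, since the abstract does not state them.

```python
def cooccurrence_adjacency(label_rows, threshold=0.4):
    """Directed label graph from conditional co-occurrence.

    label_rows: one 0/1 list per training image, one column per label.
    Returns adj where adj[i][j] = 1 if P(label j | label i) >= threshold.
    The threshold is an assumed value, not one reported in the abstract.
    """
    n = len(label_rows[0])
    count = [0] * n                      # images containing label i
    pair = [[0] * n for _ in range(n)]   # images containing both i and j
    for row in label_rows:
        present = [k for k, v in enumerate(row) if v]
        for i in present:
            count[i] += 1
            for j in present:
                if i != j:
                    pair[i][j] += 1
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and count[i] > 0 and pair[i][j] / count[i] >= threshold:
                adj[i][j] = 1
    return adj

# Hypothetical toy labels; columns stand for three findings A, B, C.
rows = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
adj = cooccurrence_adjacency(rows)
```

Because P(j | i) and P(i | j) generally differ, the resulting graph is directed: in the toy data, C implies A often enough to create an edge, while A implies C too rarely.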
- New
- Research Article
- 10.1007/s00259-026-07918-y
- May 16, 2026
- European journal of nuclear medicine and molecular imaging
- Shao-Chun Li + 6 more
To investigate the feasibility of non-invasively identifying bone marrow involvement (BMI) in follicular lymphoma (FL) using baseline 18F-FDG PET/CT combined with multidimensional feature fusion, and to compare the impact of different bone marrow volume-of-interest (VOI) frameworks on model performance. This retrospective study included 187 patients with newly diagnosed FL, 93 of whom had BMI. Based on baseline 18F-FDG PET/CT, two bone marrow VOI frameworks were constructed: a pelvic VOI framework and a spine-pelvis VOI framework. Clinical features, conventional imaging features, radiomic features, and deep learning features were extracted. A hierarchical feature screening strategy was employed: clinical and conventional imaging features were screened using univariate logistic regression, Spearman's correlation analysis, and multivariate logistic regression, whereas high-dimensional radiomic and deep learning features were screened using LASSO regression combined with the Boruta algorithm. Based on the selected features, six different modelling schemes were developed. The optimal scheme was selected using the area under the receiver operating characteristic curve (AUC) in the independent validation set as the primary metric. Under the optimal scheme, the performance of seven machine learning models was further compared: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), neural network (NN), random forest (RF), k-nearest neighbours (KNN), and adaptive boosting (AdaBoost). SHAP analysis was used to interpret the key features of the final model and the direction of their contributions. Compared with the non-BMI group, the BMI group was more likely to present with widespread regional lymph node involvement, B symptoms, larger lymph node lesions, as well as lower Hb, higher LDH, lower Apo A, lower eGFR, and higher β2-MG levels (all P < 0.05).
Under both VOI frameworks, the BMI group exhibited higher bone marrow FDG uptake intensity and metabolic burden, as reflected by higher values of conventional PET/CT features, including SUVmean, Standard Deviation (PET), RMS, 25th Percentile Value, Median, 75th Percentile Value, TLG, Glycolysis Q2-Q4, SAM, and SUVpeak (all P < 0.05). Multivariate logistic regression analysis indicated that regional lymph node involvement and β2-MG consistently remained independent predictors across both VOI frameworks, whereas SUVmean retained statistical significance only within the pelvic VOI framework. A comparison of six modelling schemes revealed that the scheme integrating the spine-pelvis VOI framework with clinical features, conventional imaging features, and radiomic features performed best. Under this scheme, the GBM model achieved the best overall performance on the independent validation set (AUC = 0.906, Accuracy = 0.877, Precision = 0.926, Sensitivity = 0.833, Specificity = 0.926, F1 score = 0.877). SHAP analysis revealed that, in addition to LNr (≥ 5) and β2-MG, first-order statistical features such as PET-Orig-FO-IQR, as well as texture features derived from wavelet/LBP transformations (including PET-Wav-HLL-NGTDM-Strength, PET-Wav-HLL-GLRLM-SRHGLE, CT-LBP3D-m1-GLCM-MCC, and PET-LBP3D-m2-GLSZM-SAHGLE), also made significant contributions. These findings suggest that BMI-associated imaging phenotypes are characterised not only by increased bone marrow metabolism but also by remodelling of the grey-level distribution and spatial heterogeneity within the bone marrow. Bone marrow involvement in follicular lymphoma is associated with higher tumour burden and altered metabolic heterogeneity within the bone marrow. A PET/CT-based radiomic-clinical model showed good performance for non-invasive BMI prediction, and the spine-pelvis VOI framework outperformed the pelvic VOI framework alone.
The final GBM model may provide a feasible imaging biomarker for complementary baseline assessment of BMI in FL.
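The validation metrics reported in the abstract above can be cross-checked against one another, since the F1 score is the harmonic mean of precision and sensitivity (recall). The short sketch below does that arithmetic; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2.0 * precision * recall / (precision + recall)

# Values as reported for the GBM model in the abstract.
f1 = f1_score(precision=0.926, recall=0.833)
print(round(f1, 3))  # 0.877
```

The result agrees with the reported F1 score of 0.877.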
- New
- Research Article
- 10.1631/jzus.b2500451
- May 15, 2026
- Journal of Zhejiang University. Science. B
- Ting Li + 12 more
Accurate quantification of crop residue cover (CRC) is crucial for monitoring and evaluating conservation tillage practices, yet it poses a significant image segmentation challenge. The subtle visual distinctions between fragmented residue and soil, compounded by variable illumination and shadows in field imagery, often lead to poor segmentation performance. To overcome these limitations, we introduce RCTUnet, a novel deep learning architecture designed for robust crop-residue-soil segmentation and precise CRC estimation. RCTUnet's architecture synergistically integrates three key components: (1) a ResNet50 backbone for deep, multi-scale feature extraction; (2) a convolutional block attention module (CBAM) to adaptively focus on salient residue features across both channel and spatial dimensions; and (3) a transformer-based global context fusion module (GCFM) to model long-range spatial dependencies, which is critical for interpreting heterogeneous residue patterns. We evaluated RCTUnet on a dataset of 1220 field-acquired images spanning four typical crop rotations. Experimental results show that, compared to traditional models: (1) RCTUnet achieves significantly higher crop-residue-soil segmentation accuracy than classic models including Unet, Unet++, DeepLabV3, segmentation network (SegNet), and fully convolutional network (FCN), with improvements of 3.24%, 3.42%, 4.88%, 8.28%, and 6.05% in overall accuracy, respectively; (2) RCTUnet yields superior residue-soil segmentation performance, with increases in residue recall of 7.67%, 7.37%, 14.09%, 27.05%, and 16.91%, respectively; (3) RCTUnet shows enhanced CRC estimation accuracy, achieving a root mean square error (RMSE) of 4.875, representing a 45.5% improvement over Unet (RMSE=8.941). These results demonstrate the efficacy of our hybrid approach, which combines deep hierarchical features, dual-domain attention, and global context modeling. 
RCTUnet provides a robust and reliable tool for automated CRC assessment, advancing the capabilities of in-field agricultural monitoring.
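The 45.5% figure in the abstract above is a relative reduction in RMSE. A small sketch of both the metric and the comparison follows; the function names are illustrative, and only the two RMSE values come from the abstract.

```python
import math

def rmse(pred, truth):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

def relative_improvement(baseline, new):
    """Fraction by which `new` reduces an error metric relative to `baseline`."""
    return (baseline - new) / baseline

# RMSE values as reported in the abstract: Unet 8.941, RCTUnet 4.875.
gain = relative_improvement(baseline=8.941, new=4.875)
print(round(100 * gain, 1))  # 45.5
```

Note that this is a relative change in error, which is a different quantity from the percentage-point accuracy gains quoted earlier in the same abstract.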
- Research Article
- 10.1109/tip.2026.3689419
- May 12, 2026
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Xiang Xiang + 3 more
Deep Neural Networks (DNNs) exhibit surprising zero-shot generalization and emergent phenomena across various tasks. However, the underlying mechanisms behind these behaviors remain unclear. By analyzing the perception of image frequencies by DNNs, we establish the association between the generalization behavior and frequency-aware regions. DNNs with stronger generalization exhibit wider frequency-aware regions. Therefore, we improve the generalization performance by broadening the frequency awareness. Specifically, we enable DNNs to learn the relations between high-frequency components and semantic labels through frequency decomposition and mixup. Based on hierarchical feature alignment, we allow larger submodels to guide the frequency awareness of smaller submodels. Beyond training, we ensemble submodels to extract features from different frequency bands to enrich DNNs' frequency awareness during inference. We validate the effectiveness of our proposed method in image classification and object detection tasks in single and multi-source domain generalization scenarios. We also demonstrate the plug-and-play scalability of our method across existing approaches and different DNNs.
- Research Article
- 10.1038/s41598-026-49070-5
- May 11, 2026
- Scientific reports
- Weidong Wang + 4 more
Accurate segmentation of polyps in colonoscopy images plays a pivotal role in the early detection and subsequent treatment of colorectal cancer. Nevertheless, existing segmentation techniques face challenges due to the significant variability in polyps' shapes and sizes and the often indistinct contrast between polyp boundaries and the surrounding mucosal tissue. To address these limitations, we introduce a novel network architecture termed FDSS-Net, aimed at enhancing segmentation accuracy. Our key innovations are as follows. First, the Feature Enhancement and Propagation Module (FEPM) is designed to capture intricate context across multiple scales. It achieves this by integrating depthwise separable convolutional layers with varying kernel sizes, enabling the model to discern fine details and broader patterns simultaneously. Second, the Dual-Stream Semantic Mixture (DSSM) Module facilitates hierarchical feature alignment and deep semantic blending across adjacent levels. By incorporating a cross-attention mechanism and a global context modeling block, DSSM ensures that relevant features are effectively combined and utilized. Lastly, the Hierarchical Multi-scale Aggregation and Prediction Module (HMAP) aggregates features progressively from coarse to fine scales, guided by a learnable gate. This method outperformed 12 state-of-the-art methods on five datasets, notably achieving a Dice coefficient of 0.8302 and mIoU of 0.7587 on the ETIS-LaribPolypDB dataset. It demonstrates the potential to enhance clinical computer-aided diagnosis and provides inspiration for further research in this field.
- Research Article
- 10.1007/s10792-026-04090-y
- May 8, 2026
- International ophthalmology
- P Gopi Kannan + 2 more
Accurate segmentation of glaucoma-related anatomical structures from retinal fundus images is crucial for reliable clinical assessment and early disease diagnosis. However, variations in illumination, low contrast, and complex structural patterns make precise boundary delineation of the optic disc (OD) and optic cup (OC) challenging. This study aims to improve the accuracy of OD and OC segmentation for glaucoma assessment. An Enhanced SwinUNet model is proposed, integrating hierarchical transformer-based feature extraction with a Dual-Stage Context-Aware Feature Refinement (DCF-Refine) module embedded in skip connections. A preprocessing stage is applied using CLAHE-based contrast enhancement in LAB color space along with min-max normalization to improve image quality and stabilize training. The model employs Swin Transformer (ST) blocks to capture both local structural details and long-range dependencies. The DCF-Refine module enhances feature fusion through sequential Spatial Context Refinement (SCR) and Channel Context Refinement (CCR). Experimental evaluation on the Drishti-GS and REFUGE datasets demonstrates that the proposed Enhanced SwinUNet achieves superior performance compared to existing segmentation methods, attaining accuracies of 99.3% and 99.1%, respectively. The proposed model provides highly accurate and reliable segmentation of OD and OC structures, effectively addressing challenges in retinal image analysis. Its strong performance supports improved glaucoma-related structural assessment and has potential for clinical application.
- Research Article
- 10.1038/s42003-026-10169-0
- May 7, 2026
- Communications biology
- Pablo Marcos-Manchón + 1 more
The brain transforms visual inputs into cortical representations that support diverse cognitive and behavioral goals. Characterizing how this information is organized and routed across the human brain is essential for understanding how we process complex visual scenes. Here, we applied representational similarity analysis to 7T fMRI data collected during natural scene viewing. We quantified representational geometry shared across individuals and compared it to hierarchical features from vision and language neural networks across model layers. By integrating these comparisons with representational connectivity between cortical regions, we identified two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models aligned with shared structure in both routes, whereas language models corresponded primarily with the lateral pathway and showed negative alignment in early visual and ventral cortex. These findings refine classical visual-stream models by revealing a distributed cortical network with separable representational routes for context and animate content during scene perception.
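Representational similarity analysis, as used in the abstract above, compares two systems by how they arrange the same stimuli rather than by their raw responses. The sketch below shows one common variant, assuming correlation distance (1 minus Pearson r) for the dissimilarity matrices and a Spearman rank correlation to compare them; the abstract does not specify these choices, and all names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rdm_upper(patterns):
    """Upper triangle of the dissimilarity matrix (1 - Pearson r).

    patterns: one response vector per stimulus (voxels or model units).
    """
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between the two dissimilarity structures."""
    return pearson(ranks(rdm_upper(patterns_a)), ranks(rdm_upper(patterns_b)))

# Hypothetical toy data: 4 stimuli, 4 measurement channels each.
brain_patterns = [[1, 2, 3, 5], [2, 1, 4, 3], [5, 3, 1, 2], [1, 5, 2, 4]]
model_patterns = [[2 * v + 1 for v in row] for row in brain_patterns]
score = rsa_score(brain_patterns, model_patterns)
```

Because the comparison happens between dissimilarity matrices, the two systems may have different numbers of channels; only the set of stimuli has to match.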
- Research Article
- 10.1038/s41598-026-51495-x
- May 7, 2026
- Scientific reports
- Chao Li + 5 more
Medical image interpretation plays a critical role in lumbar fusion surgery, where accurate analysis of anatomical structures is essential for clinical assessment. However, most existing deep learning approaches rely primarily on visual features and fail to effectively integrate heterogeneous clinical information. This study proposes a multimodal deep learning framework for lumbar spine image interpretation by jointly modeling medical images and associated clinical text. The framework adopts a global-local representation learning strategy to capture both overall anatomical context and fine-grained structural information. A visual encoder extracts hierarchical features from lumbar radiographs and CT scans, while a transformer-based text encoder captures semantic information from clinical reports. These representations are projected into a shared embedding space to enable cross-modal alignment. To enhance feature interaction, a text-guided attention mechanism is introduced to model correspondence between image regions and textual descriptions. The learned multimodal representations are applied to multiple downstream tasks, including cross-modal retrieval, classification, and lumbar structure segmentation. Experimental results show that the proposed framework outperforms image-only baselines and achieves competitive performance compared with existing multimodal approaches. The integration of global and local representations improves feature discrimination and structural modeling. Visualization results provide qualitative evidence that the model focuses on anatomically relevant regions, although such observations should be interpreted with caution. Overall, the proposed framework demonstrates the potential of multimodal representation learning for lumbar spine image analysis and provides a structured approach for integrating heterogeneous clinical data.
- Research Article
- 10.1007/s00530-026-02351-5
- May 6, 2026
- Multimedia Systems
- Jiefu Mei
Hierarchical feature multi-contrastive learning for skin cancer classification
- Research Article
- 10.1080/00405000.2026.2668798
- May 4, 2026
- The Journal of The Textile Institute
- Sixiang Wang + 4 more
To develop high-performance keratin-based functional materials, this study investigates the efficient isolation and structural preservation of wool cortical cells as intrinsic reinforcement components for composite systems. Wool fibers are predominantly composed of highly ordered cortical cells, accounting for approximately 90% of the fiber mass, and their hierarchical structural features make them particularly attractive as natural reinforcement elements in keratin-based materials. Herein, an optimized isolation strategy for wool cortical cells was established and systematically evaluated. The results show that effective isolation can be achieved by treating descaled wool with 50% citric acid for 20 min, followed by ultrasonic disruption in formic acid at 550 W for 1 h. Under these conditions, an extraction yield of up to 45.04% was obtained, while the isolated cortical cells exhibited good morphological integrity, high molecular weight retention, and favorable thermal stability. Furthermore, the extracted cortical cells were incorporated into regenerated keratin to fabricate a self-reinforced composite film. The resulting composite exhibited a pronounced enhancement in mechanical performance compared with the pure keratin film, demonstrating the feasibility of utilizing wool-derived cortical cells as effective natural reinforcement agents. This work provides a comparatively mild and efficient route for cortical cell isolation and highlights their potential application in sustainable keratin-based composite materials.
- Research Article
- 10.1186/s12938-026-01575-w
- May 4, 2026
- Biomedical engineering online
- Xiaotong Wang + 3 more
Uterine fibroids represent one of the most prevalent gynecological tumors; however, their ultrasound images frequently exhibit indistinct boundaries and complex morphologies, thereby complicating accurate segmentation. An enhanced Swin-Unet-based framework, designated the Swin-Unet Edge-Sensitive Segmentation (SES) network, is proposed herein to advance boundary delineation and segmentation accuracy. The SES network incorporates the Residual Channel Attention Network (RCAN) to recalibrate feature responses via channel attention weighting, thereby reinforcing the representation of lesion regions, and the Richer Convolutional Features (RCF) module to preserve multi-scale spatial information through hierarchical feature integration, effectively addressing pixel-level classification in regions with blurred boundaries. The model was evaluated on annotated ultrasound images provided by Shanxi Provincial Children's Hospital. Experimental findings demonstrate that SES consistently outperforms established architectures, including U-Net, U-Net++, Attention U-Net, and TransUNet, achieving superior performance across multiple indices (Dice coefficient: 0.9452; IoU: 0.8721; accuracy: 0.9358). Ablation analyses further substantiate the pivotal contributions of the RCAN and RCF modules to the overall segmentation performance. The proposed SES framework integrates global modeling capacity, multi-scale attention mechanisms, and edge-sensitive feature extraction to deliver a more accurate and robust solution for the ultrasound image segmentation of uterine fibroids, highlighting its substantial potential for clinical application.