Articles published on Semantic Alignment
765 Search results
- New
- Research Article
- 10.3390/f17030355
- Mar 11, 2026
- Forests
- Cheng Li + 6 more
Small pest detection in ultra-high-resolution forestry images is challenging due to extreme scale variation, complex backgrounds, and limited annotated data. To address these issues, we propose SSFPDet (Semi-Supervised Forest Pest Detector), a semi-supervised object detection framework designed for low-annotation settings. Built upon the Soft Teacher paradigm, SSFPDet integrates a YOLO-T-based overlapping slicing strategy, a Top-K pseudo-label selection mechanism, and a Kullback–Leibler (KL) divergence-based distribution alignment constraint. The slicing strategy enhances small-object representation without modifying the detector backbone, while the Top-K and KL modules improve pseudo-label reliability and semantic consistency during training. Under the 20% labeled setting, SSFPDet achieves an mAP@0.5:0.95 of 46.6, outperforming the baseline by 0.7 points. Notably, small-object detection performance (AP_S) improves by 6.6 percentage points. Ablation studies confirm the complementary contributions of spatial slicing and semantic alignment. Overall, SSFPDet provides a practical and scalable solution for high-resolution forestry pest monitoring under limited supervision.
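The overlapping slicing idea mentioned in this abstract can be sketched roughly as follows. This is a generic tiling illustration, not the authors' YOLO-T-based strategy; the function names, the 640-pixel tile size, and the 128-pixel overlap are all assumptions made for the example.

```python
def tile_starts(length, tile, stride):
    """Start offsets along one axis so that tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the image edge
    return starts


def overlapping_tiles(width, height, tile=640, overlap=128):
    """Return (x0, y0, x1, y1) windows that cover the image with overlap.

    `tile` and `overlap` are in pixels; both defaults are hypothetical.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, stride)
        for x in tile_starts(width, tile, stride)
    ]


if __name__ == "__main__":
    for box in overlapping_tiles(1000, 640):
        print(box)
```

Each window can be passed to a detector unchanged, which is consistent with the abstract's point that slicing does not require modifying the backbone; merging detections across overlapping windows is left out of this sketch.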
- New
- Research Article
- 10.1007/s13278-026-01591-7
- Mar 10, 2026
- Social Network Analysis and Mining
- Ridwan Amure + 1 more
Narratives on social media emerge through the interaction of content, users, and platform-specific structures, yet identifying cohesive narrative formations across platforms remains challenging. This paper introduces Weighted Contextual Focal Structures Analysis (W–CFSA), a graph-based method for discovering structurally embedded narrative groups in multiplex networks. We construct a three-layer multiplex network integrating user–user interactions, user–narrative associations, and narrative–narrative similarity, with edge weights derived from semantic alignment. The method is evaluated on over 70,000 cross-platform data points collected from YouTube, TikTok, X (formerly Twitter), and Instagram between January and May 2025, comprising posts related to U.S.-China trade and tariff policies. Empirical results show that W–CFSA identifies focal sets with higher internal cohesion, greater internal density, and more substantial impact on global clustering than a size-matched eigenvector-centrality baseline. Qualitative analysis further indicates that the recovered focal sets correspond to distinct narrative architectures rather than simple stance groupings, demonstrating the value of integrating narrative context with network structure for cross-platform analysis.
- New
- Research Article
- 10.3389/fpsyg.2026.1659797
- Mar 9, 2026
- Frontiers in Psychology
- Alice Karbanova + 1 more
Introduction: Although the processing of language and music are thought to be related, the semantic interplay of these domains in song remains relatively unexplored. This study investigates how music and lyrics contribute to conceptual meaning-making in song interpretation using a conceptual priming experiment. Methods: Fifty participants completed a lexical decision task in which target words were semantically related either to the music or to the lyrics of an ecologically valid song prime. Reaction times were used to infer semantic alignment. Results and Discussion: The results showed significantly faster responses to target words associated with the music than to those associated with the lyrics of the prime. This effect remained significant even after controlling for various properties of the primes and targets, which had been assessed by an additional 234 participants in complementary studies prior to the priming experiment. We also found a significant interaction between target type (music- vs. lyrics-related) and the Euclidean distance of valence and arousal between the prime and target: affective distance predicted reaction times only for music-derived targets. Ratings from the complementary studies indicated that music evoked more positive and arousing responses than lyrics, while lyrics appeared to dampen the affective intensity of musical excerpts. Our findings challenge the assumption of tight integration between melody and lyrics in song processing. They suggest that music and language contribute unequally to conceptual interpretation in song, with music playing a more dominant role. These results offer new insights into the construction of multi-modal meanings and the cognitive mechanisms underlying song comprehension.
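The affective distance referred to in this abstract, the Euclidean distance between prime and target in valence-arousal space, can be computed as in the sketch below. The rating values and the 1-to-9 scale are hypothetical; the abstract does not state the scale the study used.

```python
import math


def affective_distance(prime, target):
    """Euclidean distance between two (valence, arousal) rating pairs."""
    return math.dist(prime, target)


if __name__ == "__main__":
    # Hypothetical mean ratings on a 1-9 scale (assumption, not study data).
    prime = (7, 8)   # (valence, arousal) of the song prime
    target = (4, 4)  # (valence, arousal) of the target word
    print(affective_distance(prime, target))
```

For these example ratings the differences are 3 and 4, so the distance is 5.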
- New
- Research Article
- 10.1111/tgis.70226
- Mar 8, 2026
- Transactions in GIS
- Kai Ma + 7 more
The geological reports and maps accumulated during geological surveying and mapping harbor rich expert knowledge and metallogenic clues. However, efficiently integrating and mining structured knowledge from complex multimodal data of polymetallic deposits remains a critical bottleneck in intelligent mineral prediction. To address this, we propose a knowledge graph (KG)‐enhanced multimodal retrieval‐augmented generation (RAG) framework, Geo‐MAG, for geological map understanding. Specifically, the framework first processes textual geological reports and constructs a structured KG. Concurrently, a vision large model parses geological maps to extract metadata, including legends, geological structures, strata, and lithologies. Leveraging this metadata, relevant subgraphs are retrieved from the KG to facilitate text–map semantic alignment and enhance background geological knowledge. Finally, the integrated map information and structured KG subgraphs are fed into GPT‐4o to enable deep semantic interpretation. Experimental results demonstrate that integrating the knowledge graph significantly boosts GPT‐4o's reasoning capability and interpretability in geological map understanding. The model achieves 77.2% accuracy in geological reasoning tasks, outperforming the direct end‐to‐end GPT‐4o interpretation by 53.7% and lightweight schemes based on basic metadata by 37.4%. This work represents a pioneering application of KG and RAG in geological map understanding, highlighting the synergistic advantages of integrating text and maps, and offering a novel perspective on multimodal integration within the geoscience domain.
- New
- Research Article
- 10.3390/info17030254
- Mar 3, 2026
- Information
- Jorge Galán-Mena + 4 more
Scientific collaboration is increasingly needed to address complex research challenges, yet identifying promising partners in the absence of prior co-authorship remains difficult. We present a decision-support pipeline for discovering researchers who have not previously worked together and whose collaboration is unlikely to emerge without deliberate intervention or institutional incentives. The approach leverages document-level semantic representations to estimate proximity between publications, aggregates these similarities at the author level, and surfaces collaboration opportunities that are not evident from the co-authorship graph. To support interpretation by decision makers, a separate LLM module proposes potential joint research directions, which are subsequently annotated with multi-label fields of study. We evaluate the pipeline through an institutional case study, analyzing 7531 publications from 2009 to 2024 using retrospective, temporally shifted windows. While only a small fraction of suggested pairs materialized spontaneously in subsequent periods, the collaborations that do emerge exhibit strong semantic alignment with the computed recommendations (high cosine similarity) and substantial thematic overlap. These results indicate that semantic proximity can act as an early indicator of latent complementarity between researchers without prior ties, supporting intentional institutional mediation and complementing topology-driven approaches that predict links under passive evolution.
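The pipeline outlined in this abstract (document-level similarity, aggregated per author, restricted to pairs with no prior co-authorship) might look roughly like the sketch below. Everything here is an assumption for illustration: the abstract does not say how similarities are aggregated, so a plain mean over document pairs is used, and the function names, toy vectors, and 0.8 threshold are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def author_similarity(docs_a, docs_b):
    """Mean cosine similarity over all cross-author document pairs (assumed rule)."""
    sims = [cosine(u, v) for u in docs_a for v in docs_b]
    return sum(sims) / len(sims)


def suggest_pairs(author_docs, coauthored, threshold=0.8):
    """Rank author pairs with no prior co-authorship by aggregated similarity."""
    suggestions = []
    for a, b in combinations(sorted(author_docs), 2):
        if frozenset((a, b)) in coauthored:
            continue  # skip pairs that already appear in the co-authorship graph
        score = author_similarity(author_docs[a], author_docs[b])
        if score >= threshold:
            suggestions.append((a, b, score))
    return sorted(suggestions, key=lambda item: -item[2])


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real document embeddings would come from a model.
    author_docs = {"A": [[1, 0]], "B": [[1, 0]], "C": [[1, 0]], "D": [[0, 1]]}
    coauthored = {frozenset(("A", "B"))}
    for a, b, score in suggest_pairs(author_docs, coauthored):
        print(a, b, round(score, 3))
```

Filtering on the co-authorship set before scoring keeps the output focused on pairs that the graph alone would not surface, which is the case the abstract emphasizes.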
- New
- Research Article
- 10.1016/j.neucom.2025.132442
- Mar 1, 2026
- Neurocomputing
- Jianli Zhao + 4 more
Federated intent-aware cross-domain recommendation via semantic alignment and collaborative enhancement
- New
- Research Article
- 10.1016/j.inffus.2025.103717
- Mar 1, 2026
- Information Fusion
- Wei Huang + 10 more
MFA-NRM: A novel framework for multimodal fusion and semantic alignment in visual neural decoding
- New
- Research Article
- 10.1016/j.saa.2025.127289
- Mar 1, 2026
- Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
- Yu Yang + 8 more
Contrastive learning-based fusion of NIR spectroscopy and visual cues for molecular discrimination of Chinese herbal species.
- New
- Research Article
- 10.1016/j.engappai.2026.114058
- Mar 1, 2026
- Engineering Applications of Artificial Intelligence
- Jieyu An + 2 more
Noise-aware Graph Neural Networks for multimodal semantic alignment in social media sentiment analysis
- New
- Research Article
- 10.1038/s41598-025-34084-2
- Feb 28, 2026
- Scientific Reports
- Rohit M + 4 more
Accurate sentence similarity estimation is a fundamental requirement in Automated Evaluation Systems (AES), where reliable semantic alignment directly impacts grading fairness and consistency. While transformer-based Sentence Similarity Tools (SSTs) perform effectively on non-negated text, they exhibit notable limitations in modeling the semantic distortions introduced by negation. To overcome this challenge, this paper proposes a novel Negation-Aligned Similarity (NAS) Scorer within a hybrid semantic similarity framework, specifically designed for negation-aware semantic modeling. The proposed method integrates multi-embedding fusion using BERT, SBERT, RoBERTa, DistilBERT, and Word2Vec, followed by BiLSTM-based contextual encoding to capture sequential dependencies. A custom Negation-Sentence-Similarity Dataset (NSSD) comprising 8575 human-verified sentence pairs across four technical domains is curated. Experimental evaluations on the STS Benchmark dataset demonstrate that the proposed NAS Scorer achieves an F1-score of 0.97 after scale normalization, significantly outperforming strong transformer-based baselines. By explicitly addressing negation-induced semantic shifts, the proposed framework enables richer and more reliable similarity estimation, making it well-suited for deployment in real-world AES.
- New
- Research Article
- 10.61360/bonicetr262019720204
- Feb 26, 2026
- Contemporary Education and Teaching Research
- Hang Dong
At a historical juncture when Artificial Intelligence Generated Content (AIGC) is profoundly reshaping global knowledge production and the landscape of geopolitical analysis, traditional area studies confront a dual crisis: the declining explanatory power of qualitative paradigms and a “knowledge supply crisis” mediated by algorithmic systems. This paper argues that area studies must undergo a paradigmatic transformation—from “linguistic mediation” to “semantic governance,” and from tacit experiential knowledge to the construction of “semantic infrastructure.” By conceptualizing high-quality data annotation as both a confirmation of semantic rights and the building of cognitive infrastructure, the study proposes an integrated framework encompassing ontology construction, semantic alignment, and domain-specific evaluation. Through this engineering-oriented architecture, unstructured regional knowledge can be transformed into structured data assets. Driven by a dual-engine model integrating “United Nations normative corpora” and “Open Source INTelligence (OSINT),” area studies can evolve from post hoc interpretation toward a decision-support paradigm characterized by real-time perception, structured computation, and scenario simulation. Such a transformation will consolidate the epistemic foundations of China’s autonomous knowledge system and safeguard national cognitive sovereignty.
- New
- Research Article
- 10.3390/su18052208
- Feb 25, 2026
- Sustainability
- Xiaolin Li + 3 more
Under the Belt and Road Initiative, whether architectural education effectively supports sustainability-oriented overseas practice remains insufficiently evidenced. Anchored in the Royal Institute of British Architects (RIBA) and the National Architectural Accrediting Board (NAAB) competency frameworks, this study constructs a tripartite analytical framework linking international standards, educational curricula, and overseas job requirements. Based on curriculum texts and 200 overseas job postings from major international recruitment platforms, paragraph-level semantic alignment is quantified using TF-IDF weighting, SBERT-based embeddings, cosine similarity, and clustering analysis. The results indicate a clear structural divergence: while domestic architectural education shows moderate alignment with overseas demand in foundational technical competencies (average similarity 0.58–0.62), it consistently underperforms in sustainability-critical dimensions—including BIM-based collaboration, international standard adaptation, cross-cultural coordination, and professional ethics—with similarity values below 0.45. This misalignment reflects a systemic imbalance between design-centered training and the governance-oriented competency structure required for sustainable overseas projects, providing a quantitative diagnostic basis for reconfiguring sustainability-oriented architectural education.
- New
- Research Article
- 10.31449/inf.v50i7.10368
- Feb 21, 2026
- Informatica
- Chunling Zhang
Text-to-image generation has evolved quickly with diffusion-based generative models that combine semantic conditioning and latent-space denoising, allowing machines to generate high-quality visuals from natural language prompts. Despite these developments, existing diffusion systems still face challenges in prompt clarification accuracy, model adaptability, and computational efficiency, which limit their performance in real-time and resource-limited settings. The research aims to design and optimize an image generation framework based on Stable Diffusion (SD) that improves prompt processing and image quality and enables lightweight fine-tuning. The system utilizes the LAION-Aesthetics v2 4.5 dataset, which contains high-quality text–image pairs suitable for visual generation tasks. Preprocessing involves text cleaning, tokenization, and semantic structuring, utilizing a transformer-based tokenizer to ensure accurate language-to-visual mapping. The architecture integrates Stable Diffusion, a Variational Autoencoder (VAE) for latent-space decoding, and Low-Rank Adaptation (LoRA) for efficient fine-tuning with minimal computational cost. Results show that SD-VAE-LoRA achieved a PSNR of 33.7 dB, SSIM of 93%, FID of 17.8, Inception Score of 36.02, and R-Precision of 90%, superior to baseline SD and advanced diffusion models such as Latent Diffusion Method (LDM) [24], Menstrual Cycle-Inspired Latent Diffusion Method (MCI-LDM) [24], and Conditional Generative Adversarial Networks, Attention mechanisms, and Contrastive Learning (C-GAN+ATT+CL). The optimized system advances semantic alignment, decreases training time, and preserves image realism, confirming its strength for scalable, adaptive, and high-fidelity image generation applications.
- New
- Research Article
- 10.31449/inf.v50i8.12087
- Feb 21, 2026
- Informatica
- Uce Indahyanti + 2 more
Translating unstructured user feedback into Business Process Model and Notation (BPMN) is challenging due to informal language, contextual ambiguity, and the lack of explicit structural cues. We present FB2BPMN, an end-to-end pipeline that combines natural language processing (NLP), large language models (LLMs), and fuzzy string matching to automatically generate BPMN elements from raw feedback. The pipeline comprises four stages: sentence structuring, fact extraction, role-activity mapping, and fuzzy-based semantic alignment. We evaluate FB2BPMN on 125 annotated feedback instances sampled from academic journal management systems. Using expert-authored BPMN as reference, FB2BPMN attains precision 0.97, recall 0.88, and F1 0.91 on element identification and accuracy 0.85 on process flow construction, outperforming a rule-based baseline. Results indicate strong structural and semantic correspondence, showing that FB2BPMN effectively bridges informal feedback and formal process representations.
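The fuzzy-matching stage named in this abstract can be illustrated with Python's standard `difflib` module. This is a minimal sketch of fuzzy string alignment in general, not the FB2BPMN implementation; the label list, the function name `align_activity`, and the 0.6 cutoff are hypothetical.

```python
import difflib

# Hypothetical canonical BPMN task labels; not taken from the paper.
BPMN_LABELS = ["Submit manuscript", "Assign reviewer", "Send decision"]


def align_activity(phrase, labels=BPMN_LABELS, cutoff=0.6):
    """Return the label closest to `phrase`, or None if none reaches `cutoff`."""
    by_lower = {label.lower(): label for label in labels}  # case-insensitive match
    hits = difflib.get_close_matches(
        phrase.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None


if __name__ == "__main__":
    print(align_activity("submit manuscrpt"))  # misspelled activity from feedback
    print(align_activity("zzzz"))              # nothing similar enough
```

Returning `None` below the cutoff, instead of always taking the nearest label, avoids forcing unrelated feedback onto a process element.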
- New
- Research Article
- 10.1108/jsit-05-2025-0223
- Feb 20, 2026
- Journal of Systems and Information Technology
- Joshua T Lavoie + 1 more
Purpose: Cyber threat intelligence (CTI) and risk management (RM) remain fragmented across tools, formats and governance processes, limiting interoperability and consistent decision-making. This study aims to develop unified risk and intelligence messaging (URIM), a governance-oriented artifact that standardizes risk and intelligence communication across heterogeneous organizational contexts. Design/methodology/approach: Following design science research (DSR), the authors elicited requirements from cybersecurity professionals. A pilot study refined the survey instrument, followed by a qualitative survey using purposive and snowball sampling. Thematic analysis informed the URIM artifact specification, which was appraised through a qualitative ex ante expert review, with recommendations registered for subsequent cycles. Findings: Three recurring barriers to effective cyber risk governance emerged: fragmented toolchains, limited data interoperability and inconsistent governance practices. Participants highlighted vendor lock-in, incompatible protocols and weak standardization as constraints on intelligence sharing. They supported a vendor-neutral approach combining canonical governance messages, semantic alignment and modular compliance features. Originality/value: URIM extends cybersecurity governance research by providing a user-informed, model-level DSR artifact that links CTI and RM through standardized governance messaging and explicit interface specifications. It makes interoperability requirements explicit by defining canonical messages and semantic alignment rules that existing CTI-sharing and RM approaches often leave implicit.
- New
- Research Article
- 10.1109/tip.2026.3663935
- Feb 19, 2026
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Yajing Liu + 5 more
Unsupervised domain adaptive object detection methods enhance model robustness in the target domain without requiring target-domain annotations. Despite notable progress, existing methods face two major challenges: 1) insufficient and inefficient learning of holistic feature consistency due to cumbersome pixel-level style matching and semantic discrepancy elimination between domains as well as the overlooking of their collaborative effect, and 2) unreliable learning of category feature compactness caused by poor-quality target-domain samples, inaccurate pseudo-labels and noisy cross-domain contrast paradigms. To address these challenges, we propose a novel Semantic Consistency and Compactness Learning (SCCL) network. For consistency learning, we introduce a Visual Adaptation-guided Semantic Alignment (VSA) module that achieves style matching through simple feature adaptation and incorporates a novel adversarial-free self-supervised method for feature disentanglement. The collaboration between these two aspects enables sufficient and efficient consistency learning. For reliable compactness learning, we develop a plug-and-play Instance Center-Contrastive (ICC) head that, for the first time, comprehensively addresses all three potential causes of unreliable learning through three integrated innovations, concerning sample pseudo-label quality enhancement, reliable sample storage and updating, and a robust sample contrast paradigm. Besides, the mutual reinforcement effect of VSA and ICC simultaneously enhances feature transferability and discriminability. Extensive experiments across four UDA object detection benchmarks with two baselines show that SCCL achieves superior adaptability and robustness.
- Research Article
- 10.26689/jera.v10i1.13888
- Feb 12, 2026
- Journal of Electronic Research and Application
- Jiakai Zhong
Amidst the intensifying digital economy and global competition, supply chain quality management is evolving from traditional linear models toward networked systems characterized by data-driven and intelligent collaboration. This paper constructs an AI-driven “Supply Chain Quality Collaborative Management” framework through system optimization and artificial intelligence analytical capabilities from a supply chain perspective. The study first analyzes core challenges in supply chain quality collaboration across three dimensions: data fragmentation, standard discrepancies, and mechanism asymmetry. It highlights that traditional static and reactive quality controls struggle to adapt to complex, dynamic supply chain ecosystems. Subsequently, through systematic literature review and theoretical synthesis, the paper elucidates AI’s role in multi-source quality data fusion, semantic alignment, standardized governance, and intelligent incentives. It proposes collaborative optimization pathways based on deep learning, blockchain, and reinforcement learning. Through case studies in the automotive and pharmaceutical industries, the research validates the feasibility of AI in predictive maintenance and cross-linkage collaborative decision-making, demonstrating AI’s ability to significantly enhance the systemic resilience and decision-response capabilities of quality management. This paper innovatively integrates industrial engineering process optimization with cross-organizational governance mechanisms for supply chain quality management, providing a new theoretical framework and practical pathway for intelligent manufacturing and sustainable supply chain development.
- Research Article
- 10.3390/electronics15040784
- Feb 12, 2026
- Electronics
- Qiyi He + 7 more
Connected and Autonomous Vehicles (CAVs) are exposed to increasingly sophisticated cyber threats hidden within high-dimensional, heterogeneous network traffic. A critical bottleneck in existing Intrusion Detection Systems (IDS) is the feature heterogeneity gap: discrete protocol signatures (e.g., flags, services) and continuous traffic statistics (e.g., flow duration, packet rates) reside in disjoint latent spaces. Traditional deep learning approaches typically rely on naive feature concatenation, which fails to capture the intricate, non-linear semantic dependencies between these modalities, leading to suboptimal performance on long-tail, minority attack classes. This paper proposes HCA-IDS, a novel framework centered on Semantics-Aware Cross-Modal Alignment. Unlike heavy-weight models, HCA-IDS adopts a streamlined Multi-Layer Perceptron (MLP) backbone optimized for edge deployment. We introduce a dedicated Multi-Head Cross-Attention mechanism that explicitly utilizes static “Pattern” features to dynamically query and re-weight relevant dynamic “State” behaviors. This architecture forces the model to learn a unified semantic manifold where protocol anomalies are automatically aligned with their corresponding statistical footprints. Empirical assessments on the NSL-KDD and CICIDS2018 datasets, validated through rigorous 5-Fold Cross-Validation, substantiate the robustness of this approach. The model achieves a Macro-F1 score of over 94% on 7 consolidated attack categories, exhibiting exceptional sensitivity to minority attacks (e.g., Web Attacks and Infiltration). Crucially, HCA-IDS is ultra-lightweight, with a model size of approximately 1.00 MB and an inference latency of 0.0037 ms per sample. These results confirm that explicit semantic alignment combined with a lightweight architecture is key to robust, real-time intrusion detection in resource-constrained CAVs.
- Research Article
- 10.3390/app16041857
- Feb 12, 2026
- Applied Sciences
- Yongyang Yin + 5 more
To address the trade-off between parameter scale and generation quality in Vision-Language Models (VLMs), this study proposes a Multi-Feature Dynamic Instruction Tuning (MFDIT) image captioning model based on LLaMA. By integrating CLIP-based global features with SAM-derived local features, the model constructs a multi-level visual representation. Additionally, a Dynamic Prompt Adapter is designed to enable cross-modal semantic alignment with adaptive flexibility. Combined with a Low-Rank Adaptation (LoRA) fine-tuning strategy, the proposed method enhances the model’s capability in describing diverse images while training only 20 million parameters, accounting for merely 0.05% of the total parameter volume. Experimental results demonstrate that the model achieves a CIDEr score of 126.7 on the MSCOCO dataset, surpassing traditional adapter-based approaches by 3.0 points. Moreover, in the MME Benchmark evaluation, the proposed model outperforms the mainstream LLaMA-Adapter V2 by 7.3% and 3.8% in OCR and object counting tasks, respectively. Ablation studies further validate the synergistic effects of multi-feature fusion and dynamic instruction optimization. This research provides an efficient solution for parameter-efficient multimodal model training and potential deployment in resource-constrained environments.
- Research Article
- 10.1371/journal.pone.0342342
- Feb 11, 2026
- PLOS One
- Jia-Qi Wang
In the context of the deep integration of globalization and digitalization, the cross-lingual dissemination of news and public opinion information has become an increasingly significant challenge. This study proposes a novel cross-lingual sentiment analysis framework, CLAS-Net (Cross-Lingual Alignment Sentiment Network), designed to address the bottlenecks of current public opinion analysis systems in multilingual scenarios. The framework combines the cross-lingual contrastive learning capabilities of XLM-RoBERTa with the precise sentiment feature extraction ability of BiLSTM-Attention, enabling efficient analysis of multilingual public opinion. In monolingual tasks for English and Portuguese, CLAS-Net achieves accuracies of 92% and 89%, respectively, representing a 29 percentage point improvement compared to baseline models. In more challenging multilingual settings, CLAS-Net maintains a high accuracy of 83%, a 29 percentage point improvement over the baseline model. CLAS-Net demonstrates strong adaptability and practical value when processing real-world social media and news data, providing reliable technical support for cross-lingual public opinion monitoring and analysis in the global context.