Articles published on Retrieval Model
3554 Search results
- Research Article
- 10.1038/s41598-026-38218-y
- Mar 11, 2026
- Scientific reports
- Aya E Fawzy + 3 more
As digital imaging in healthcare grows quickly, dealing with vast medical image data is getting trickier. Content-Based Medical Image Retrieval (CBMIR) systems help with this, but they struggle because of the gap between simple image details and what these images mean in a clinical setting. This paper presents a new approach using deep learning for CBMIR that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Explainable AI (XAI). Using the Breast Ultrasound Image (BUSI) dataset for training, this hybrid model classifies images and finds the relevant results based on predictions. It reaches a classification accuracy of 99.24% and performs well in retrieval tasks.
- Research Article
- 10.1038/s41562-026-02416-5
- Mar 2, 2026
- Nature human behaviour
- Thomas M Biba + 6 more
Why do some experiences endure in memory better than others? Here we explore the possibility that learning fluctuates rhythmically several times per second, with fortuitously timed experiences being more memorable. Although such fleeting opportunities for encoding would evade our awareness, they are predicted by a prominent model describing how theta rhythms in the brain coordinate memory: the Separate Phases for Encoding and Retrieval (SPEAR) model. In a preregistered study, we adapted a dense sampling approach to reconstruct the millisecond time course of memory encoding in n = 125 participants. We found that memory encoding fluctuated at a theta rhythm (3-10 Hz), that these rhythms were not a by-product of rhythmic attention and that, like theta rhythms in the brain, memory rhythms were modulated by putative markers of acetylcholine. Our findings provide behavioural evidence consistent with the SPEAR model of episodic memory.
- Research Article
- 10.1016/j.oceaneng.2025.124045
- Mar 1, 2026
- Ocean Engineering
- Wei Zhang + 5 more
Development, pre-training, and fine-tuning of the CoordConv-Unet model for significant wave height retrieval from Himawari-8 satellite imagery
- Research Article
- 10.1016/j.ejrs.2026.01.003
- Mar 1, 2026
- The Egyptian Journal of Remote Sensing and Space Sciences
- Jinshan Zhu + 6 more
An MLPEL machine learning model for bathymetry retrieval based on ensemble learning
- Research Article
- 10.1016/j.watres.2025.125274
- Mar 1, 2026
- Water research
- Min Cui + 5 more
Summer patterns of global lake total suspended solids under climate-hydrology-topography forcing.
- Research Article
- 10.1037/xlm0001593
- Feb 26, 2026
- Journal of experimental psychology. Learning, memory, and cognition
- Joanne Eaves + 2 more
Fluency with multiplication facts provides a strong foundation for future mathematics learning. Models of number fact storage and retrieval suggest that multiplication facts are stored in an associative network whereby neighboring facts (e.g., 6 × 7 and 6 × 8) can interfere with one another. While existing theoretical models account for the skilled retrieval of number facts, an understanding of how interference develops during learning, and hence the need for inhibitory control during learning, is poor. In a preregistered study, we tracked adults as they learned a set of new multiplication facts, and we monitored the interference between facts throughout their learning. Our findings are the first to show that interference emerges early in the learning process as individuals acquire knowledge of new facts. Moreover, interference does not decline as learning continues, even when retrieval accuracy is high and individuals continue to practice facts that they already know. These findings are consistent with theoretical assumptions and models of multiplication fact networks proposed by Ashcraft (1987), Campbell (1995), and De Visscher et al. (2016) but do not support the model proposed by Verguts and Fias (2005). (PsycInfo Database Record (c) 2026 APA, all rights reserved).
- Research Article
- 10.1093/mnras/stag384
- Feb 25, 2026
- Monthly Notices of the Royal Astronomical Society
- Alastair B Claringbold + 7 more
We present the low-resolution optical transmission spectrum of the inflated hot Saturn HAT-P-44b. The planet is a close sibling in radius (1.24 RJup), temperature (1100 K), and mass (0.35 MJup) to the exceedingly well-characterized WASP-39b. Using the ACAM instrument on the William Herschel Telescope (WHT), we obtain a transmission spectrum with sub-scale height precision of 246 ppm, with a wavelength range of 495–874 nm and a 20 nm resolution, despite a relatively faint host star (Vmag = 13.2). We detect absorption due to sodium with 3.9σ confidence. Atmospheric retrieval of the transmission spectrum also reveals evidence for H2O absorption and Rayleigh scattering from H2 gas consistent with a cool 800 K atmosphere and a super-solar metallicity of $7^{+16}_{-5}\times$ solar. Comparison of retrieval models disfavours the inclusion of a super-Rayleigh scattering slope or high-altitude clouds (at <1 mbar) while being agnostic towards the presence of mid-altitude clouds. Our transmission spectrum of HAT-P-44b shows strong similarity to that of its sibling WASP-39b. This is the tenth planet in the LRG-BEASTS (Low-Resolution Ground-Based Exoplanet Atmosphere Survey using Transmission Spectroscopy) survey.
- Research Article
- 10.3390/rs18050671
- Feb 24, 2026
- Remote Sensing
- Xianfeng Hu + 9 more
Accurate and timely monitoring of soil salinity is essential for the sustainable management and remediation of coastal salinization. This study utilized a UAV-based remote sensing platform to collect multispectral imagery and concurrent in situ soil salinity samples from an experimental zone within the Yellow River Delta National Nature Reserve in July 2024. We constructed multiple spectral indices and employed advanced feature selection methods (namely VIP, MultiSURF, and PSO-SFLA) to identify the most informative index combination. We established a soil salinity retrieval model utilizing a stacking ensemble framework. This architecture integrated TabPFN, SVM, and Ridge regression as the base learners, while employing XGBoost as the meta-learner to synthesize the final predictions. Model interpretability was assessed using SHAP (SHapley Additive exPlanations) values, while predictive performance was evaluated using the coefficient of determination (R²), Standardized Root Mean Square Error (SRMSE), and the Ratio of Performance to Deviation (RPD). Results indicate that the stacking model, when coupled with PSO-SFLA for feature selection, outperformed all other model configurations. It achieved the highest prediction accuracy on the test set, with an R² of 0.754, SRMSE of 0.310, and RPD of 1.941. The resulting soil salinity distribution map exhibited a high degree of spatial agreement with the ground-truth survey data. This study demonstrates that leveraging a stacking algorithm with UAV multispectral data provides an accurate and reliable method for monitoring soil salinity in coastal wetlands, offering valuable technical support for effective soil salinization management.
- Research Article
- 10.1145/3789254
- Feb 23, 2026
- ACM Transactions on Knowledge Discovery from Data
- Thanh Cong Tran + 2 more
Legal case entailment embodies a fundamental principle of the legal system, wherein the verdict of historical cases functions as a guiding precedent for subsequent cases sharing analogous factual circumstances. Due to the intricate nature of legal case documents, identifying entailment between legal cases requires considerable time and effort, necessitating a thorough understanding and specialized expertise in legal interpretation and analysis. To accelerate the process of legal case entailment, in this paper, we conceptualize this task as a document retrieval problem and propose a two-stage framework focused on entailment information retrieval. Within this framework, we develop a cost-efficient system that utilizes advanced language models for legal case entailment. In the first stage, we present the established ColBERT document retrieval model, augmented with a sparse keyword alignment strategy utilizing the Unbalanced Optimal Transport framework. Our study illustrates that by focusing on the interaction of contextually and semantically similar keyword pairs between the query and the document, the proposed alignment method improves the retrieval capability of ColBERT in the legal domain. For the second stage, we employ a fine-tuned MonoT5 document ranking model to refine the retrieval results and predict entailment instances. Extensive evaluation demonstrates a significant performance improvement of the proposed system compared to previous methods. As an additional study, we benchmark state-of-the-art open-source LLMs in legal case entailment to reveal their performance and potential applications. Our findings indicate that while LLMs exhibit sensitivity to prompt formulation, they demonstrate promising zero-shot performance in legal entailment scenarios. To encourage further AI development in the legal domain, we provide the code necessary to reproduce our results.
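The retrieve-then-rerank design in the abstract above can be illustrated with a small sketch. The first stage uses the late-interaction (MaxSim) scoring that ColBERT is known for; the second stage stands in for a reranker such as MonoT5. The toy vectors, the `toy_rerank_scores` table, and the function names are all assumptions for illustration, and the sketch omits the paper's Unbalanced Optimal Transport keyword alignment.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query token vector, take its
    best-matching document token vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def two_stage_search(
    query_vecs: List[Vector],
    docs: Dict[str, List[Vector]],
    rerank_fn: Callable[[str], float],
    k: int = 2,
) -> List[str]:
    """Stage 1 keeps the top-k documents by MaxSim; stage 2 reorders
    only those candidates with a (hypothetical) reranker score."""
    candidates = sorted(
        docs, key=lambda doc_id: maxsim_score(query_vecs, docs[doc_id]), reverse=True
    )[:k]
    return sorted(candidates, key=rerank_fn, reverse=True)


# Toy token embeddings (assumed values, two dimensions per token).
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0], [1.0, 0.0]],
    "c": [[0.0, 0.5], [0.0, 0.5]],
}
# Hypothetical reranker scores; a real system would call a model here.
toy_rerank_scores = {"a": 0.2, "b": 0.9, "c": 1.0}

result = two_stage_search(query, corpus, toy_rerank_scores.__getitem__, k=2)
print(result)
```

Document "c" has the highest reranker score but never reaches the second stage, which shows why first-stage recall bounds the quality of the whole pipeline.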
- Research Article
- 10.3389/flang.2026.1721326
- Feb 23, 2026
- Frontiers in Language Sciences
- Maryam Meghdadi + 2 more
In 2025, psycholinguistic research has the benefit of large, high-quality datasets of human behavior, and massively-scalable metrics for variables of interest like frequency and association. This means we have more data than ever before to shed light on classic language processing phenomena like associative priming. But in order to build and test rigorous theories against this data, we also need computational modeling tools that can simulate cognitive mechanisms and generate quantitative predictions at the same scale. In this paper, we assemble one such case, adapting the ACT-R cognitive modeling framework to make use of association metrics derived from language model embeddings, in service of a scalable model of associative priming in the Lexical Decision Task. ACT-R implements a model of memory retrieval that can use itemwise predictors like frequency and association to predict task response times (RTs), via interpretable and meaningfully-parameterized components like spreading activation. But currently, ACT-R's spreading activation calculations rely on manually-coded similarity scores, which are labor-intensive and prone to inaccuracies, particularly for large vocabularies. In this study, we replace these hand-coded associations with cosine similarity scores derived from Word2Vec and BERT embeddings, thereby improving both scalability and predictive accuracy while retaining ACT-R's interpretability. We compare various versions of our model against observed human RTs from the Semantic Priming Project dataset, observing impressive item-wise prediction accuracy, and achieving the strongest alignment with a model where spreading activation is penalized via a scalable approximation of the classic “fan effect.” These findings provide a proof of concept for integrating embedding-based representations into algorithmic-level models of language processing. 
More than an insight into models of priming, we see this as a first step toward scalable and specific models of more complex phenomena.
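As a rough illustration of the substitution described in the abstract above, the sketch below computes cosine similarity between embedding vectors and plugs it into an activation sum of the general ACT-R form (base-level activation plus weighted association strengths). The equal weighting of context items, the toy vectors, and the function names are assumptions; the fan-effect penalty the authors report is not modeled here.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def activation(
    base_level: float,
    target_vec: Sequence[float],
    context_vecs: List[Sequence[float]],
    total_weight: float = 1.0,
) -> float:
    """Base-level activation plus spreading activation, where each
    association strength is the cosine similarity between a context
    (prime) embedding and the target embedding. The attention weight
    is split equally across context items (an assumption)."""
    if not context_vecs:
        return base_level
    weight = total_weight / len(context_vecs)
    spread = sum(weight * cosine_similarity(c, target_vec) for c in context_vecs)
    return base_level + spread


# Toy embeddings (assumed values): one related and one unrelated prime.
target = [1.0, 0.0]
primes = [[1.0, 0.0], [0.0, 1.0]]
print(activation(0.5, target, primes))
```

In a full model the resulting activation would then be mapped to a predicted response time; that mapping is left out here.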
- Research Article
- 10.1145/3779428
- Feb 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Abbas Saleminezhad + 3 more
Ad hoc retrieval, a cornerstone task in Information Retrieval (IR), aims to rank documents in response to a user’s query, often without prior knowledge of the user’s specific information need. While transformer-based neural rankers have achieved state-of-the-art performance in ad hoc retrieval, their effectiveness varies significantly across queries. Certain queries, commonly referred to as hard queries, remain particularly challenging, highlighting critical gaps in retrieval models. Identifying these hard queries is essential for improving retrieval systems, motivating the task of Query Performance Prediction (QPP), which aims to estimate the effectiveness of a query without requiring access to relevance judgments. In this article, we propose Context-aware Query Performance Prediction (CA-QPP), a novel post-retrieval QPP method, which builds on the foundations of perturbation-based QPP methods that hypothesize a relationship between query sensitivity to small perturbations and query retrieval effectiveness. Building on this foundation, our approach exposes the given query to perturbations by constructing two query variations: an effective variation emphasizing terms that enhance retrieval and an ineffective variation accentuating terms that hinder it. By contrasting the retrieval outcomes of these variations using a cross-encoder model, CA-QPP captures the interplay of term contributions and predicts the performance for the given query. We evaluate CA-QPP on the widely used MS MARCO datasets and their associated query sets, including TREC DL 2019, TREC DL 2020, DL-Hard, TREC DL 2021, and TREC DL 2022, which feature extensive human-labeled relevance judgments. Our experiments demonstrate that CA-QPP consistently outperforms traditional and neural-based QPP baselines across standard correlation metrics, including Pearson’s \(\rho\), Kendall’s \(\tau\), and Spearman’s \(\rho\).
Through a detailed case study, we further illustrate the mechanics of CA-QPP and provide empirical evidence for its ability to model the contextual impact of individual query terms, making it a robust framework for query performance prediction.
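QPP methods such as the one above are typically scored by correlating predicted query difficulty with the actual per-query retrieval effectiveness. The sketch below implements the three correlation coefficients named in the abstract in plain Python. The toy scores are invented, and the rank and Kendall functions assume no tied values (the Kendall variant shown is tau-a).

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (assumed): predicted performance vs. measured effectiveness
# for four queries.
predicted = [0.9, 0.4, 0.7, 0.1]
actual = [0.8, 0.5, 0.3, 0.2]
print(spearman(predicted, actual), kendall_tau(predicted, actual))
```

Rank-based coefficients reward a predictor for ordering queries correctly even when its raw scores are on a different scale from the effectiveness metric.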
- Research Article
- 10.1145/3777445
- Feb 18, 2026
- ACM Transactions on the Web
- Mohammad Bahrani + 6 more
Traditional information retrieval (IR) models, such as keyword-based and vector-based techniques, have long been used in centralized systems. However, the Web’s re-decentralization, with its focus on data ownership and privacy, calls for a re-evaluation of these methods in these settings. While standards for decentralized search enhance privacy to some extent, they also introduce computational overhead, black-box decision-making, and infrastructure complexity. Despite these challenges, traditional IR techniques remain largely unexplored in such environments. This article presents an innovative application of traditional IR models in the decentralized Web by adapting them for Personal Online Data Stores (PODs), where search parties have varying access rights. We explore their role in source selection, document ranking, and result merging, extending them to meet decentralized search demands. Using Solid PODs and a synthetic medical dataset, we evaluate these models in a privacy-sensitive environment. Our findings demonstrate that extended IR methods provide an effective balance of performance, interpretability, and efficiency. These approaches hold strong potential as privacy-preserving alternatives for decentralized search on a re-decentralized Web. Notably, our top-performing model achieved competitive results in top-item retrieval compared to centralized search systems, maintaining high relevance scores under both limited and full data access conditions.
- Research Article
- 10.1371/journal.pone.0342458
- Feb 17, 2026
- PloS one
- Jessica M V Mcmaster + 3 more
Engagement in a variety of lifestyle activities, such as intellectual stimulation, social interaction, and physical exercise, is thought to be a key contributor to cognitive reserve, helping the brain compensate for age-related or pathological changes. An open question is whether restrictions on lifestyle activities, even if relatively brief, might have detrimental effects on cognition. The COVID-19 pandemic led to unprecedented restrictions on the kinds of lifestyle activities that have been shown to be protective against age-related cognitive decline. In the present study, we captured changes in lifestyle and memory of older adults across the pandemic. Long-term memory was assessed using a task which allows for the estimation of both retrieval success and memory precision, the latter being particularly sensitive to age-related changes. Memory was assessed before the pandemic in person, and during the pandemic using an online version of the task. Experiment 1 first verified that younger adults' performance did not significantly differ between testing environments, validating pre- and post-pandemic comparison in older adults. Experiment 2 then demonstrated that while substantial declines in lifestyle engagement were observed during the pandemic in older adults, there was no significant correlation between these lifestyle changes and memory performance overall. However, when modelling retrieval success, lifestyle effects varied with dementia risk, consistent with cognitive reserve theory, as well as varying with depression. These findings highlight how different memory features are impacted by factors such as lifestyle, and support the proposal that heightened dementia risk may increase susceptibility to the impact of lifestyle changes.
- Research Article
- 10.62802/dghv1j65
- Feb 13, 2026
- Next Generation Journal for The Young Researchers
- Lara Alize Ergül
The exponential growth of digital information has led to an unprecedented surge in unstructured data originating from sources such as social media, multimedia platforms, sensor networks, and enterprise systems. Traditional relational databases and structured storage frameworks are increasingly inadequate for handling the scale, heterogeneity, and velocity of such data. This paper examines efficient storage and retrieval models for large-scale unstructured data analytics, focusing on distributed architectures, indexing strategies, and intelligent retrieval mechanisms. By synthesizing advances in cloud storage systems, NoSQL databases, vector search techniques, and machine learning–assisted data organization, the study evaluates how modern data infrastructures can optimize performance, scalability, and accessibility. The findings highlight the importance of hybrid storage paradigms and semantic retrieval frameworks in enabling rapid, accurate analysis of massive unstructured datasets, thereby supporting data-driven decision-making across industries.
- Research Article
- 10.1371/journal.pone.0342895
- Feb 13, 2026
- PloS one
- Achilleas Livieratos + 10 more
Network meta-analysis (NMA) can compare several interventions at once by combining head-to-head and indirect trial evidence. However, identifying, extracting, and modelling these often takes months, delaying updates in many therapeutic areas. This study aimed to develop and validate MetaMind, an end-to-end, transformer-driven framework that automates NMA processes (including study retrieval, structured data extraction, and meta-analysis execution) while minimizing human input. MetaMind integrates Promptriever, a fine-tuned retrieval model, to semantically retrieve high-impact clinical trials from PubMed; a multi-agent LLM architecture, the Mixture of Agents (MoA) pipeline, to extract PICO-structured (Population, Intervention, Comparison, Outcome) endpoints; and GPT-4o-generated Python and R scripts to perform Bayesian random-effects NMA and other NMA designs within a unified workflow. Validation was conducted by comparing MetaMind's outputs against manually performed NMAs in ulcerative colitis (UC) and Crohn's disease (CD). Promptriever outperformed baseline SentenceTransformer with higher similarity scores (0.7403 vs. 0.7049 for UC; 0.7142 vs. 0.7049 for CD) and narrower relevance ranges. Promptriever achieved 82.1% recall, 91.1% precision and an F1 score of 86.4% when compared to a previously published NMA. MetaMind achieved 100% accuracy on a limited set of remission endpoints regarding PICO element extraction and produced comparative effect estimates and credible intervals closely matching manual analyses. In our validation studies, MetaMind reduced the end-to-end NMA process to less than a week, compared with the several months typically needed for manual workflows, while preserving statistical rigor. This suggests its potential for future scaling of evidence synthesis to additional therapeutic areas.
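The F1 score quoted in the abstract above is the harmonic mean of precision and recall, so the three reported figures can be cross-checked against each other. The short sketch below does that with the abstract's own numbers; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Figures reported in the abstract: 91.1% precision, 82.1% recall.
f1 = f1_score(0.911, 0.821)
print(f"{f1:.3f}")  # prints 0.864
```

The result rounds to 86.4%, which agrees with the F1 score the abstract reports.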
- Research Article
- 10.1038/s41929-026-01478-y
- Feb 12, 2026
- Nature Catalysis
- Yong Liu + 9 more
A geometric foundation model for enzyme retrieval with evolutionary insights
- Research Article
- 10.1142/s021800142550034x
- Feb 7, 2026
- International Journal of Pattern Recognition and Artificial Intelligence
- Xiaochuan Pu + 1 more
Cybersecurity threats are becoming increasingly complex and diverse. Cultivating applied professionals with practical skills has become a key task for higher education. This paper proposes an innovative talent development system, centered on localized Retrieval-Augmented Generation (RAG) technology and integrating multiple technologies to promote the deep integration of technology and teaching. First, a knowledge integration and update mechanism based on localized RAG is constructed to automatically integrate authoritative vulnerability libraries such as NIST NVD, CNVD, and CWE. Machine learning techniques are applied for data cleansing and feature extraction to ensure the continuous acquisition of timely and accurate knowledge, laying a solid foundation for teaching and case generation. Second, personalized case generation technology is applied. By combining pattern recognition and machine learning algorithms to analyze student competency models, learning history, and the latest threat intelligence, Large Language Models (LLMs) are used to dynamically generate targeted attack and defense scenario descriptions, reproduction steps, and detection/defense solutions based on deep learning methods. This system meets differentiated learning needs and effectively improves students’ ability to cope with real-world threat environments. Furthermore, a natural language interactive teaching support system is designed. Relying on knowledge base engines (such as AnythingLLM), it supports multi-format document parsing (including PDF, Markdown, and Word), efficient ingestion, and vectorized storage, and combines these with large language models served through Ollama for intelligent retrieval and text generation to enhance the comprehensibility and interactivity of teaching content.
In knowledge base management, efficient document analysis, pattern recognition and vectorization technologies are used to ensure storage and retrieval efficiency, and private intelligent cloud solutions (such as Infortres) are used to achieve secure remote access to local data and artificial intelligence services to meet compliance requirements and provide convenient support. Practical verification shows that the system significantly improves students’ practical offensive and defensive skills and problem-solving ability, and provides teachers with efficient and flexible teaching methods. In summary, the system achieves a deep integration of technology and education, and opens up a new path for the cultivation of application-oriented talents in network security. In the future, we will optimize the case generation strategy, expand the scope of the knowledge base, and explore more artificial intelligence teaching applications (such as adaptive learning path recommendation) to continuously improve educational effectiveness.
- Research Article
- 10.1186/s13059-026-03966-7
- Feb 7, 2026
- Genome biology
- Yongxin Ji + 4 more
Plasmids play a pivotal role in the emergence of multidrug-resistant and pathogenic bacteria, posing significant clinical challenges. However, the rapidly growing number of unannotated plasmids necessitates comprehensive characterization of their diverse properties. Here, we present PlasRAG, a tool that integrates multi-faceted property characterization of query plasmids and plasmid DNA retrieval based on textual queries. PlasRAG employs a bidirectional multi-modal information retrieval model that aligns DNA sequences with textual data, effectively overcoming the limitations of traditional approaches. Rigorous experiments demonstrate that PlasRAG delivers robust performance and enhanced analytical capabilities, underscoring the effectiveness of its architectural design.
- Research Article
- 10.1145/3796229
- Feb 6, 2026
- ACM Transactions on Asian and Low-Resource Language Information Processing
- M'Hamed Amine Hatem + 2 more
Transformer-based models have revolutionized information retrieval, achieving state-of-the-art performance in document retrieval and ranking. For high-resource languages like English, an abundance of high-quality labeled datasets has facilitated the development of powerful models. However, developing powerful models for low-resource languages such as Arabic is challenging due to the scarcity of labeled data. While using translated English datasets can be considered to overcome the lack of labeled data, translated datasets have inherent information loss and inconsistencies introduced during the translation process. As a result, models fine-tuned on translated datasets typically underperform relative to their English counterparts. To address this issue, we explore the potential of transferring expertise from high-resource models to low-resource models. In particular, we investigate whether knowledge learned by English retrieval and reranking models can be effectively transferred to Arabic models via knowledge distillation. Our results demonstrate that knowledge distillation significantly improves the performance of Arabic information retrieval. Our models, fine-tuned using knowledge distillation on the mMARCO Arabic passage-ranking dataset, outperform state-of-the-art retrieval and reranker models. Specifically, our cross-encoder achieves an MRR@10 of 0.254, representing an 8% relative improvement over the previous best cross-encoder, mT5. In terms of recall, our bi-encoder achieves an R@1000 of 0.799, surpassing the late-interaction model mColBERT (R@1000 = 0.749, +6.7%) and the baseline BM25 (R@1000 = 0.637, +25%). Furthermore, by leveraging knowledge distillation with soft labels generated by an ensemble of IR models, we manage to achieve comparable or higher performance without requiring extensive manual annotation. This approach offers an effective mechanism for automatic annotation and pseudo-labeling in low-resource language scenarios.
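For readers less familiar with the metrics in the abstract above, the sketch below shows how reciprocal rank at a cutoff and recall at a cutoff are commonly computed for one query, and re-derives the relative recall improvements from the reported R@1000 values. The document IDs are toy values and the function names are illustrative; MRR@10 is the mean of the per-query reciprocal ranks over a query set.

```python
from typing import List, Set


def reciprocal_rank_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# Toy ranking for a single query (assumed IDs).
ranking = ["d3", "d1", "d2"]
print(reciprocal_rank_at_k(ranking, {"d1"}))   # first relevant hit at rank 2
print(recall_at_k(ranking, {"d1", "d9"}, k=2))  # one of two relevant docs found

# R@1000 values reported in the abstract.
print(round(relative_improvement(0.799, 0.749) * 100, 1))  # vs. mColBERT
print(round(relative_improvement(0.799, 0.637) * 100))     # vs. BM25
```

The two relative gains come out at about 6.7% and 25%, matching the percentages given in the abstract.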
- Research Article
- 10.1109/tvcg.2025.3631434
- Feb 1, 2026
- IEEE transactions on visualization and computer graphics
- Ben Fei + 4 more
With the growing demand for real-world 3-D understanding, learning effective representations of 3-D data has become increasingly important for tasks such as shape classification, model retrieval, scene reconstruction, and point cloud completion. Although previous work has explored self-supervised learning within individual modalities (e.g., point clouds or images), the potential of multi-modal supervision remains largely underexplored due to the lack of aligned and scalable training signals. In this work, we present DR-Point, a tri-modal pre-training framework that jointly learns from RGB images, depth maps, and 3-D point clouds to build a unified embedding space across modalities. By enforcing cross-modal consistency among RGB-depth-point triplets, DR-Point achieves effective 2-D-3-D feature alignment without manual annotations. A differentiable rendering module further enhances geometric fidelity by synthesizing depth cues and refining structural details in reconstructed point clouds. Extensive experiments on benchmarks demonstrate that DR-Point consistently outperforms state-of-the-art self-supervised methods on 3-D classification, segmentation, and completion. These results highlight the advantages of multi-modal pre-training for unified 3-D understanding and its potential to benefit a wide range of vision and graphics applications.