Articles published on Interpretation Tasks
- Research Article
- 10.1161/circ.152.suppl_3.4367224
- Nov 4, 2025
- Circulation
- Shawn Wahi + 5 more
Background: Indications for cardiac magnetic resonance imaging (CMR) are often stored in heterogeneous, unstructured reports. Manual adjudication of indications is time-consuming and requires domain expertise. Recent large language models (LLMs) have shown promise in complex clinical interpretation and categorization tasks, but no prior study has systematically evaluated the ability of state-of-the-art (SOTA) LLMs to extract indications from raw CMR reports. Research question: How well do SOTA open-source and commercial LLMs adjudicate clinical indications from real-world CMR reports? Methods: We analyzed 486 CMR reports from a large academic center. Reports were de-identified using the Stanford-Penn-MIDRC de-identification tool, and ground-truth indications were annotated by a physician expert. Eighteen LLMs varying in accessibility (8 open-source, 10 commercial), parameter size (4 to 70 billion), and training corpus (general vs. medical) were evaluated. For each report, LLMs were instructed to extract the top two possible indications from ten categories (oncologic therapy toxicity, cardiomyopathy/elevated troponin, chest pain/dyspnea, arrhythmia/abnormal ECG, cardiac mass/metastasis, thrombus, structural evaluation, pericarditis, risk stratification, or viability evaluation [ischemic]); a response was scored correct if either extracted indication matched the ground truth, reflecting the fact that real-world indications can fall into more than one category. Results: Higher-cost commercial models (Spearman's rank r = 0.683, p = 0.03) and larger-parameter open-source models (r = 0.307) exhibited better adjudication ability (Fig. 1A, 1B). The best-performing commercial LLMs performed markedly better than the top open-source LLMs (90% vs. ~78% accuracy [acc]) (Fig. 2). Grok 3 (91% acc, 0.94 F1-score) and OpenAI o3 (90% acc, 0.93 F1) were the best models overall, and Gemma 3 27B was the best open-source LLM (80% acc, 0.86 F1) (Fig. 2). Reasoning models performed comparably to non-reasoning models, with Grok 3 mini having the best relative cost-vs-performance (Fig. 1A, 2). Interestingly, medical LLMs performed worse than their generally pretrained counterparts (e.g., MedGemma 27B vs. Gemma 3 27B), suggesting domain-specific pretraining may negatively affect adjudication ability (Fig. 2). Conclusion: Open-source and commercial LLMs demonstrate promise in automated, accurate extraction of indications from CMR reports. Our findings help clinician-researchers choose between LLMs for use cases involving CMR reports.
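A minimal sketch of the kind of top-2 adjudication harness this study describes, assuming a hypothetical `query_llm` wrapper; the prompt wording, report format, and scoring helper are illustrative placeholders, not the authors' protocol. The Spearman correlation at the end mirrors the abstract's cost-vs-performance analysis.

```python
# Minimal sketch of a top-2 indication adjudication harness, assuming a
# hypothetical query_llm() wrapper; prompt wording and report format are
# illustrative, not the authors' protocol.
from scipy.stats import spearmanr

CATEGORIES = [
    "oncologic therapy toxicity", "cardiomyopathy/elevated troponin",
    "chest pain/dyspnea", "arrhythmia/abnormal ECG", "cardiac mass/metastasis",
    "thrombus", "structural evaluation", "pericarditis",
    "risk stratification", "viability evaluation (ischemic)",
]

def adjudicate(report_text: str, query_llm) -> list[str]:
    """Ask the model for the two most likely indications (hypothetical prompt)."""
    prompt = (
        "From the following CMR report, list the top two indications, "
        f"choosing only from: {', '.join(CATEGORIES)}.\n\n{report_text}"
    )
    answer = query_llm(prompt)
    return [c for c in CATEGORIES if c in answer.lower()][:2]

def top2_accuracy(reports, ground_truth, query_llm) -> float:
    """A prediction counts as correct if either extracted category matches."""
    hits = sum(
        gt in adjudicate(report, query_llm)
        for report, gt in zip(reports, ground_truth)
    )
    return hits / len(reports)

def cost_performance_correlation(costs, accuracies):
    """Rank correlation between model cost and accuracy (illustrative inputs)."""
    rho, p = spearmanr(costs, accuracies)
    return rho, p
```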
- Research Article
- 10.3390/app152011087
- Oct 16, 2025
- Applied Sciences
- Binpeng Yan + 3 more
Seismic facies recognition constitutes a fundamental task in seismic data interpretation, playing an essential role in characterizing subsurface geological structures, sedimentary environments, and hydrocarbon reservoir distributions. Conventional approaches primarily depend on expert interpretation, which often introduces substantial subjectivity and operational inefficiency. Although deep learning-based methods have been introduced, most rely solely on unimodal data—namely, seismic images—and encounter challenges such as limited annotated samples and inadequate generalization capability. To overcome these limitations, this study proposes a multimodal seismic facies recognition framework named GAT-UKAN, which integrates a U-shaped Kolmogorov–Arnold Network (U-KAN) with a Graph Attention Network (GAT). This model is designed to accept dual-modality inputs. By fusing visual features with knowledge embeddings at intermediate network layers, the model achieves knowledge-guided feature refinement. This approach effectively mitigates issues related to limited samples and poor generalization inherent in single-modality frameworks. Experiments were conducted on the F3 block dataset from the North Sea. A knowledge graph comprising 47 entities and 12 relation types was constructed to incorporate expert knowledge. The results indicate that GAT-UKAN achieved a Pixel Accuracy of 89.7% and a Mean Intersection over Union of 70.6%, surpassing the performance of both U-Net and U-KAN. Furthermore, the model was transferred to the Parihaka field in New Zealand via transfer learning. After fine-tuning, the predictions exhibited strong alignment with seismic profiles, demonstrating the model’s robustness under complex geological conditions. Although the proposed model demonstrates excellent performance in accuracy and robustness, it has so far been validated only on 2D seismic profiles. Its capability to characterize continuous 3D geological features therefore remains limited.
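A minimal sketch of the knowledge-guided fusion idea described above (graph-attention embeddings of a knowledge graph injected into an intermediate layer of a U-shaped image branch), assuming PyTorch and torch_geometric. The dimensions, global-mean graph readout, and additive fusion are assumptions, not the published GAT-UKAN architecture.

```python
# Minimal sketch of knowledge-guided feature fusion, assuming PyTorch and
# torch_geometric; dimensions, pooling, and projection are illustrative,
# not the published GAT-UKAN design.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class KnowledgeFusionBlock(nn.Module):
    def __init__(self, node_dim=64, feat_channels=128):
        super().__init__()
        # Graph attention over the knowledge graph of entities and relations.
        self.gat = GATConv(node_dim, node_dim, heads=2, concat=False)
        # Project the pooled graph embedding into the image feature space.
        self.proj = nn.Linear(node_dim, feat_channels)

    def forward(self, feat_map, node_x, edge_index):
        # feat_map: (B, C, H, W) intermediate features from the U-shaped branch
        # node_x:   (N, node_dim) knowledge-graph node features
        g = self.gat(node_x, edge_index)      # (N, node_dim)
        g = g.mean(dim=0)                     # crude global graph readout
        g = self.proj(g).view(1, -1, 1, 1)    # (1, C, 1, 1)
        return feat_map + g                   # knowledge-guided refinement

# Usage sketch with random tensors standing in for real data.
block = KnowledgeFusionBlock()
fused = block(torch.randn(2, 128, 32, 32),
              torch.randn(47, 64),            # 47 entities, as in the abstract
              torch.randint(0, 47, (2, 60)))  # synthetic edge_index
```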
- Research Article
- 10.1190/geo-2024-0527
- Oct 13, 2025
- GEOPHYSICS
- Tong Zhou + 3 more
Seismic data acquisition is inherently constrained by budget and operational challenges, which often result in datasets that fail to adequately capture geological complexity. These limitations, coupled with interpreter subjectivity, lead to inconsistencies in seismic interpretation. While machine learning (ML) has shown potential for automating interpretation tasks, its generalization across diverse geological settings remains limited by the scarcity of high-fidelity training datasets. Effective seismic data augmentation is critical to addressing this gap by enhancing dataset diversity and realism. However, traditional augmentation methods often fail to replicate key geological features, such as stochastic scattering effects caused by small-scale heterogeneities, limiting their real-world applicability. We propose a novel seismic data augmentation technique that integrates Sequential Gaussian Simulation (SGS) with synthetic multiples. This approach generates realistic and diverse datasets by accurately modeling stochastic scattering while preserving critical geological structures. The augmented datasets improve both manual interpretation and ML model performance in tasks such as fault detection, stratigraphic truncation, and horizon picking. By enhancing data fidelity and variability, our method offers a robust solution to challenges in seismic interpretation, advancing the accuracy and efficiency of workflows across complex geological settings.
- Research Article
- 10.7592/ejhr.2025.13.2.1043
- Oct 7, 2025
- The European Journal of Humour Research
- Christopher T Burris + 2 more
Previous research has demonstrated that disrespect sensitivity plus anger rumination (DSAR) predicts outcomes congruent with sadistic motivation (such as positive affect in response to target harm) in pranking contexts. Because “successful” pranksters often appear giddy rather than overtly hostile, we conducted three studies involving 990 Canadian undergraduates based on the idea that DSAR-related hostility could be operating outside of awareness. When controlling for overlap among humour styles, DSAR predicted greater self-reported use of self-defeating but not aggressive humour (Study 1). Higher DSAR pranksters/observers (but not victims) perceived more happiness than anger in an art interpretation task by default but more anger than happiness when pranks were salient (Study 2). Contrary to their assertions, higher DSAR scorers’ word fragment and projective test responses suggested implicit hostile/dominant tendencies but inhibition of overt aggression in the neutral condition that shifted to disinhibition and mirthless interpersonal detachment when pranks were salient (Study 3). Thus, at least among those most at risk for manifesting sadistic motivation, latent hostility may be overlooked amidst prank-related celebrations. This apparent implicit/overt discrepancy should be considered when designing interventions for minimizing the occurrence of sadistically motivated harm.
- Research Article
- 10.1002/pra2.1335
- Oct 1, 2025
- Proceedings of the Association for Information Science and Technology
- Hannah Moutran + 5 more
Our research explores whether Large Language Models (LLMs) can offer a solution for efficient development of detailed, rich metadata for digitized collections. We tested seven widely available LLMs on four common metadata tasks using a selection of pages from the Southern Architect and Building News (1882–1932): assigning subject headings, creating short content summaries, extracting named entities, and writing transcriptions. We evaluated the quality of the outputs, the cost, and the time efficiency of using LLMs for metadata workflows, developing a metadata quality rubric and scoring schematic to ground our results. Analysis suggests that models can perform interpretive metadata tasks well but lack the accuracy needed for assigning terms from controlled vocabularies. With careful implementation, thorough testing, and well-structured workflow design, these models can be applied to significantly enhance metadata for digitized collections.
- Research Article
- 10.1002/pra2.1528
- Oct 1, 2025
- Proceedings of the Association for Information Science and Technology
- Runsheng Zhang + 1 more
This study investigates professional designers' integration of AI tools within workflows across five key stages: Discovery, Ideation, Development, Refinement, and Finalization. Semi-structured interviews with 14 designers reveal varied AI engagement, shaped by creative intent, identity, and trust in AI. A heatmap illustrates AI supporting efficiency-oriented and creative tasks (such as ideation, content generation, and refinement), while interpretive tasks requiring contextual judgment remain under human control. Designers viewed AI as an assistive co-creator that accelerates iteration and automates labor rather than replacing core creativity. Offloading procedural tasks and early ideation to AI enabled deeper focus on conceptual and critical decisions. Designers used creative agency to reconfigure workflows, fostering reflective, conceptual work. Findings suggest further exploration of human-AI interaction, particularly how designers balance human-centered values and creative performance while maintaining agency and ethics.
- Research Article
- 10.31185/lark.4560
- Oct 1, 2025
- lark
- Assistant Teacher Lamya Rasheed Al-Ali
Novice interpreters face significant challenges in mastering the components of consecutive interpreting because of the multiple difficulties they encounter during training. They must first learn to comprehend spoken messages before practicing note-encoding skills, and then simultaneously decode those notes and render them orally to interpret the spoken message. This study investigates the key challenges faced by student interpreters in the consecutive interpreting course by examining four basic elements: comprehension, encoding notes, decoding written notes, and final delivery. The study sampled students from the Translation Department, with data obtained from 409 students through a structured questionnaire using a Likert rating scale. Results show that student interpreter training faces significant obstacles in all four areas, with note-taking and comprehension emerging as the main barriers to academic progress, and weak comprehension skills creating additional difficulties at every subsequent stage of interpretation. The research emphasizes the need for specialized teaching methods that address these targeted challenges in interpreter training. The findings contribute to translation studies by providing concrete pedagogical strategies for teachers, trainees, and curriculum developers seeking to enhance students' ability to handle consecutive interpretation tasks.
- Research Article
- 10.1016/j.ridd.2025.105104
- Oct 1, 2025
- Research in developmental disabilities
- Agnieszka Maryniak + 1 more
Interpretation of biological motion in young people with cerebral palsy.
- Research Article
- 10.1177/10944281251377154
- Sep 30, 2025
- Organizational Research Methods
- Duc Cuong Nguyen + 1 more
Researchers, engineers, and entrepreneurs are enthusiastically exploring and promoting ways to apply generative artificial intelligence (GenAI) tools to qualitative data analysis. From promises of automated coding and thematic analysis to functioning as a virtual research assistant that supports researchers in diverse interpretive and analytical tasks, the potential applications of GenAI in qualitative research appear vast. In this paper, we take a step back and ask what sort of technological artifact is GenAI and evaluate whether it is appropriate for qualitative data analysis. We provide an accessible, technologically informed analysis of GenAI, specifically large language models (LLMs), and put to the test the claimed transformative potential of using GenAI in qualitative data analysis. Our evaluation illustrates significant shortcomings that, if the technology is adopted uncritically by management researchers, will introduce unacceptable epistemic risks. We explore these epistemic risks and emphasize that the essence of qualitative data analysis lies in the interpretation of meaning, an inherently human capability.
- Research Article
- 10.1016/j.jmir.2025.102087
- Sep 5, 2025
- Journal of medical imaging and radiation sciences
- Bismark Ofori-Manteaw + 3 more
Ghanaian radiographers' perspectives on participating in preliminary image evaluation: A qualitative study.
- Research Article
- 10.29222/ipng.2078-5712.2025.17
- Aug 31, 2025
- Actual Problems of Oil and Gas
- Valery A Iktissanov
Background. Low-permeability reservoirs and reservoirs containing high-viscosity oil are increasingly becoming development targets due to the deterioration of the reserve structure and a number of negative factors. As a result, the time required for the logarithmic derivative of pressure to reach radial flow conditions increases significantly, making it impossible to interpret the pressure buildup curve within an acceptable test duration. Objective. To develop and summarize recommendations for pressure transient analysis in reservoirs with low mobility coefficients. Materials and methods. Both established and author-developed methods of interpretation and well testing were used. Results. The most effective approaches are shown to be the following: analyzing the drawdown curve instead of the pressure buildup curve; using the flow models that precede radial flow when interpreting pressure curves for wells with complex geometry; and the now-traditional interpretation of long-term pressure records from downhole gauges. Conclusions. The proposed methods reduce the scheduled shutdown time of wells during well testing and consequently reduce losses in oil production when determining the filtration parameters of the reservoir and the near-wellbore zone.
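The radial-flow diagnostic mentioned above relies on the pressure log-derivative flattening out on a log-log plot. Below is a minimal numerical sketch of that diagnostic; the synthetic data, slope value, and differencing scheme are illustrative only and are not taken from the paper.

```python
# Minimal sketch of the pressure log-derivative diagnostic used in pressure
# transient analysis: radial flow shows up as a flat derivative on a
# log-log plot. Inputs are synthetic and illustrative only.
import numpy as np

def log_derivative(t, dp):
    """Approximate d(dp)/d(ln t) by finite differences on a non-uniform grid."""
    return np.gradient(dp, np.log(t))

# Synthetic infinite-acting radial flow: dp = m * ln(t) + b, so the
# derivative should plateau at m.
t = np.logspace(-2, 2, 200)      # elapsed time, hours
m, b = 12.0, 35.0                # assumed slope and offset
dp = m * np.log(t) + b
deriv = log_derivative(t, dp)
print(deriv[50:55])              # ~12.0: flat derivative indicates radial flow
```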
- Research Article
- 10.3390/rs17152696
- Aug 4, 2025
- Remote Sensing
- Lingyu Yan + 5 more
Semantic segmentation is one of the key tasks in the intelligent interpretation of remote sensing images, with extensive potential applications. However, when ultra-high resolution (UHR) remote sensing images exhibit complex background intersections and significant variations in object sizes, existing multimodal fusion segmentation methods based on convolutional neural networks and Transformers face challenges such as limited receptive fields and quadratic computational complexity, leading to inadequate global context modeling and multimodal feature representation. Moreover, the lack of accurate boundary detail feature constraints in the final segmentation further limits segmentation accuracy. To address these challenges, we propose a novel boundary-enhanced multilevel multimodal fusion Mamba-Large Strip Convolution network (FMLSNet) for remote sensing image segmentation, which offers the advantages of a global receptive field and efficient linear complexity. Specifically, this paper introduces a new multistage Mamba multimodal fusion framework (FMB) for UHR remote sensing image segmentation. By employing an innovative multimodal scanning mechanism integrated with disentanglement strategies to deepen the fusion process, FMB promotes deep fusion of multimodal features and captures cross-modal contextual information at multiple levels, enabling robust and comprehensive feature integration with enriched global semantic context. Additionally, we propose a Large Strip Spatial Detail (LSSD) extraction module, which adaptively combines multi-directional large strip convolutions to capture more precise and fine-grained boundary features, enabling the network to learn detailed spatial features from shallow layers. Extensive experiments on challenging remote sensing image datasets show that our method outperforms state-of-the-art models.
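A minimal sketch of what a multi-directional large strip convolution block looks like in practice: a long horizontal kernel and a long vertical kernel whose outputs are combined, which captures thin, elongated boundary structures more cheaply than a full square kernel. The kernel length, summation-based fusion, and residual connection are assumptions, not the exact LSSD module.

```python
# Minimal sketch of multi-directional large strip convolutions, assuming
# PyTorch; kernel length (15) and summation-based fusion are illustrative,
# not the exact LSSD design.
import torch
import torch.nn as nn

class StripConvBlock(nn.Module):
    def __init__(self, channels, k=15):
        super().__init__()
        # Horizontal and vertical strip kernels capture long, thin boundary
        # structures at a fraction of the cost of a full k x k kernel.
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2))
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0))
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.horizontal(x) + self.vertical(x)) + x  # residual

x = torch.randn(1, 64, 128, 128)
print(StripConvBlock(64)(x).shape)   # torch.Size([1, 64, 128, 128])
```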
- Research Article
- 10.52214/cjel.v50i2.14143
- Aug 4, 2025
- Columbia Journal of Environmental Law
- Thomas Mcgarity
Administrative law is emerging as a major focus of the Roberts Court’s efforts to reshape American society. And the primary vehicle for the Court’s transformation of administrative law is the clear statement rule, which provides that federal agencies must point to clear language in their enabling statutes when they address issues that trigger the clear statement rules. In administrative law, those issues include federalism, major questions, and property rights. The demise of the Chevron doctrine is unlikely to disturb this trend, because the normative clear statement rules examined in this article go beyond nondeference to agency statutory interpretation to limit Congress’ power to enact statutes containing broad language empowering agencies to adapt to changing circumstances. This article explores the virtues and disadvantages of aggressive judicial deployment of clear statement rules and concludes that the considerable disadvantages outweigh the modest virtues. The clear statement rules have no textual basis in the Constitution or statute. They are instead built on norms that are putatively located somewhere in the Constitution, but in fact are mirages that appear concrete from a distance, yet disintegrate on close inspection. They are therefore easily manipulable to achieve policy outcomes preferred by the judges applying them. At the same time, they unjustifiably limit Congress’ power to use broad language in statutes to allow implementing agencies to adapt to changing conditions, technological advances, and attempts by regulated entities to circumvent implementing regulations. Furthermore, the high bar for clarity that the Supreme Court has established and the vanishingly small likelihood that Congress will react to a judicial remand with legislation specifically empowering the agency to take the judicially rejected action ensures that clear statement rules are in reality weapons in a broader assault on the administrative state. As such, they are undermining the legitimacy of judicial review. The article briefly probes possible responses to the judicial aggrandizement represented by clear statement rules in administrative law. Among other things, Congress could amend the Administrative Procedure Act to prescribe a standard for judicial review of agency statutory interpretation that precludes judicial use of clear statement rules. Because it is highly unlikely that proponents of protective federal regulation will persuade Congress to act in an era of extreme political polarization, however, the article concludes that the best way for the Court to restore the legitimacy of judicial review is to approach the task of statutory interpretation with greater humility and less enthusiasm for advancing a libertarian agenda.
- Research Article
- Aug 4, 2025
- ArXiv
- Aman Patel + 5 more
Recent advances in self-supervised models for natural language, vision, and protein sequences have inspired the development of large genomic DNA language models (DNALMs). These models aim to learn generalizable representations of diverse DNA elements, potentially enabling various genomic prediction, interpretation and design tasks. Despite their potential, existing benchmarks do not adequately assess the capabilities of DNALMs on key downstream applications involving an important class of non-coding DNA elements critical for regulating gene activity. In this study, we introduce DART-Eval, a suite of representative benchmarks specifically focused on regulatory DNA to evaluate model performance across zero-shot, probed, and fine-tuned scenarios against contemporary ab initio models as baselines. Our benchmarks target biologically meaningful downstream tasks such as functional sequence feature discovery, predicting cell-type specific regulatory activity, and counterfactual prediction of the impacts of genetic variants. We find that current DNALMs exhibit inconsistent performance and do not offer compelling gains over alternative baseline models for most tasks, while requiring significantly more computational resources. We discuss potentially promising modeling, data curation, and evaluation strategies for the next generation of DNALMs. Our code is available at https://github.com/kundajelab/DART-Eval.
- Research Article
- 10.1016/j.neunet.2025.107496
- Aug 1, 2025
- Neural networks : the official journal of the International Neural Network Society
- Runliang Niu + 5 more
Learn to explain transformer via interpretation path by reinforcement learning.
- Research Article
- 10.3390/jtaer20030191
- Aug 1, 2025
- Journal of Theoretical and Applied Electronic Commerce Research
- Simona-Vasilica Oprea + 1 more
In this work, the utility of multimodal vision–language models (VLMs) for visual product understanding in e-commerce is investigated, focusing on two complementary models: ColQwen2 (vidore/colqwen2-v1.0) and ColPali (vidore/colpali-v1.2-hf). These models are integrated into two architectures and evaluated across various product interpretation tasks, including image-grounded question answering, brand recognition and visual retrieval based on natural language prompts. ColQwen2, built on the Qwen2-VL backbone with LoRA-based adapter hot-swapping, demonstrates strong performance, allowing end-to-end image querying and text response synthesis. It excels at identifying attributes such as brand, color or usage based solely on product images and responds fluently to user questions. In contrast, ColPali, which utilizes the PaliGemma backbone, is optimized for explainability. It delivers detailed visual-token alignment maps that reveal how specific regions of an image contribute to retrieval decisions, offering transparency ideal for diagnostics or educational applications. Through comparative experiments using footwear imagery, it is demonstrated that ColQwen2 is highly effective in generating accurate responses to product-related questions, while ColPali provides fine-grained visual explanations that reinforce trust and model accountability.
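ColQwen2 and ColPali are late-interaction retrievers, so a short sketch of the multi-vector MaxSim scoring they rely on may help make the retrieval step concrete. The random tensors below stand in for actual query-token and image-patch embeddings; loading the real checkpoints (e.g., via the colpali-engine/transformers stack) is not shown and the dimensions are assumptions.

```python
# Minimal sketch of late-interaction (MaxSim) scoring used by ColPali-style
# retrievers; random tensors stand in for real query-token and image-patch
# embeddings produced by the VLM.
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (Q, D) token embeddings; doc_emb: (P, D) patch embeddings."""
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    sim = q @ d.T                        # (Q, P) cosine similarities
    return sim.max(dim=-1).values.sum()  # best patch per query token, summed

query = torch.randn(12, 128)             # e.g. "red running shoes with white sole"
product_pages = [torch.randn(1030, 128) for _ in range(3)]  # patch embeddings
scores = torch.stack([maxsim_score(query, p) for p in product_pages])
best = scores.argmax().item()             # index of the best-matching product image
```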
- Research Article
- 10.3390/rs17152579
- Jul 24, 2025
- Remote Sensing
- Yiyun Luo + 7 more
High-resolution remote sensing imagery plays an essential role in urban management and environmental monitoring, providing detailed insights for applications ranging from land cover mapping to disaster response. Semantic segmentation methods are among the most effective techniques for comprehensive land cover mapping, and they commonly rely on ImageNet-based pre-training. However, traditional fine-tuning processes exhibit poor transferability across different downstream tasks and require large amounts of labeled data. To address these challenges, we introduce Denoising Diffusion Probabilistic Models (DDPMs) as a generative pre-training approach for semantic feature extraction in remote sensing imagery. We pre-trained a DDPM on extensive unlabeled imagery, obtaining features at multiple noise levels and resolutions. To integrate and optimize these features efficiently, we designed a multi-layer perceptron module with residual connections that performs channel-wise optimization to suppress feature redundancy and refine representations. Additionally, we froze the feature extractor during fine-tuning. This strategy significantly reduces computational consumption and facilitates fast transfer and deployment across various interpretation tasks on homogeneous imagery. Our comprehensive evaluation on the sparsely labeled MiniFrance-S dataset and the fully labeled Gaofen Image Dataset achieved mean intersection over union scores of 42.7% and 66.5%, respectively, outperforming previous works. This demonstrates that our approach effectively reduces reliance on labeled imagery and increases transferability to downstream remote sensing tasks.
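A minimal sketch of the "residual MLP over frozen diffusion features" idea described above: a per-pixel channel-wise MLP with a residual connection refines the frozen features, and only this lightweight head plus a 1x1 classifier is trained. The channel counts, hidden width, and head design are assumptions, not the paper's exact module.

```python
# Minimal sketch of channel-wise refinement of frozen diffusion features with
# a residual MLP, assuming PyTorch; channel counts and the 1x1-conv head are
# illustrative, not the paper's exact module.
import torch
import torch.nn as nn

class ResidualChannelMLP(nn.Module):
    def __init__(self, channels, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.GELU(), nn.Linear(hidden, channels)
        )

    def forward(self, x):              # x: (B, C, H, W) frozen DDPM features
        h = x.permute(0, 2, 3, 1)      # operate per pixel over the channel axis
        h = h + self.mlp(h)            # residual channel-wise refinement
        return h.permute(0, 3, 1, 2)

class SegHead(nn.Module):
    def __init__(self, channels, num_classes):
        super().__init__()
        self.refine = ResidualChannelMLP(channels)
        self.classify = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, frozen_feats):
        return self.classify(self.refine(frozen_feats))

# Only the head is trained; the diffusion feature extractor stays frozen.
feats = torch.randn(2, 512, 64, 64)
logits = SegHead(512, num_classes=16)(feats)   # (2, 16, 64, 64)
```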
- Research Article
- 10.2196/72815
- Jul 8, 2025
- JMIR Formative Research
- August Landerholm
Background: Qualitative research appraisal is crucial for ensuring credible findings but faces challenges due to human variability. Artificial intelligence (AI) models have the potential to enhance the efficiency and consistency of qualitative research assessments. Objective: This study aims to evaluate the performance of 5 AI models (GPT-3.5, Claude 3.5, Sonar Huge, GPT-4, and Claude 3 Opus) in assessing the quality of qualitative research using 3 standardized tools: Critical Appraisal Skills Programme (CASP), Joanna Briggs Institute (JBI) checklist, and Evaluative Tools for Qualitative Studies (ETQS). Methods: AI-generated assessments of 3 peer-reviewed qualitative papers in health and physical activity-related research were analyzed. The study examined systematic affirmation bias, interrater reliability, and tool-dependent disagreements across the AI models. Sensitivity analysis was conducted to evaluate the impact of excluding specific models on agreement levels. Results: Results revealed a systematic affirmation bias across all AI models, with "Yes" rates ranging from 75.9% (145/191; Claude 3 Opus) to 85.4% (164/192; Claude 3.5). GPT-4 diverged significantly, showing lower agreement ("Yes": 115/192, 59.9%) and higher uncertainty ("Cannot tell": 69/192, 35.9%). Proprietary models (GPT-3.5 and Claude 3.5) demonstrated near-perfect alignment (Cramer V = 0.891; P<.001), while open-source models showed greater variability. Interrater reliability varied by assessment tool, with CASP achieving the highest baseline consensus (Krippendorff α = 0.653), followed by JBI (α = 0.477), and ETQS scoring lowest (α = 0.376). Sensitivity analysis revealed that excluding GPT-4 increased CASP agreement by 20% (α = 0.784), while removing Sonar Huge improved JBI agreement by 18% (α = 0.561). ETQS showed marginal improvements when excluding GPT-4 or Claude 3 Opus (+9%, α = 0.409). Tool-dependent disagreements were evident, particularly in ETQS criteria, highlighting AI's current limitations in contextual interpretation. Conclusions: The findings demonstrate that AI models exhibit both promise and limitations as evaluators of qualitative research quality. While they enhance efficiency, AI models struggle with reaching consensus in areas requiring nuanced interpretation, particularly for contextual criteria. The study underscores the importance of hybrid frameworks that integrate AI scalability with human oversight, especially for contextual judgment. Future research should prioritize developing AI training protocols that emphasize qualitative epistemology, benchmarking AI performance against expert panels to validate accuracy thresholds, and establishing ethical guidelines for disclosing AI's role in systematic reviews. As qualitative methodologies evolve alongside AI capabilities, the path forward lies in collaborative human-AI workflows that leverage AI's efficiency while preserving human expertise for interpretive tasks.
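A minimal sketch of the agreement statistic reported above (Krippendorff's alpha over nominal ratings) and of the leave-one-model-out sensitivity check, assuming the third-party `krippendorff` package; the toy rating matrix (models by checklist items, coded 1 = Yes, 0 = No, 2 = Cannot tell) is illustrative only, not the study's data.

```python
# Minimal sketch of the interrater-agreement analysis, assuming the
# third-party `krippendorff` package; the toy rating matrix is illustrative.
import numpy as np
import krippendorff

ratings = np.array([
    [1, 1, 1, 0, 1, 2, 1, 1],   # model A
    [1, 1, 1, 0, 1, 1, 1, 1],   # model B
    [1, 2, 1, 0, 2, 2, 1, 0],   # model C (more "Cannot tell" answers)
    [1, 1, 1, 0, 1, 2, 1, 1],   # model D
    [1, 1, 0, 0, 1, 2, 1, 1],   # model E
], dtype=float)

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha = {alpha:.3f}")

# Sensitivity check: drop one model and recompute, mirroring the
# leave-one-model-out analysis described in the abstract.
alpha_wo = krippendorff.alpha(reliability_data=np.delete(ratings, 2, axis=0),
                              level_of_measurement="nominal")
print(f"Alpha without model C = {alpha_wo:.3f}")
```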
- Research Article
- 10.1093/jge/gxaf091
- Jul 7, 2025
- Journal of Geophysics and Engineering
- Kewen Li + 5 more
Abstract The focus of this study is on the use of a self-supervised deep learning method called Neighbor2Neighbor for seismic data denoising, aiming to address two key challenges: (i) the high cost of obtaining clean noise-free data, and (ii) the difficulty in balancing denoising effectiveness with the retention of critical information. We introduce a novel regularization technique that reduces noise while preserving essential high-frequency information, such as faults. Built on the ResUNet++ architecture, the model is tailored to the frequency characteristics of seismic data, enhancing its ability to extract relevant features and generalize across different datasets. We quantitatively evaluate the denoising performance by adding noise to synthetic data and simulating noisy field data for qualitative analysis. This approach eliminates the need for expensive clean samples, effectively denoises the data without blurring critical features, and is therefore well-suited for high-quality seismic interpretation tasks.
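A minimal sketch of the Neighbor2Neighbor training idea that the study builds on: two sub-images are drawn from neighboring pixels of the same noisy input, one is denoised and regressed against the other, and a consistency term regularizes the gap. The toy denoiser, the fixed (rather than random) neighbor sampler, and the regularization weight are illustrative stand-ins, not the paper's ResUNet++ model or its proposed regularizer.

```python
# Minimal sketch of the Neighbor2Neighbor training loss, assuming PyTorch;
# the toy denoiser and fixed-pattern sub-sampler are stand-ins for the
# paper's ResUNet++ backbone and the random 2x2-cell neighbor sampler.
import torch
import torch.nn as nn

def subsample_pair(img):
    """Take two neighboring pixels from each 2x2 cell (fixed pattern here;
    the original method picks the pair at random per cell)."""
    g1 = img[..., 0::2, 0::2]   # top-left pixel of each cell
    g2 = img[..., 0::2, 1::2]   # its right-hand neighbor
    return g1, g2

denoiser = nn.Sequential(              # toy stand-in for ResUNet++
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
)

noisy = torch.randn(4, 1, 64, 64)      # synthetic noisy seismic patches
gamma = 2.0                            # regularization weight (assumed)

g1, g2 = subsample_pair(noisy)
pred = denoiser(g1)
with torch.no_grad():
    d1, d2 = subsample_pair(denoiser(noisy))   # sub-sampled full denoised image

rec = ((pred - g2) ** 2).mean()                 # reconstruction term
reg = ((pred - g2 - (d1 - d2)) ** 2).mean()     # consistency regularizer
loss = rec + gamma * reg
loss.backward()
```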
- Research Article
- 10.1109/jbhi.2025.3550353
- Jul 1, 2025
- IEEE journal of biomedical and health informatics
- Ying Xiang + 4 more
The intricacies of cancer present formidable challenges in achieving effective treatments. Despite extensive research on computational methods for drug response prediction, achieving personalized treatment insights remains challenging. Emerging solutions combine multiple omics data, leveraging graph neural networks to integrate molecular interactions into the reasoning process. However, effectively modeling and harnessing this information, as well as gaining the trust of clinical professionals, remain complex. This paper introduces ExplainMIX, a pioneering approach that utilizes directed graph neural networks to predict drug responses with interpretability. ExplainMIX adeptly captures intricate structures and features within directed heterogeneous graphs, leveraging diverse data modalities such as genomics, proteomics, and metabolomics. ExplainMIX goes beyond prediction by generating transparent and interpretable explanations: incorporating edge-level, meta-path, and graph structure information, it provides meaningful insights into the factors influencing drug response, supporting clinicians and researchers in the development of targeted therapies. Empirical results on a constructed quantitative evaluation ground truth validate the efficacy of ExplainMIX in both prediction and interpretation tasks. This approach aims to contribute to precision medicine research by addressing challenges in interpretable, personalized drug response prediction within the cancer landscape.