Articles published on multiple-views
- Research Article
4
- 10.1016/j.neunet.2025.107981
- Jan 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Yiran Cai + 4 more
Tensorized anchor alignment for incomplete multi-view clustering.
- Research Article
- 10.1109/tmi.2026.3651389
- Jan 1, 2026
- IEEE transactions on medical imaging
- Han Wu + 5 more
Precise landmark annotation in cardiac ultrasound images is fundamental for quantitative cardiac health assessment. However, the time-intensive nature of manual annotation typically constrains clinicians to annotate only selected key frames, limiting comprehensive temporal analysis capabilities. While recent automated landmark detection methods have demonstrated success for key-frame analysis, they fail to effectively utilize the intrinsic temporal information across the cardiac sequence. To bridge this gap, we present SemiEchoTracker, a novel semi-supervised framework that enables comprehensive landmark tracking throughout echocardiography sequences while requiring supervision only on key frames. Our framework introduces three key strategies: 1) a co-training mechanism that enforces mutual consistency between spatial detection and temporal tracking, enabling accurate intermediate frame detection without additional annotations, 2) a guided DINOv2 pretraining strategy that is specially tailored for extracting fine-grained echocardiography-specific spatial features, and 3) a perception-aware spatial-temporal (PAST) attention module that efficiently captures inter- and intra-frame relationships in echocardiography videos. Extensive validation on three datasets across multiple cardiac views demonstrates that our method not only achieves state-of-the-art detection performance on the key frames but also yields accurate frame-by-frame prediction, which is important for dynamic cardiac analysis by clinicians.
- Supplementary Content
- 10.1155/cric/6816373
- Jan 1, 2026
- Case Reports in Cardiology
- Ayman Helal + 3 more
Misplacement of a pacemaker lead into the left ventricle (LV) is a rare but clinically important complication, often facilitated by unrecognized intracardiac shunts such as a patent foramen ovale (PFO). Early recognition is essential to avoid systemic embolization and ensure safe device function. We report a man in his 70s with a background of bioprosthetic aortic valve replacement, coronary bypass grafting, hypertension, chronic kidney disease, Parkinson's disease, and prostate cancer, who underwent permanent pacemaker implantation for symptomatic sinus pauses. Follow‐up echocardiography 1 year later, performed as part of surveillance of his aortic valve prosthesis, unexpectedly revealed that the ventricular lead had crossed a PFO and was positioned in the LV via the mitral valve. His 12‐lead ECG demonstrated a right bundle branch block (RBBB)‐like paced morphology, raising suspicion of LV pacing. The patient remained asymptomatic with no evidence of systemic embolization. He was anticoagulated with apixaban and subsequently underwent successful lead extraction and repositioning into the right ventricle (RV). Correct RV placement was confirmed using multiple fluoroscopic views, particularly the left anterior oblique (LAO) projection, and by postprocedure ECG, chest x‐ray, and echocardiogram. This case underlines the importance of careful assessment of paced ECG morphology, fluoroscopic views during implantation (especially LAO), and postimplant imaging to confirm lead location. Suspicion should be raised when an RBBB‐like QRS morphology is observed during RV pacing. Timely recognition and management with anticoagulation, followed by extraction and repositioning, can prevent potentially devastating complications. Operators should remain vigilant for inadvertent LV lead placement, particularly in patients with unrecognized PFO. Routine use of multiple fluoroscopic projections and correlation with ECG and echocardiography can aid early diagnosis and improve procedural safety.
- Research Article
- 10.1109/access.2026.3672128
- Jan 1, 2026
- IEEE Access
- Timothy Miskell + 3 more
Malicious software presents significant risks to computer systems, networks, and sensitive data, making malware detection a critical cybersecurity challenge. Labeling malware data not only requires expert knowledge but is also time-intensive. Given the constantly evolving threat landscape, an ever increasing amount of malware remains unlabeled, and those samples that are labeled may be incomplete or inconsistent. In this work, we introduce a self-supervised method using contrastive learning to perform static analysis and classification of malicious Portable Executable (PE) files, reducing the dependency on labeled data. We also develop data augmentation techniques that generate multiple augmented views and design PE-specific augmentation operators to be used during self-supervised learning such as shuffle, encryption, and compression based on the IMAGE_SECTION_HEADER. Our method is built upon raw PE byte sequences extracted from a large-scale publicly available dataset, SoRel-20M, which contains 20 million PE samples. Utilizing a two-stage framework consisting of a self-supervised contrastive learning pre-training phase followed by a supervised fine-tuning phase with limited amounts of labeled data, our model learns label-free invariant representations of the PE structure and as a result outperforms a traditional supervised Convolutional Neural Network (CNN), achieving a macro-averaged F1 score of 78.6% with only 10% of the labeled data. In addition, our method only requires the first 1 KB of header data, whereas the supervised baseline requires 1 MB of the underlying PE header.
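The idea of generating multiple augmented views by shuffling sections can be sketched as follows. This is a toy illustration only, not the authors' implementation: `shuffle_sections`, `make_views`, and the byte strings are hypothetical names and data, and the sketch does not parse real PE files or rewrite the IMAGE_SECTION_HEADER entries that a real shuffle operator would need to keep consistent.

```python
import random


def shuffle_sections(header, sections, rng):
    """Return one augmented view: header kept in place, section order permuted."""
    permuted = list(sections)  # copy so the caller's list is left untouched
    rng.shuffle(permuted)
    return header + b"".join(permuted)


def make_views(header, sections, n_views=2, seed=0):
    """Build n_views shuffled variants of the same sample (a positive set)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [shuffle_sections(header, sections, rng) for _ in range(n_views)]


# Toy stand-in for a parsed file (hypothetical bytes, not a real executable).
header = b"MZ"
sections = [b".text", b".data", b".rsrc"]
views = make_views(header, sections, n_views=2, seed=0)
```

In a contrastive setup, views derived from the same file would be treated as positives and views from other files as negatives; the loss and encoder are outside the scope of this sketch.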
- Research Article
1
- 10.1109/tip.2025.3648582
- Jan 1, 2026
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Chengji Wang + 4 more
One-shot Text-to-Image Person Re-Identification (One-shot TIReID) aims to construct a TIReID model using only a single labeled image-text pair per identity, along with a large pool of unlabeled person images. While supervised learning in text-to-image person re-identification has demonstrated high effectiveness, the requirement for extensive annotated data, both in terms of identities and corresponding textual descriptions, makes it impractical for large-scale camera networks. One-shot TIReID presents a promising approach to reduce the annotation burden. The primary challenge in one-shot TIReID lies in establishing consistent visual-textual correspondences across diverse viewing conditions, particularly in the absence of cross-view paired data. To address this challenge, we propose a novel progressive discrepancy learning framework, termed P-CLIP, which aims to establish a shared embedding space that is robust to view-specific biases. To achieve this goal, we dynamically construct multi-view image-text pairs based on a single labeled pair and simultaneously project the multi-view data into a unified embedding space. Specifically, we propose a Progressive Multi-View Generation method (MVG) to generate multiple noisy views from a single labeled instance for training. To mitigate cross-view ambiguities, we introduce a Cross-View Discrepancy Learning module (CDL) that leverages the discrepancies among different views to guide the learning of cross-view visual-textual correspondences. This approach effectively integrates multimodal error correction into the person re-identification domain. Furthermore, to enhance the effectiveness of visual-textual correspondence learning, we propose a Compact Cross-Modal Matching Loss (CCM), which suppresses unmatched pairs while emphasizing matched ones. Extensive experiments were conducted on three benchmark datasets, and the experimental results demonstrate the effectiveness of our proposed method. 
The data and codes are available at https://github.com/Itachjw/P-CLIP/tree/main.
- Research Article
1
- 10.1016/j.compmedimag.2025.102697
- Jan 1, 2026
- Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society
- Jaeyoung Huh + 3 more
Wholistic report generation for Breast ultrasound using LangChain.
- Research Article
- 10.1109/tpami.2026.3674997
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Zhengyu Liang + 7 more
Light field (LF) cameras capture the light rays of a 3D scene from multiple views simultaneously, and thus provide a more immersive experience of the real world as compared to traditional cameras. Although significant progress has been made in various LF image processing tasks, it remains challenging to effectively model the non-local spatial-angular correlations inherent in LF images, particularly when dealing with complex disparity variations. In this paper, we focus on orthogonal epipolar geometry of LF images and propose a generic Epipolar Transformer mechanism that incorporates geometrically meaningful correlations along the epipolar lines. Our Epipolar Transformer mechanism enjoys the following benefits: learning effective and diverse LF feature representations, delivering satisfactory results without redundant architectural designs, and enabling flexible extension to various LF-related tasks with simple adaptations. For LF spatial and angular super-resolution, our methods not only achieve state-of-the-art performance on benchmark datasets, but also demonstrate superior and robust performance on large disparity variations. For disparity estimation, we explore the use of geometry information encoded in our Epipolar Transformer to directly regress the disparity results, effectively avoiding the limitation of a fixed maximum disparity. The code and models are available at https://github.com/ZhengyuLeung/BasicLFSR-plus.
- Research Article
- 10.1002/mp.70261
- Jan 1, 2026
- Medical physics
- Xuxin Chen + 7 more
Although fusing information from multiple mammographic views plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces major challenges, and no such CAD schemes have been used in clinical practice. To overcome these challenges, we investigate a new approach based on the concept of contrastive language-image pre-training (CLIP), which has sparked interest across various medical imaging tasks. The aim is to solve the challenges in: (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources. We introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added into CLIP's image and text encoders, limiting fine-tuning updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test the generalizability of Mammo-CLIP. Mammo-CLIP outperforms the state-of-the-art (SOTA) cross-view transformer in terms of area under the ROC curve (AUC = 0.841 ± 0.017 vs. 0.817 ± 0.012 and 0.837 ± 0.034 vs. 0.807 ± 0.036) on both datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3% in AUC. The proposed Mammo-CLIP demonstrates superior breast cancer diagnosis performance compared to SOTA methods. 
This study highlights the potential of applying the finetuned vision-language models for developing multi-view, image-text-based CAD schemes of breast cancer.
- Research Article
- 10.1109/tvcg.2026.3683358
- Jan 1, 2026
- IEEE transactions on visualization and computer graphics
- Erwan Leria + 3 more
Light field (LF) displays address the mismatch in focus cues present in traditional displays by triggering natural defocus blur and enabling motion parallax. They rely on geometrical optics, displaying rays from multiple angles of view. LF path tracing is computationally expensive for real-time applications, since it requires rendering multiple views. To reduce this computational complexity, pixels are commonly reprojected spatially between views, because reusing pixels that are already rendered is cheaper than path tracing additional ones. However, when occluded areas are uncovered in some views, reprojection is not possible, creating holes in these views. Filling in the holes requires extra path tracing computation. This paper investigates scalable hole-filling strategies for LF path tracing, using multiple GPUs to reach real-time performance. We propose an algorithm search optimization procedure to determine whether a specific assignment algorithm can be generalized across scenes, using the hole-filling time as the function to minimize. In addition, we introduce DaSH (Discarded and Subsampled path traced Hole-filling), a novel method that reduces computation and divergence overhead in fixed-size hardware thread-blocks. Based on local pixel sparsity within pixel patches, DaSH adaptively subsamples and discards hole-filling rays. Our evaluation demonstrates that DaSH achieves significant performance gains while preserving the visual and structural quality of refocused light field images at the retina plane. The experiments demonstrate an average speedup factor of $1.8\times$ for DaSH, compared to prior work, in a multi-GPU rendering system.
- Research Article
- 10.1016/j.media.2025.103829
- Jan 1, 2026
- Medical image analysis
- Hong Hui Yeoh + 4 more
To facilitate early detection of breast cancer, there is a need to develop risk prediction schemes that can prescribe personalized screening mammography regimens for women. In this study, we propose a new deep learning architecture called TRINet that implements time-decay attention to focus on recent mammographic screenings, as current models do not account for the relevance of newer images. We integrate radiomic features with an Attention-based Multiple Instance Learning (AMIL) framework to weigh and combine multiple views for better risk estimation. In addition, we introduce a continual learning approach with a new label assignment strategy based on bilateral asymmetry to make the model more adaptable to asymmetrical cancer indicators. Finally, we add a time-embedded additive hazard layer to perform dynamic, multi-year risk forecasting based on individualized screening intervals. We used two public datasets, namely 8528 patients from the American EMBED dataset and 8723 patients from the Swedish CSAW dataset in our experiments. Evaluation results on the EMBED test set show that our approach performs comparably with state-of-the-art models, achieving AUC scores of 0.851, 0.811, 0.796, 0.793, and 0.789 across 1-, 2-, to 5-year intervals, respectively. Our results underscore the importance of integrating temporal attention, radiomic features, time embeddings, bilateral asymmetry, and continual learning strategies, providing a more adaptive and precise tool for breast cancer risk prediction.
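The notion of attention that favors recent screenings can be illustrated with a generic exponential time-decay softmax. The abstract does not give TRINet's actual formulation, so the functional form, the function name `time_decay_attention`, and the `decay` parameter below are all assumptions made for illustration.

```python
import math


def time_decay_attention(scores, years_ago, decay=0.5):
    """Softmax over (score - decay * age), so older exams are down-weighted.

    scores:    raw relevance score per screening exam (hypothetical inputs)
    years_ago: age of each exam in years, 0.0 meaning the most recent
    decay:     assumed decay rate; larger values favor recent exams more
    """
    logits = [s - decay * t for s, t in zip(scores, years_ago)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three exams with identical raw scores, taken 0, 1, and 2 years ago.
weights = time_decay_attention(scores=[1.0, 1.0, 1.0], years_ago=[0.0, 1.0, 2.0])
```

With equal raw scores, the weights depend only on exam age, so the most recent exam receives the largest share of attention.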
- Research Article
2
- 10.1109/tpami.2026.3654665
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Fangjinhua Wang + 7 more
3D reconstruction aims to recover the dense 3D structure of a scene. It plays an essential role in various applications such as Augmented/Virtual Reality (AR/VR), autonomous driving and robotics. Leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments. Due to its efficiency and effectiveness, MVS has become a pivotal method for image-based 3D reconstruction. Recently, with the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance against traditional methods. We categorize these learning-based methods as: depth map-based, voxel-based, NeRF-based, 3D Gaussian Splatting-based, and large feed-forward methods. Among these, we focus significantly on depth map-based methods, which are the main family of MVS due to their conciseness, flexibility and scalability. In this survey, we provide a comprehensive review of the literature at the time of this writing. We investigate these learning-based methods, summarize their performances on popular benchmarks, and discuss promising future research directions in this area.
- Research Article
- 10.1109/tgrs.2026.3672192
- Jan 1, 2026
- IEEE Transactions on Geoscience and Remote Sensing
- Lin Qi + 5 more
Hyperspectral unmixing based on autoencoders is a crucial research task in remote sensing imagery. While existing deep hyperspectral unmixing networks primarily focus on spatial features, the inherently rich and continuous spectral bands in hyperspectral images harbor significant underutilized information. The intrinsic characteristics of these continuous bands can naturally enable more nuanced and effective modeling of mixed pixels. In this paper, we propose the multiview collaborative dual-branch network (MCDB-Net), designed to fully learn and exploit the complex spectral features in high-dimensional hyperspectral data, thereby enhancing the representation capabilities of these features. MCDB-Net constructs a novel multiview spectral block to strengthen the correlation between multiple views of pixel spectra. The full view spectral feature extraction module highlights important spectral features, while the local multiview spectral feature extraction module provides a detailed understanding of the interactions between multiview spectral information. The multiview abundance collaboration module collaboratively learns spectral feature information from different perspectives and dynamically adjusts the weights of abundance estimations, leading to better integration of abundance features across various views. Extensive experiments on different datasets demonstrate that MCDB-Net achieves higher continuity and robustness in unmixing results, showcasing its powerful capability in spectral feature extraction.
- Research Article
- 10.4000/15rd2
- Jan 1, 2026
- Oltreoceano
- Marzia Dati
In this paper I investigate how John Reed’s poetics and narrative of the city blend Jacob Riis’s and Alfred Stieglitz’s views of New York. Though they used two different artistic media, word and image, they captured the metropolis from “below” (the poverty in the slums) and from “above” (the wealth and the idea of progress expressed by the skyscrapers). New York had a tremendous impact on artists, writers and intellectuals. In How the Other Half Lives: Studies among the Tenements of New York, an interesting example of photo-text, Riis describes the awful conditions of the old and new immigrants settled in New York: his camera sneaks into the slums and the sweatshops of the Lower East Side, revealing an infernal underworld made up of poverty and degradation; by contrast, Stieglitz points his camera to the heights of the skyscrapers which, like gothic cathedrals, tower to the sky as harbingers of a radiant future. This dichotomy is found in John Reed’s poems and in some relevant articles published in the radical newspapers of his time. Reed arrived in New York in 1910 as a young poet and journalist and was fascinated by the city: new spaces and new perspectives opened before his eyes, but also a world of suffering. Through the analysis of the poems “A Hymn to Manhattan,” “Proud New York,” and “America, 1918,” and of the articles “Immigrants” and “Tide flows East,” and through a comparison with the photographs “The Bent” by Riis and “The Steerage” and “City of Ambition” by Stieglitz, I will try to highlight how the two photographers’ visions of New York and John Reed’s writings can be brought into a mutual dialogue that sheds light on the tight relation between literature and photography.
- Research Article
- 10.47772/ijriss.2026.100300167
- Jan 1, 2026
- International Journal of Research and Innovation in Social Science
- Novikova V P + 1 more
This paper examines how the modernist narrative techniques of stream of consciousness and fragmentation are remediated in contemporary virtual reality (VR) storytelling. Drawing on discourse-narratological frameworks, it analyzes how these literary strategies transform from linguistic expression into spatial, sensory, and ergodic discourse structures. The aim is to demonstrate that VR does not constitute a narrative rupture but continues modernist principles through new semiotic channels, positioning users as active interpreters rather than passive recipients. The empirical material consists of three VR films, Dear Angelica (2017), Notes on Blindness (2016), and Spheres (2018), selected for their emphasis on subjective perception, memory, and non-linearity. These works are compared with modernist novels including Virginia Woolf's Mrs Dalloway and To the Lighthouse, and James Joyce's Ulysses. The methodology employs qualitative discourse analysis combined with transmedial narratology, focusing on spatial transitions, sensory cues, and user navigation patterns across multiple viewings of each VR piece. Key findings reveal that stream of consciousness manifests as spatial flows driven by sensorial montage rather than syntactic disruption, while fragmentation becomes ergodic reconstruction requiring embodied navigation. In Dear Angelica, memory fragments orbit the user spatially; Notes on Blindness organizes consciousness through auditory episodes; and Spheres externalizes cosmic reflection without temporal markers. These techniques decentralize narrative authority and compel users to assemble coherence through movement and perception, echoing modernist interpretive demands but physicalizing interpretive labor. The study contributes to digital humanities by bridging literary modernism with VR discourse, challenging immersion-centric VR research. It extends transmedial narratology to account for embodiment and spatiality, offering new analytical tools for immersive media. 
Findings suggest VR creators draw implicitly from modernist strategies with implications for narrative design that prioritize ambiguity and reconstruction over linear exposition.
- Research Article
1
- 10.1186/s13073-025-01593-8
- Dec 31, 2025
- Genome Medicine
- Haowei Du + 18 more
Background: Copy number variation (CNV) is a class of genomic structural variation (SV) that contributes to genomic disorders and can significantly impact health. Short-read genome sequencing (sr-GS) enables genome-wide SV calling, which has been shown to increase diagnosis in unsolved rare disease families. The growing number of large sequencing cohort projects with sr-GS data available requires open, free analytical tools that provide visualization of CNV and SV integrated calls associated with gene annotation, proband-parent trio analysis to enable prioritization of de novo variants, B-allele frequency (BAF) plots to support CNV calls, parent of origin assessment and mosaicism detection. Methods: To support those needs, we developed VizCNV, an open-source platform that incorporates read depth and BAF to enable haplotype-aware CNV analysis. The tool incorporates multiple interactive view modes for SV concurrent calls and annotation tracks for analyzing chromosomal abnormalities [e.g., aneuploidy, segmental aneusomy, and chromosome translocations], gene exonic rearrangements and non-coding gene regulatory regions. In addition, VizCNV includes a built-in filter schema for trio genomes, prioritizing the detection of de novo CNVs. We optimized VizCNV using 1000 Genomes Project data and benchmarked its performance against a cohort containing CNVs validated by multiple technologies. Finally, we applied VizCNV to a molecularly unsolved primary immunodeficiency disease cohort (PIDD, n = 39) previously analyzed by exome sequencing. Results: Upon computational optimization, VizCNV achieved approximately 82.3% recall and 76.3% precision for deletions > 10 kb. VizCNV accurately detected all 71 validated copy number gains and correctly indicated potential underlying genomic complexities. 
Haplotype-aware CNV analysis identified a meiosis I non-disjunction event (trisomy 21), three de novo CNVs at two unique loci and 48 inherited candidate CNVs in the PIDD cohort, of which 42% (20/48) were validated by integrated CNV/BAF analysis. Moreover, genotype–phenotype analyses revealed that a compound heterozygous combination of a paternal 12.8 kb deletion of exon 5 and a maternal missense variant allele of DOCK8 is the molecular cause of disease in one proband diagnosed with Hyper-IgE syndrome. Conclusions: VizCNV provides a robust and flexible platform for identification of aneuploidies, CNV and SV discovery, and visualization of CNV and BAF data. It is also a useful tool to investigate features of genomic rearrangements such as parental origin, which has implications for genetic counseling and mechanistic studies. The tool is freely available through https://doi.org/10.6084/m9.figshare.25869523. Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-025-01593-8.
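As a quick check on the figures quoted in this abstract, the validated fraction (20/48) can be recomputed, and the reported recall and precision can be combined into an F1 score. The F1 value is not reported in the abstract; it is derived here only as an illustration of the harmonic mean, and `f1_score` is a helper name introduced for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Fraction of inherited candidate CNVs validated by integrated CNV/BAF analysis.
validated_fraction = 20 / 48

# Derived from the reported ~76.3% precision and ~82.3% recall for deletions > 10 kb.
# This F1 is an illustration, not a figure stated by the authors.
f1_deletions = f1_score(precision=0.763, recall=0.823)

print(f"validated: {validated_fraction:.1%}, F1 (derived): {f1_deletions:.3f}")
```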
- Research Article
- 10.20961/jbssa.v31i2.111718
- Dec 29, 2025
- Jurnal Bahasa, Sastra, dan Studi Amerika
- David Andriano K Manurung + 2 more
Politeness is not merely etiquette; it shapes how language conveys identity, emotion, and social connection. This study investigates politeness strategies in Soul (2020), a Pixar film that blends lighthearted storytelling with philosophical themes. Using Brown and Levinson’s (1987) framework, comprising Bald on Record, Positive Politeness, Negative Politeness, and Off-Record strategies, the study adopts a qualitative descriptive method. Data were collected from the film’s official script and cross-checked through multiple viewings for contextual accuracy. A total of 65 utterances were identified: 29 instances of positive politeness, 21 bald on record, 8 negative politeness, and 7 off-record strategies. These findings suggest the film prioritizes affirming, empathetic communication to navigate relationships and emotional transformation. The analysis demonstrates how politeness is used not only to manage social harmony, but also to express vulnerability and personal growth. Ultimately, the study highlights how animated dialogue can serve as a rich site for examining the interplay between language, meaning, and identity.
- Research Article
- 10.3390/diagnostics16010066
- Dec 24, 2025
- Diagnostics
- Latika Giri + 9 more
Background: Chest radiography is the most widely used diagnostic imaging modality globally, yet its interpretation is hindered by a critical shortage of radiologists, especially in low- and middle-income countries (LMICs). The interpretation is both time-consuming and error-prone in high-volume settings. Artificial Intelligence (AI) systems trained on public data may lack generalizability to multi-view, real-world, local images. Deep learning tools have the potential to overcome these limitations and augment radiologists by providing real-time decision support. Objective: We evaluated the diagnostic accuracy of a deep learning-based convolutional neural network (CNN), trained on multi-view, hybrid (public and local) datasets and operating in offline mode, for detecting thoracic abnormalities in chest radiographs of adults presenting to a tertiary hospital. Methodology: A CNN was pretrained on public datasets (Vin Big, NIH) and fine-tuned on a local dataset from a Nepalese tertiary hospital, comprising frontal (PA/AP) and lateral views from emergency, ICU, and outpatient settings. The dataset was annotated by three radiologists for 14 pathologies. Data augmentation simulated poor-quality images and artifacts. Performance was evaluated on a held-out test set (N = 522) against radiologists’ consensus, measuring AUC, sensitivity, specificity, mean average precision (mAP), and reporting time. Deployment feasibility was tested via PACS integration and standalone offline mode. Results: The CNN achieved an overall AUC of 0.86 across 14 abnormalities, with 68% sensitivity, 99% specificity, and 0.93 mAP. Colored bounding boxes improved clarity when multiple pathologies co-occurred (e.g., cardiomegaly with effusion). The system performed effectively on PA, AP, and lateral views, including poor-quality ER/ICU images. Deployment testing confirmed seamless PACS integration and offline functionality. 
Conclusions: The CNN trained on adult CXRs performed reliably in detecting key thoracic findings across varied clinical settings. Its robustness to image quality, integration of multiple views and visualization capabilities suggest it could serve as a useful aid for triage and diagnosis.
- Research Article
- 10.3390/e28010026
- Dec 24, 2025
- Entropy
- Yue Wang + 4 more
Multi-view learning has recently gained considerable attention in graph representation learning as it enables the fusion of complementary information from multiple views to enhance representation quality. However, most existing studies neglect that irrelevant views may introduce noise and negatively affect representation quality. To address the issue, we propose a novel multi-view representation learning framework called a View Filter-driven graph representation fusion network, named ViFi. Following the “less for better” principle, the framework focuses on filtering informative views while discarding irrelevant ones. Specifically, an entropy-based adaptive view filter was designed to dynamically filter the most informative views by evaluating their feature–topology entropy characteristics, aiming to not only reduce irrelevance among views but also enhance their complementarity. In addition, to promote more effective fusion of informative views, we propose an optimized fusion mechanism that leverages the filtered views to identify the optimal integration strategy using a novel information gain function. Through extensive experiments on classification and clustering tasks, ViFi demonstrates clear performance advantages over existing state-of-the-art approaches.
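The entropy-based scoring of views can be illustrated with plain Shannon entropy over a per-view node-degree histogram. This is a generic sketch, not ViFi's method: the abstract does not define the feature–topology entropy or say whether high- or low-entropy views are kept, so the degree-based measure, the descending ranking, and the names `shannon_entropy` and `rank_views_by_entropy` are all assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_views_by_entropy(views):
    """Return view names sorted by entropy, highest first (arbitrary choice)."""
    return sorted(views, key=lambda name: shannon_entropy(views[name]), reverse=True)


# Hypothetical views of one graph, each summarized by its node degrees.
views = {
    "uniform_degrees": [2, 2, 2, 2],  # every node has the same degree
    "mixed_degrees": [1, 2, 3, 4],    # all degrees differ
}
ranking = rank_views_by_entropy(views)
```

A filter in this spirit would then keep a subset of the ranked views before fusion; how many to keep and in which direction is left open here.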
- Research Article
- 10.1021/acs.jcim.5c02342
- Dec 23, 2025
- Journal of chemical information and modeling
- Vadim Korolev + 2 more
Data-driven approaches are essential for relating properties to the chemical structure. Atom-focused views of individual compounds are common in molecular representation learning: graph neural networks and chemical language models, the two main algorithm classes, take atomic-level graphs and atom-wise token sequences as input, respectively. However, directly integrating information about functional groups into advanced architectures remains nearly unexplored. To fill this gap, we introduce gSelformer-MV, a transformer that operates on multiple views of Group SELFIES (a SELFIES variant augmented with tokens for functional groups) that enables representation at both the atomic and substructure levels. Unlike prior Group SELFIES approaches that produce a single string per molecule, gSelformer-MV constructs multiple subgraph-partitioned Group SELFIES views and uses them jointly during training and inference. We show that gSelformer-MV is superior in terms of accuracy and explainability to the models trained exclusively on SELFIES strings. Moreover, gSelformer-MV achieves state-of-the-art performance on several regression benchmarks; further gains are obtained when restricting to high-confidence predictions. These results indicate that subgraph augmentation is a simple and effective route for advancing string-based molecular property prediction.
- Research Article
- 10.4081/rp.2025.1111
- Dec 23, 2025
- Ricerca Psicoanalitica
- Pietro Roberto Goisis
I still remember the emotion I felt in 1974 when I saw Scenes from a Marriage by Ingmar Bergman. Since then, I have been convinced that Scandinavian directors possess a unique sensitivity in portraying emotional relationships. This impression resurfaced while watching the film by the Norwegian director Lilja Ingolfsdottir. La solitudine dei non amati (the original title, Lovable, might have been more appropriate, and certainly preferable to the first Italian version, La teoria dell’attaccamento…) is a work that lends itself to many interpretations and, like all films rich in dialogue, deserves multiple viewings. I will try to share the thoughts it prompted in me. [...]