Articles published on Temporal consistency
1846 Search results
Sort by Recency
- New
- Research Article
- 10.1002/jsid.2111
- Dec 3, 2025
- Journal of the Society for Information Display
- Charles Ding + 1 more
ABSTRACT Recent breakthroughs in generative AI have markedly elevated the realism and controllability of synthetic media. In the visual modality, long‐context attention mechanisms and diffusion‐style refinements now deliver videos with superior temporal consistency, spatial coherence, and high‐resolution detail. These techniques underpin an expanding set of applications ranging from text‐guided storyboarding and animation to engineering visualization and virtual prototyping. In the audio modality, token‐based representations combined with hierarchical decoding enable the direct production of faithful speech, music, and ambient sound from textual prompts, powering rapid voice‐over creation, personalized music, and immersive soundscapes. The frontier is shifting toward unified audio–visual pipelines that synchronize imagery with dialog, sound effects, and ambience, promising end‐to‐end tooling for a wide variety of applications such as education, simulation, entertainment, and accessible content production. This review surveys these advances across modalities and outlines future research directions focused on improving generation efficiency, coherence, and controllability.
- New
- Research Article
- 10.1016/j.ecoinf.2025.103216
- Dec 1, 2025
- Ecological Informatics
- Qi Shao + 8 more
Selecting of global phenological field observations for validating coarse AVHRR-derived forest phenology products based on spatial heterogeneity and temporal consistency
- New
- Research Article
- 10.1016/j.neunet.2025.107976
- Dec 1, 2025
- Neural Networks: The Official Journal of the International Neural Network Society
- Fan Zhang + 2 more
Adaptively trigger memory network with temporal consistency for semi-supervised long video object segmentation.
- New
- Research Article
- 10.1109/tpami.2025.3596700
- Dec 1, 2025
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Xuan Dong + 5 more
Moiré patterns, unwanted color artifacts in images and videos, arise from the interference between spatially high-frequency scene contents and the spatial discrete sampling of digital cameras. Existing demoiréing methods primarily rely on single-camera image/video processing, which faces two critical challenges: 1) distinguishing moiré patterns from visually similar real textures, and 2) preserving tonal consistency and temporal coherence while removing moiré artifacts. To address these issues, we propose a dual-camera framework that captures synchronized videos of the same scene: one in focus (retaining high-quality textures but possibly exhibiting moiré patterns) and one defocused (with significantly reduced moiré patterns but blurred textures). We use the defocused video to help distinguish moiré patterns from real textures, so as to guide the demoiréing of the focused video. We propose a frame-wise demoiréing pipeline, which begins with an optical flow based alignment step to address any discrepancies in displacement and occlusion between the focused and defocused frames. Then, we leverage the aligned defocused frame to guide the demoiréing of the focused frame using a multi-scale CNN and a multi-dimensional training loss. To maintain tonal and temporal consistency, our final step applies a joint bilateral filter that uses the CNN's demoiréing result as the guide to filter the input focused frame into the final output. Experimental results demonstrate that our proposed framework largely outperforms state-of-the-art image and video demoiréing methods.
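The joint bilateral filtering step in the abstract above can be sketched as follows: a minimal single-channel version in which the CNN's demoiréing output acts as the guide image. The window radius and sigmas here are illustrative defaults, not the paper's settings.

```python
import numpy as np

def joint_bilateral_filter(inp, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Filter `inp` using range weights computed on `guide`.

    Simplified single-channel sketch of a joint/cross bilateral filter:
    spatial Gaussian weights multiplied by range weights taken from the
    guide image. All parameters are illustrative.
    """
    h, w = inp.shape
    out = np.zeros_like(inp, dtype=np.float64)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    inp_p = np.pad(inp, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    for y in range(h):
        for x in range(w):
            win_in = inp_p[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            win_g = guide_p[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range kernel: penalize guide-value differences, so edges in
            # the (moiré-free) guide are preserved in the filtered output.
            rng = np.exp(-((win_g - guide[y, x]) ** 2) / (2 * sigma_r ** 2))
            wgt = spatial * rng
            out[y, x] = (wgt * win_in).sum() / wgt.sum()
    return out
```

Because the range weights come from the guide rather than the input, smoothing stops at guide edges, which is why the paper can use it to transfer tonal/temporal stability from the CNN output back onto the focused frame.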
- New
- Research Article
- 10.1145/3763348
- Dec 1, 2025
- ACM Transactions on Graphics
- Sipeng Yang + 9 more
Supersampling has proven highly effective in enhancing visual fidelity by reducing aliasing, increasing resolution, and generating interpolated frames. It has become a standard component of modern real-time rendering pipelines. However, on mobile platforms, deep learning-based supersampling methods remain impractical due to stringent hardware constraints, while non-neural supersampling techniques often fall short in delivering perceptually high-quality results. In particular, producing visually pleasing reconstructions and temporally coherent interpolations is still a significant challenge in mobile settings. In this work, we present a novel, lightweight supersampling framework tailored for mobile devices. Our approach substantially improves both image reconstruction quality and temporal consistency while maintaining real-time performance. For super-resolution, we propose an intra-pixel object coverage estimation method for reconstructing high-quality anti-aliased pixels in edge regions, a gradient-guided strategy for non-edge areas, and a temporal sample accumulation approach to improve overall image quality. For frame interpolation, we develop an efficient motion estimation module coupled with a lightweight fusion scheme that integrates both estimated optical flow and rendered motion vectors, enabling temporally coherent interpolation of object dynamics and lighting variations. Extensive experiments demonstrate that our method consistently outperforms existing baselines in both perceptual image quality and temporal smoothness, while maintaining real-time performance on mobile GPUs. A demo application and supplementary materials are available on the project page.
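Temporal sample accumulation of the kind mentioned in the abstract above is commonly implemented as an exponential moving average with neighborhood clamping. The sketch below shows that generic idea, not the paper's specific design; the blend weight and the 3x3 clamp window are illustrative.

```python
import numpy as np

def accumulate(history, current, alpha=0.1, clamp=True):
    """Blend the new frame into the accumulated history (EMA).

    `alpha` is the blend weight of the current sample. Optionally clamp
    the history to the local 3x3 min/max of the current frame, a common
    heuristic to limit ghosting when history is stale.
    """
    if clamp:
        pad = np.pad(current, 1, mode="edge")
        h, w = current.shape
        # Stack the 9 shifted views of the current frame to get a
        # per-pixel 3x3 neighborhood min and max.
        stack = np.stack([pad[dy:dy + h, dx:dx + w]
                          for dy in range(3) for dx in range(3)])
        history = np.clip(history, stack.min(axis=0), stack.max(axis=0))
    return (1 - alpha) * history + alpha * current
```

With clamping enabled, history that disagrees wildly with the current neighborhood is snapped into range before blending, trading some temporal smoothing for fewer ghosting artifacts.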
- New
- Research Article
- 10.54097/xgwf3b06
- Nov 28, 2025
- Journal of Computing and Electronic Information Management
- Guimei Yin + 10 more
Developmental dyslexia (DD) is a common neurodevelopmental learning disorder that severely impacts children's reading abilities and social adaptation. In recent years, brain network analysis based on functional magnetic resonance imaging has provided new insights into its neural mechanisms, yet it struggles to capture the temporal characteristics of dynamic brain interactions. To address this, this paper proposes a GAT-LSTM framework for high-precision classification of DD. This method first constructs a dynamic functional connectivity network based on the AAL90 brain atlas. It then employs a graph attention network (GAT) to adaptively learn spatial dependencies between brain regions within each time window, followed by a long short-term memory (LSTM) network to model the temporal evolution patterns of node embedding sequences. To further enhance the model's temporal consistency and discriminative power, dynamic graph stability constraints are introduced during training. Experimental results demonstrate that the proposed method achieves an 85.36% classification accuracy, significantly outperforming baseline models. This study not only provides a novel computational paradigm for the objective diagnosis of DD but also offers robust support for the application of brain network modeling in neurodevelopmental disorder research.
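The per-window dynamic functional connectivity construction described above is typically done with sliding-window correlations over ROI time series; a minimal sketch (window and step sizes are illustrative, not the paper's choices):

```python
import numpy as np

def dynamic_fc(ts, win=30, step=10):
    """Sliding-window functional connectivity.

    ts: (T, R) array of R ROI time series over T time points.
    Returns one (R, R) Pearson correlation matrix per window; each
    matrix is the graph for that window's GAT pass in a GAT-LSTM-style
    pipeline.
    """
    mats = []
    for start in range(0, ts.shape[0] - win + 1, step):
        mats.append(np.corrcoef(ts[start:start + win].T))
    return mats
```

Each matrix in the returned sequence would be thresholded or weighted into a graph, processed spatially, and then the sequence of graph embeddings fed to a temporal model.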
- New
- Research Article
- 10.3390/electronics14234663
- Nov 27, 2025
- Electronics
- Zhengyi Lu + 2 more
Intelligent Transportation Systems (ITSs), particularly autonomous driving, face critical challenges when sensor modalities fail due to adverse conditions or hardware malfunctions, causing severe perception degradation that threatens system-wide reliability. We present a unified geometry-aware cross-modal translation framework that synthesizes missing sensor data while maintaining temporal consistency and quantifying uncertainty. Our pipeline enforces 92.7% frame-to-frame stability via an optical-flow-guided spatio-temporal module with smoothness regularization, preserves fine-grained 3D geometry through pyramid-level multi-scale alignment constrained by the Chamfer distance, surface normals, and edge consistency, and ultimately delivers dropout-tolerant perception by adaptively fusing multi-modal cues according to pixel-wise uncertainty estimates. Extensive evaluation on KITTI-360, nuScenes, and a newly collected Real-World Sensor Failure dataset demonstrates state-of-the-art performance: 35% reduction in Chamfer distance, 5% improvement in BEV (bird’s eye view) segmentation mIoU (mean Intersection over Union) (79.3%), and robust operation maintaining mIoU under complete sensor loss for 45+ s. The framework achieves real-time performance at 17 fps with 57% fewer parameters than competing methods, enabling deployment-ready sensor-agnostic perception for safety-critical autonomous driving applications.
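Frame-to-frame stability of the kind quantified above (92.7%) is usually measured by warping the previous frame with the estimated flow and counting pixels that agree with the current frame. The toy sketch below substitutes an integer global shift for dense optical flow; the tolerance is illustrative.

```python
import numpy as np

def warp(frame, flow_dx, flow_dy):
    """Warp a frame by a global integer (dx, dy) displacement, a toy
    stand-in for dense optical-flow warping."""
    return np.roll(np.roll(frame, flow_dy, axis=0), flow_dx, axis=1)

def stability(prev, curr, flow_dx, flow_dy, tol=0.05):
    """Fraction of pixels whose flow-warped previous value matches the
    current frame within `tol` -- a simple warping-based
    frame-to-frame stability score."""
    return float(np.mean(np.abs(warp(prev, flow_dx, flow_dy) - curr) <= tol))
```

A score near 1.0 means nearly every pixel is explained by the motion estimate, which is the property a smoothness-regularized spatio-temporal module tries to enforce.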
- New
- Research Article
- 10.3390/app152312525
- Nov 26, 2025
- Applied Sciences
- Julia De Enciso García + 7 more
Surgical Phase Recognition (SPR) enables real-time, context-aware assistance during surgery, but its use remains limited by the cost and effort of dense video annotation. This study presents a Semi-Supervised Deep Learning framework for SPR in endoscopic pituitary surgery, aiming to reduce annotation requirements while maintaining performance. A Timestamp Supervision strategy is employed, where only one or two representative frames per phase are labeled. These labels are then propagated, creating pseudo-labels for unlabeled frames using an Uncertainty-Aware Temporal Diffusion (UATD) approach, based on confidence and temporal consistency. Multiple spatial and temporal architectures are evaluated on the PituPhase–SurgeryAI dataset, the largest publicly available collection of endoscopic pituitary surgeries to date, which includes an outside-the-body phase. Despite using less than 3% of the annotated data, the proposed method achieves an F1-score of 0.60 [0.55–0.65], demonstrating competitive performance against previous Supervised approaches in the same context. Removing the recurrent outside-the-body phase reduces misclassification and improves temporal consistency. These results demonstrate that uncertainty-guided Semi-Supervision is a scalable and clinically viable alternative to fully Supervised Learning for surgical workflow analysis.
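The timestamp-to-pseudo-label propagation described above can be sketched as a confidence-gated expansion outward from each labeled frame. This is a simplified stand-in for the UATD step, not the paper's algorithm; the confidence threshold is purely illustrative.

```python
import numpy as np

def propagate_labels(timestamps, confidences, n_frames, thresh=0.7):
    """Expand each timestamp label to neighboring frames while the
    model's confidence for that label stays above `thresh`.

    timestamps: dict mapping frame index -> phase label.
    confidences: (n_frames, n_classes) per-frame class confidences.
    Returns an int array of pseudo-labels; -1 marks frames left
    unlabeled because confidence dropped below the threshold.
    """
    labels = np.full(n_frames, -1, dtype=int)
    for t, lab in timestamps.items():
        labels[t] = lab
        for step in (1, -1):  # expand forward, then backward in time
            i = t + step
            while 0 <= i < n_frames and confidences[i, lab] >= thresh:
                labels[i] = lab
                i += step
    return labels
```

Frames that stay at -1 are simply excluded from the training loss, which is how timestamp supervision keeps annotation cost low without training on unreliable labels.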
- New
- Research Article
- 10.1631/fitee.2500412
- Nov 26, 2025
- Frontiers of Information Technology & Electronic Engineering
- Yangliu Hu + 4 more
TimeJudge: empowering video-LLMs as zero-shot judges for temporal consistency in video captions
- New
- Research Article
- 10.51244/ijrsi.2025.1210000357
- Nov 24, 2025
- International Journal of Research and Scientific Innovation
- Sakshi Bhandari + 3 more
Deepfake technology, driven by advancements in deep learning and generative models, enables highly realistic manipulation of facial appearances in videos, often through face-swapping techniques. While such methods have potential in entertainment and creative applications, they also pose serious threats to privacy, trust, and information integrity. This paper presents the development of a machine learning (ML)-based system for detecting face-swap deepfake videos. The proposed approach employs video preprocessing, frame extraction, and facial region isolation, followed by feature extraction using a deep convolutional neural network (ResNeXt). Temporal consistency is analyzed with a Long Short-Term Memory (LSTM) network to capture sequential artifacts. Experimental results demonstrate the system’s ability to distinguish real and fake videos with high accuracy, contributing to digital forensics and misinformation mitigation efforts.
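The temporal stage described above is a learned LSTM over per-frame ResNeXt embeddings. As a crude illustration of the signal such a model consumes (not the paper's method), one can score the frame-to-frame change in per-frame face embeddings; face swaps often leave flicker that shows up as embedding jumps.

```python
import numpy as np

def temporal_inconsistency_score(features):
    """Mean L2 distance between consecutive per-frame embeddings.

    features: (T, D) array of per-frame feature vectors (e.g. from a
    CNN backbone). A simplified proxy: real detectors learn a sequence
    model instead of using this raw statistic.
    """
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return float(diffs.mean())
```

A perfectly stable embedding sequence scores 0; abrupt per-frame identity or texture changes push the score up.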
- New
- Research Article
- 10.3390/app152312423
- Nov 23, 2025
- Applied Sciences
- Ayşe Tuğba Yapıcı + 1 more
This study presents models for estimating the charging time and travel time in autonomous electric taxi systems, based on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) deep learning methods. In addition to these models, two classical time-series forecasting techniques, ARIMA and Prophet, were also applied to provide a broader comparative baseline. Unlike traditional time-series prediction methods, the proposed system combines artificial intelligence with Internet of Things (IoT) technologies to perform secure charging operations based on multi-layer cybersecurity mechanisms, including IP authentication, encrypted communication, and charger server validation steps. The models were trained and validated using a comprehensive dataset obtained from 100 electric vehicles with different battery capacities at 50 charging stations located in Kocaeli Province. In the predictions considering parameters such as the vehicle type, battery capacity, and charge level, both models showed high accuracy rates, with the GRU model performing better than the LSTM model in terms of the error rate and temporal consistency. ARIMA and Prophet, on the other hand, produced significantly lower performance compared to deep learning models, confirming that GRU is the most suitable approach for real-time duration estimation. Customers can obtain the estimated time, cost, and charging requirements before their trip, and continuous multi-stage IP-based security controls are performed throughout the charging process as part of the cybersecurity framework. If a foreign or unauthorized connection is detected, the charging operation is automatically stopped. The proposed approach not only increases the efficiency in electric vehicle energy management but also presents an innovative framework that contributes to sustainable and smart transportation.
By combining deep learning models, classical statistical forecasting methods, IoT integration, and enhanced cybersecurity controls, this work represents a pioneering step toward autonomous, secure, and eco-friendly urban transportation systems.
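Error-rate comparisons like the GRU-versus-LSTM one above usually rest on MAE and RMSE over a held-out series; for reference, the two metrics are:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error between truth and prediction."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def rmse(y, yhat):
    """Root mean squared error; penalizes large misses more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))
```

Reporting both is common practice: a model with lower RMSE but similar MAE makes fewer large errors, which matters for worst-case charging-time estimates.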
- New
- Research Article
- 10.1108/tg-06-2025-0155
- Nov 18, 2025
- Transforming Government: People, Process and Policy
- Olha Maletova + 3 more
Purpose This study aims to examine the potential of artificial intelligence (AI) systems to generate innovative anti-corruption measures for martial law governance and post-conflict reconstruction, addressing gaps in traditional policy frameworks designed for peacetime implementation. Design/methodology/approach A mixed-methods design used six top AI systems – ChatGPT-4, ChatGPT-4o, ChatGPT-3.5, GitHub Copilot, Google GEMINI and Anthropic Claude – with bilingual English and Ukrainian queries. This study used systematic cross-validation procedures, temporal consistency verification and comprehensive content analysis over a 50-day research period from April to July 2024. A total of 216 responses were evaluated with standardized scoring matrices that assessed coherence, relevance and feasibility using five-point scales. The methodology integrated quantitative analysis of AI-generated responses with qualitative assessment of contextual appropriateness, ensuring robust evaluation of AI capabilities in crisis governance contexts. Findings AI systems demonstrated significant capability in identifying strategic gaps and proposing adaptive frameworks absent from Ukraine’s Anti-Corruption Strategy for 2021–2025. Notable variations emerged across linguistic contexts, with English-language responses showing greater analytical depth. Claude and ChatGPT-4 exhibited superior contextual understanding, while all systems identified five common anti-corruption measures: transparency initiatives, judicial reform, institutional strengthening, public engagement and education programs. However, critical limitations included contextual disconnection from existing Ukrainian institutions and reliance on pre-war training data. Originality/value This study introduces a pioneering bilingual methodology, evaluating AI-generated anti-corruption policies in both English and Ukrainian, addressing the unique challenges of martial law governance. 
It provides the first systematic evaluation of AI-generated policies in conflict contexts, offering practical frameworks for integrating AI with crisis governance.
- Research Article
- 10.1037/mac0000262
- Nov 6, 2025
- Journal of Applied Research in Memory and Cognition
- Emma R Page + 1 more
Temporal consistency of collective future thinking.
- Research Article
- 10.54844/ep.2025.1068
- Nov 5, 2025
- Editing Practice
- Fuxiang Liu + 2 more
Background: The reporting quality of participant eligibility criteria in retrospective studies significantly affects research reproducibility and result interpretation. However, standardized guidelines for writing eligibility criteria in retrospective studies are lacking. We aim to systematically evaluate the quality of eligibility criteria reporting in retrospective studies published in high-impact factor medical journals, develop evidence-based recommendations for standardization, and provide supplementary guidance for relevant reporting guidelines. Methods: We conducted a cross-sectional analysis of retrospective studies published in the top 40 nonreview medical journals listed in the Journal Citation Reports (JCR) from January 2023 to September 2024. We extracted article characteristics (journal, author, objective, and study type) and eligibility criteria components. Two independent reviewers performed the quality assessment of the literature, which focused on clarity of retrospective nature (temporal framework), purposefulness (alignment with research objectives), and logical consistency between inclusion and exclusion criteria. Results: Among the top 40 nonreview medical journals in the 2023 JCR rankings, 11 journals contained 78 retrospective studies that were analyzed, of which 2.6% (2/78) demonstrated unclear retrospectivity and purposefulness in eligibility criteria. Logical contradiction between exclusion and inclusion criteria was found in 11.5% (9/78) of articles. Inter-rater reliability for quality assessment was substantial (κ = 0.857). Conclusion: The reporting quality of participant eligibility criteria in retrospective studies published in high-impact factor medical journals was flawed.
On the basis of our systematic evaluation, we propose a structured framework for formulating eligibility criteria that emphasizes temporal precision, diagnostic clarity, and logical consistency between inclusion and exclusion criteria to supplement existing research reporting guidelines.
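The inter-rater reliability reported above (κ = 0.857) is Cohen's kappa, which corrects raw agreement for agreement expected by chance; it can be computed from two raters' labels as:

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.union1d(r1, r2)
    po = float(np.mean(r1 == r2))  # observed agreement
    pe = sum(float(np.mean(r1 == c)) * float(np.mean(r2 == c)) for c in cats)
    return (po - pe) / (1 - pe)
```

Values above 0.8 are conventionally read as "almost perfect" agreement, which is why κ = 0.857 supports the paper's two-reviewer assessment.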
- Research Article
- 10.3390/jemr18060063
- Nov 4, 2025
- Journal of Eye Movement Research
- Qi Zhu + 3 more
Current research on multimodal AR-HUD navigation systems primarily focuses on the presentation forms of auditory and visual information, yet the effects of synchrony between auditory and visual prompts as well as prompt timing on driving behavior and attention mechanisms remain insufficiently explored. This study employed a 2 (prompt mode: synchronous vs. asynchronous) × 3 (prompt timing: −2000 m, −1000 m, −500 m) within-subject experimental design to assess the impact of multimodal prompt synchrony and prompt distance on drivers’ reaction time, sustained attention, and eye movement behaviors, including average fixation duration and fixation count. Behavioral data demonstrated that both prompt mode and prompt timing significantly influenced drivers’ response performance (indexed by reaction time) and attention stability, with synchronous prompts at −1000 m yielding optimal performance. Eye-tracking results further revealed that synchronous prompts significantly enhanced fixation stability and reduced visual load, indicating more efficient information integration. Therefore, prompt mode and prompt timing significantly affect drivers’ perceptual processing and operational performance. Delivering synchronous auditory and visual prompts at −1000 m achieves an optimal balance between information timeliness and multimodal integration. This study recommends the following: (1) maintaining temporal consistency in multimodal prompts to facilitate perceptual integration and (2) controlling prompt distance within an intermediate range (−1000 m) to optimize the perception–action window, thereby improving the safety and efficiency of AR-HUD navigation systems.
- Research Article
- 10.1002/cav.70085
- Nov 1, 2025
- Computer Animation and Virtual Worlds
- Lalit Kumar + 1 more
ABSTRACT The proposed MotionBlend GAN model marks a significant step forward in video synthesis by blending the motion from a source video with the appearance of a target person's image. As training progresses, the model improves video creation by enhancing the smoothness and natural flow of motion, resulting in more coherent and lifelike videos. Using advanced techniques like MBConv blocks of EfficientNet‐B7, OpenPose for precise pose detection, ResNet blocks for feature integration, and a 3D CNN discriminator, the model produces high‐quality videos that maintain both spatial and temporal consistency. After 200 epochs, the model achieved an adversarial loss of 0.2265, with metrics like PSNR at 20.246, SSIM at 0.867, and LPIPS at 0.178. The high PSNR and SSIM values, along with the low LPIPS, show that the generated frames are well aligned and preserve important details. These results highlight the model's strong performance over time, consistently generating visually convincing videos of human activities using a reference image and source video. The model effectively transfers motion from video to image, creating more realistic videos of human activity than existing models.
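Of the metrics reported above, PSNR is the simplest: it is derived directly from mean squared error against a reference frame (SSIM and LPIPS need considerably more machinery and are omitted here).

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference frame and
    a generated frame, both in [0, max_val]."""
    mse = np.mean((np.asarray(ref) - np.asarray(test)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10 * np.log10(max_val ** 2 / mse))
```

Higher is better; a PSNR of about 20 dB, as reported, corresponds to a per-pixel RMS error of roughly 10% of the signal range.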
- Research Article
- 10.1016/j.media.2025.103860
- Nov 1, 2025
- Medical Image Analysis
- Weiran Xia + 9 more
Triplet longitudinal masked autoencoder for predicting individualized functional connectome development during infancy.
- Research Article
- 10.1016/j.neuroimage.2025.121521
- Nov 1, 2025
- NeuroImage
- Duho Sihn + 1 more
Brain-wide patterns of oscillatory amplitudes represent naturalistic behavior.
- Research Article
- 10.1080/19392699.2025.2581178
- Nov 1, 2025
- International Journal of Coal Preparation and Utilization
- Lanhao Wang + 4 more
ABSTRACT Accurate real-time monitoring of the ash content in flotation clean coal is pivotal for intelligent optimization and closed-loop control of the flotation process, directly affecting product quality and the economic performance of coal preparation plants. To address the limitations of traditional approaches, namely response lag, insufficient accuracy, and inefficient fusion of multisource information, this study proposes an intelligent online sensing method based on multisource data fusion, with the prediction pipeline decoupled into three stages: alignment, representation, and prediction. First, a multiscale, differentiable dynamic time-warping (MSSoftDTW) scheme is employed to precisely align asynchronous multisource time-series data, thereby enhancing cross-modal temporal consistency. Second, an interpretable constructive algorithm with a response-weight mechanism (ICA-RW) is introduced to enable feature learning and structural adaptation, suppressing redundancy and collinearity while improving feature robustness. Third, an ensemble regression model that combines a relevance vector machine with adaptive boosting (RVM-Adaboost) is developed to better accommodate nonlinear relationships and drifts in operating conditions. By fusing X-ray fluorescence (XRF) spectra, key process variables, and features extracted from tailings images, the method achieves high-accuracy, real-time prediction of clean-coal ash content. Validation on industrial-site data demonstrates significant gains in both accuracy and stability over conventional regression baselines, meeting the real-time requirements of online monitoring and control and providing deployable support for flotation process optimization and intelligent upgrading.
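The MSSoftDTW alignment step above builds on dynamic time warping. The sketch below is plain (hard-min) DTW, a simplified stand-in for the paper's multiscale differentiable version; SoftDTW replaces the hard `min` in the recurrence with a smoothed soft-min so the alignment cost becomes differentiable and usable as a training loss.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    Fills the (n+1) x (m+1) cumulative-cost table with the standard
    recurrence D[i, j] = cost(i, j) + min of the three predecessors.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Because DTW warps the time axis, a series and a time-stretched copy of it align at zero cost, which is exactly the property needed to reconcile asynchronous sensor streams before fusion.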
- Research Article
- 10.1088/2631-8695/ae15d3
- Oct 30, 2025
- Engineering Research Express
- Shihao Gu + 5 more
Abstract In dynamic environments, moving objects introduce unstable features that significantly degrade the accuracy of simultaneous localization and mapping (SLAM) systems. To address this issue, we propose Neural-KF, a robust visual SLAM framework that integrates three key modules: (1) a modified SuperPoint network with multi-level feature fusion for reliable static keypoint extraction, (2) a YOLOv8-based dynamic object detector, and (3) a Kalman-consistent state estimation mechanism that predicts object motion trajectories to enhance temporal consistency. By associating predicted and detected bounding boxes via the Hungarian algorithm, Neural-KF achieves accurate suppression of dynamic points while preserving sufficient static features for pose estimation. Experimental evaluations on public datasets, including KITTI and EuRoC, demonstrate that Neural-KF reduces absolute trajectory error by up to 28% compared to VINS-Fusion and achieves competitive accuracy against advanced dynamic SLAM systems such as DynaSLAM. Furthermore, the system maintains real-time performance (>30 FPS) with a balanced trade-off between accuracy and computational cost. These results highlight the effectiveness of Neural-KF in achieving robust and efficient visual odometry under challenging dynamic conditions.
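The predicted-to-detected box association described above pairs boxes to maximize total overlap. The sketch below uses an IoU score with brute-force optimal assignment standing in for the Hungarian algorithm (same optimum, fine for a handful of tracked objects; real trackers use the Hungarian algorithm for efficiency).

```python
from itertools import permutations

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(predicted, detected):
    """Match Kalman-predicted boxes to detections by maximizing total
    IoU over all one-to-one assignments. Returns, for each predicted
    box, the index of its matched detection."""
    assert len(predicted) == len(detected)
    best, best_score = None, -1.0
    for perm in permutations(range(len(detected))):
        score = sum(iou(predicted[i], detected[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)
```

Once each detection is tied to a predicted track, points inside boxes matched to dynamic objects can be suppressed before pose estimation, which is the mechanism the abstract describes.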