  • New
  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1683786
Hybrid deep learning models for fake news detection: case study on Arabic and English languages
  • Jan 6, 2026
  • Frontiers in Big Data
  • Baqer M Merzah + 2 more

Introduction: Fake news has become a significant threat to public discourse due to the swift spread of online content and the difficulty of distinguishing it from real news, a challenge amplified by society's increasing dependence on online social networks. Many researchers have developed machine learning and deep learning models to combat the spread of misinformation and identify fake news. However, most studies have focused on a single language and achieved low accuracy, especially for Arabic, which poses additional challenges due to resource constraints and linguistic intricacies. Methods: This paper introduces an effective deep learning technique for fake news detection (FND) in Arabic and English. The proposed model integrates a multi-channel Convolutional Neural Network (CNN) and dual Bidirectional Long Short-Term Memory (BiLSTM) networks, capturing in parallel the semantic and local textual features of text embedded by a pre-trained FastText model. A global max-pooling layer then reduces dimensionality and extracts the salient features of the sequential output, after which the model classifies news as fake or real. The model is trained and evaluated on three benchmark datasets: AFND and ANS (Arabic) and WELFake (English). Results: Experimental results highlight the model's effectiveness and its performance advantage over state-of-the-art (SOTA) approaches, with accuracies of 94.43 ± 0.19%, 71.63 ± 1.45%, and 98.85 ± 0.03% on AFND, ANS, and WELFake, respectively. Discussion: This work provides a robust approach to combating misinformation, with practical applications in improving the reliability of information on social networks.
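The pooling-and-fusion step described in the abstract can be illustrated with a minimal numpy sketch. The array shapes, the two-channel layout, and the random stand-in activations are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the sequential outputs of two parallel channels
# (e.g. a CNN branch and a BiLSTM branch), shape (batch, time, features).
cnn_out = rng.normal(size=(2, 5, 4))
bilstm_out = rng.normal(size=(2, 5, 6))

def global_max_pool(seq):
    """Keep, per feature, the strongest activation over the whole
    sequence, collapsing the time axis: (batch, time, f) -> (batch, f)."""
    return seq.max(axis=1)

# The pooled channel outputs are concatenated before the final classifier.
fused = np.concatenate([global_max_pool(cnn_out),
                        global_max_pool(bilstm_out)], axis=1)
```

The time axis disappears entirely, so the downstream fake/real classifier sees a fixed-size vector regardless of input sequence length.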

  • New
  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1745751
Time series forecasting for bug resolution using machine learning and deep learning models
  • Dec 19, 2025
  • Frontiers in Big Data
  • Lerina Aversano + 3 more

Predicting bug fix times is a key objective for improving software maintenance and supporting planning in open source projects. In this study, we evaluate the effectiveness of different time series forecasting models applied to real-world data from multiple repositories, comparing local (one model per project) and global (a single model trained across multiple projects) approaches. We considered classical models (Naive, Linear Regression, Random Forest) and neural networks (MLP, LSTM, GRU), with global extensions including Random Forest and LSTM with project embeddings. The results highlight that, at the local level, Random Forest achieves lower errors and better classification metrics than deep learning models in several cases. However, global models show greater robustness and generalizability: in particular, the global Random Forest significantly reduces the mean error and maintains high performance in terms of accuracy and F1 score, while the global LSTM captures temporal dependencies and provides additional insights into cross-project dynamics. The explainable AI techniques adopted (permutation importance, saliency maps, and embedding analysis) allow us to interpret the main drivers of forecasts, confirming the role of process variables and temporal characteristics. Overall, the study demonstrates that an integrated approach, combining classical models and deep learning in a global perspective, offers more reliable and interpretable forecasts to support software maintenance.
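The local-versus-global distinction can be sketched with plain lag features and least-squares models. The synthetic series, the lag count, and the linear model are stand-ins for the paper's Random Forest and neural approaches:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_series(n, level):
    """Synthetic weekly bug-resolution-time series for one project."""
    t = np.arange(n)
    return level + 0.05 * t + rng.normal(scale=0.5, size=n)

def lag_matrix(y, n_lags=3):
    """Turn a series into (features, target) pairs of its previous values."""
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    return X, y[n_lags:]

projects = [make_series(60, 10.0), make_series(60, 20.0)]

# Local approach: one model fitted per project.
local_preds = []
for y in projects:
    X, t = lag_matrix(y)
    Xb = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
    local_preds.append(Xb @ w)

# Global approach: a single model trained on all projects pooled.
Xg = np.vstack([lag_matrix(y)[0] for y in projects])
tg = np.concatenate([lag_matrix(y)[1] for y in projects])
Xgb = np.c_[Xg, np.ones(len(Xg))]
wg, *_ = np.linalg.lstsq(Xgb, tg, rcond=None)
global_preds = Xgb @ wg
```

A global model as in the paper would also carry a project embedding so it can specialize per repository while still sharing cross-project structure.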

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1710462
Inferring causal interplay between air pollution and meteorology
  • Dec 17, 2025
  • Frontiers in Big Data
  • Yves Philippe Rybarczyk + 3 more

Introduction: This study investigates the bidirectional causal interplay between PM2.5 and relative humidity (RH) in Quito, Ecuador. Focusing on a high-altitude city with complex terrain, the objective is to understand pollution-climate feedbacks over a two-decade span. Methods: The study employs Convergent Cross Mapping (CCM), a nonlinear empirical dynamic modeling approach. Hourly data from four districts of Quito were analyzed across two distinct time periods: 2004–2005 versus 2022–2024. Robustness of causality was confirmed using surrogate testing techniques. Results: The analysis reveals statistically significant, nonlinear, and time-variant couplings. While RH influenced PM2.5 in the early 2000s, the relationship inverted, with PM2.5 increasingly driving RH by the early 2020s. Partial-derivative analyses indicate shifting interaction signs and strengths. Notably, pollution was found to increasingly suppress RH, particularly in northern districts. Discussion: The observed suppression of RH by pollution is consistent with urban heat island amplification and radiative effects. These findings underscore the necessity of nonlinear causality frameworks for understanding environmental feedbacks in complex terrains. The study highlights the need for integrated air quality and climate strategies. Future research should expand the set of variables and monitoring sites to further generalize these findings.
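Convergent Cross Mapping can be sketched in plain numpy on a toy coupled logistic map rather than the Quito data; the embedding dimension, coupling strength, and series length below are illustrative choices, not the paper's settings:

```python
import numpy as np

# Coupled logistic maps: x drives y (x -> y), not the reverse.
n = 600
x = np.empty(n); y = np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.3 * x[t])

def cross_map_skill(target, source, E=2, tau=1):
    """CCM: estimate `target` from the delay embedding of `source`.
    High skill is evidence that target causally influences source."""
    m = len(source) - (E - 1) * tau
    # Delay-embed the source series (the "shadow manifold").
    M = np.column_stack([source[i * tau:i * tau + m] for i in range(E)])
    tgt = target[(E - 1) * tau:]
    est = np.empty(m)
    for i in range(m):
        d = np.linalg.norm(M - M[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        nn = np.argsort(d)[:E + 1]           # E+1 nearest neighbours
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12))
        est[i] = np.sum(w * tgt[nn]) / w.sum()
    return np.corrcoef(est, tgt)[0, 1]       # cross-map skill (rho)

skill_x_causes_y = cross_map_skill(x, y)  # map from M_y back to x
skill_y_causes_x = cross_map_skill(y, x)  # map from M_x back to y
```

Note the direction reversal that makes CCM counterintuitive: because the driven variable y encodes the history of its driver, x → y causality is detected by how well y's manifold reconstructs x. A surrogate test, as in the paper, would repeat this on phase-randomized series to get a null distribution for the skill.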

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1720525
Detecting anti-forensic deepfakes with identity-aware multi-branch networks
  • Dec 10, 2025
  • Frontiers in Big Data
  • Mingyu Zhu + 1 more

Deepfake detection systems have achieved impressive accuracy on conventional forged images; however, they remain vulnerable to anti-forensic or adversarial samples deliberately crafted to evade detection. Such samples introduce imperceptible perturbations that conceal forgery artifacts, causing traditional binary classifiers—trained solely on real and forged data—to misclassify them as authentic. In this paper, we address this challenge by proposing a multi-channel feature extraction framework combined with a three-class classification strategy. Specifically, one channel focuses on extracting identity-preserving facial representations to capture inconsistencies in personal identity traits, while additional channels extract complementary spatial and frequency domain features to detect subtle forgery traces. These multi-channel features are fused and fed into a three-class detector capable of distinguishing real, forged, and anti-forensic samples. Experimental results on datasets incorporating adversarial deepfakes demonstrate that our method substantially improves robustness against anti-forensic attacks while maintaining high accuracy on conventional deepfake detection tasks.
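The multi-branch fusion and three-class idea can be sketched on synthetic 1-D signals. The "spatial" statistics branch, the FFT branch, the class frequencies, and the logistic-regression head are simplifications; the paper's identity-preserving branch, which would come from a face-recognition embedding, is omitted here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic 1-D "signals" standing in for face crops: three classes
# (real / forged / anti-forensic) that differ in frequency content.
def make_signal(freq, n=128):
    t = np.arange(n)
    return np.sin(2 * np.pi * freq * t / n) + 0.3 * rng.normal(size=n)

freqs = {0: 3, 1: 9, 2: 20}               # class -> dominant frequency
y = np.repeat([0, 1, 2], 100)
X_raw = np.array([make_signal(freqs[c]) for c in y])

# Branch 1: "spatial" statistics of the raw signal.
spatial = np.column_stack([X_raw.mean(axis=1), X_raw.std(axis=1)])
# Branch 2: frequency-domain magnitudes (low-order FFT bins).
freq_feats = np.abs(np.fft.rfft(X_raw, axis=1))[:, :32]

# Fuse the branch features and train a three-class detector.
fused = np.concatenate([spatial, freq_feats], axis=1)
Xtr, Xte, ytr, yte = train_test_split(fused, y, test_size=0.3,
                                      random_state=0, stratify=y)
clf = LogisticRegression(max_iter=2000).fit(Xtr, ytr)
acc = clf.score(Xte, yte)
```

The key design point mirrored here is the third class: anti-forensic samples get their own label instead of being forced into real/forged, so perturbations that hide forgery artifacts no longer push them toward "authentic".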

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1718366
Unequal access in a digital age: women's digital exclusion and socioeconomic inequalities in Vietnam
  • Dec 4, 2025
  • Frontiers in Big Data
  • Chi Thi Lan Pham + 3 more

Introduction: Access to information and communication technologies (ICTs) and the skills to use them are essential for inclusive development and digital participation. As Vietnam accelerates its digital transformation, ensuring that women are not left behind is critical to achieving the Sustainable Development Goals (SDGs), particularly SDG 5 (Gender Equality) and SDG 9 (Industry, Innovation, and Infrastructure). This study investigates the extent and socioeconomic patterning of digital exclusion among women in Vietnam. Methods: We utilized nationally representative data from the 2021 Multiple Indicator Cluster Survey (MICS), which covered 10,770 women aged 15–49. Digital exclusion was defined in terms of (1) no ICT access (no use of a computer, the internet, or a mobile phone in the past 3 months) and (2) no ICT skills (unable to perform any of nine standard digital tasks). Results: 4.28% of women lacked digital access and 72.85% lacked digital skills. Inequalities were stark: exclusion from access was highest among ethnic minorities (19.55%) and the poorest quintile (17.10%), compared to 1.98% and 0.31% in the majority and richest groups, respectively. The digital skills gap was even wider, with 95.51% of the poorest women lacking ICT skills vs. 41.23% of the richest. Multivariable logistic regressions confirmed that ethnicity, wealth, rural residence, and older age were key predictors of exclusion. Conclusion: These findings underscore the urgent need for inclusive digital policies that extend beyond infrastructure to address gendered and socioeconomic barriers to digital literacy. Without targeted efforts, digital rollouts may widen existing inequalities and undermine SDG progress.
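A multivariable logistic regression of this kind can be sketched on simulated data; the covariates, effect sizes, and sample below are hypothetical stand-ins for the MICS variables, shown only to illustrate how predictor effects are read off as odds ratios:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 5000

# Hypothetical covariates mirroring the survey predictors.
rural = rng.integers(0, 2, n)      # 1 = rural residence
poorest = rng.integers(0, 2, n)    # 1 = poorest wealth quintile
age = rng.uniform(15, 49, n)

# Simulate exclusion risk rising with rurality, poverty, and age
# (coefficients are invented for the illustration).
logit = -4.0 + 1.2 * rural + 1.8 * poorest + 0.05 * (age - 15)
p = 1 / (1 + np.exp(-logit))
excluded = rng.random(n) < p

X = np.column_stack([rural, poorest, age])
model = LogisticRegression(max_iter=1000).fit(X, excluded)

# exp(coefficient) = odds ratio: OR > 1 means higher odds of exclusion.
odds_ratios = np.exp(model.coef_[0])
```

On real survey data one would additionally apply sampling weights and report confidence intervals, which plain scikit-learn does not provide out of the box.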

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1677331
Parameter-efficient fine-tuning for low-resource text classification: a comparative study of LoRA, IA3, and ReFT
  • Dec 2, 2025
  • Frontiers in Big Data
  • Steve Nwaiwu

The successful application of large-scale transformer models in Natural Language Processing (NLP) is often hindered by the substantial computational cost and data requirements of full fine-tuning. This challenge is particularly acute in low-resource settings, where standard fine-tuning can lead to catastrophic overfitting and model collapse. To address this, Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a promising solution. However, a direct comparative analysis of their trade-offs under unified low-resource conditions is lacking. This study provides a rigorous empirical evaluation of three prominent PEFT methods: Low-Rank Adaptation (LoRA), Infused Adapter by Inhibiting and Amplifying Inner Activations (IA3), and a Representation Fine-Tuning (ReFT) strategy. Using a DistilBERT base model on low-resource versions of the AG News and Amazon Reviews datasets, the present work compares these methods against a full fine-tuning baseline across accuracy, F1 score, trainable parameters, and GPU memory usage. The findings reveal that while all PEFT methods dramatically outperform the baseline, LoRA consistently achieves the highest F1 scores (0.909 on Amazon Reviews). Critically, ReFT delivers nearly identical performance (~98% of LoRA's F1 score) while training only ~3% of the parameters, establishing it as the most efficient method. This research demonstrates that PEFT is not merely an efficiency optimization, but a necessary tool for robust generalization in data-scarce environments, providing practitioners with a clear guide to navigate the performance-efficiency trade-off. By unifying these evaluations under controlled conditions, this study advances beyond fragmented prior research and offers a systematic framework for selecting PEFT strategies.
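The core LoRA mechanism compared in this study can be written down in a few lines of numpy; the dimensions, rank, and scaling below are illustrative values, not the study's configuration:

```python
import numpy as np

rng = np.random.default_rng(5)
d_in, d_out, r, alpha = 768, 768, 8, 16

# Frozen pretrained weight (never updated during fine-tuning).
W = rng.normal(size=(d_out, d_in)) * 0.02

# LoRA: learn a low-rank update Delta_W = B @ A, scaled by alpha / r.
A = rng.normal(size=(r, d_in)) * 0.01   # trainable
B = np.zeros((d_out, r))                # trainable; zero init => no change at start

def lora_forward(x):
    """Adapted layer: frozen path plus scaled low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size
lora_params = A.size + B.size           # the only trainable parameters
```

Because B starts at zero, the adapted model is exactly the base model before training, and only A and B (here about 2% of the layer's parameters) receive gradients; this is the parameter-count lever the abstract's efficiency comparison turns on.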

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1697478
Adaptive deep Q-networks for accurate electric vehicle range estimation
  • Nov 27, 2025
  • Frontiers in Big Data
  • Urvashi Khekare + 1 more

It is critical that electric vehicles estimate the remaining driving range after charging, as this has direct implications for drivers' range anxiety and thus for large-scale EV adoption. Traditional machine learning approaches to range prediction rely heavily on large amounts of vehicle-specific data and are therefore neither scalable nor adaptable. In this paper, a deep reinforcement learning framework is proposed, utilizing big data from 103 different EV models from 31 manufacturers. This dataset combines several operational variables (state of charge, voltage, current, temperature, vehicle speed, and discharge characteristics) that reflect highly dynamic driving states. First, outliers in this heterogeneous data were reduced through a hybrid fuzzy k-means clustering approach, enhancing the quality of the training data. Second, a pathfinder meta-heuristic was applied to optimize the reward function of the deep Q-learning algorithm, accelerating convergence and improving accuracy. Experimental validation reveals that the proposed framework halves the range error, to [−0.28, 0.40] on independent testing and [−0.23, 0.34] under 10-fold cross-validation. The proposed approach outperforms traditional machine learning and transformer-based approaches in Mean Absolute Error (by 61.86% and 4.86%, respectively) and in Root Mean Square Error (by 6.36% and 3.56%, respectively). This highlights the robustness of the proposed framework under complex, dynamic EV data and its ability to enable scalable, intelligent range prediction, fostering innovation in infrastructure and climate-conscious mobility.
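The outlier-reduction step can be sketched with a plain fuzzy c-means loop; the paper's hybrid fuzzy k-means variant, the toy two-regime telemetry, and the 0.9 membership threshold are assumptions made here for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy EV telemetry: two operating regimes plus two gross outliers.
cluster_a = rng.normal([0.9, 25.0], 0.05, size=(50, 2))  # (SoC, temperature)
cluster_b = rng.normal([0.2, 40.0], 0.05, size=(50, 2))
outliers = np.array([[5.0, -50.0], [-3.0, 120.0]])
X = np.vstack([cluster_a, cluster_b, outliers])

def fuzzy_c_means(X, c=2, m=2.0, iters=50):
    """Standard fuzzy c-means: soft memberships U (rows sum to 1)."""
    U = rng.dirichlet(np.ones(c), size=len(X))           # random init
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-p)
        U = inv / inv.sum(axis=1, keepdims=True)         # u_ik = d^-p / sum d^-p
    return U, centers

U, centers = fuzzy_c_means(X)
# Points that belong strongly to no cluster are treated as outliers.
is_outlier = U.max(axis=1) < 0.9
```

The soft memberships are what make this gentler than hard k-means: a borderline sample is down-weighted rather than forced into a cluster, so only points far from every operating regime get dropped before training.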

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1704189
M-PSGP: a momentum-based proximal scaled gradient projection algorithm for nonsmooth optimization with application to image deblurring
  • Nov 24, 2025
  • Frontiers in Big Data
  • Kexin Ning + 2 more

In this study, we focus on investigating a nonsmooth convex optimization problem involving the l1-norm under a non-negative constraint, with the goal of developing an inverse-problem solver for image deblurring. Research focused on solving this problem has garnered extensive attention and has had a significant impact on the field of image processing. However, existing optimization algorithms often suffer from overfitting and slow convergence, particularly when working with ill-conditioned data or noise. To address these challenges, we propose a momentum-based proximal scaled gradient projection (M-PSGP) algorithm. The M-PSGP algorithm, which is based on the proximal operator and scaled gradient projection (SGP) algorithm, integrates an improved Barzilai-Borwein-like step-size selection rule and a unified momentum acceleration framework to achieve a balance between performance optimization and convergence rate. Numerical experiments demonstrate the superiority of the M-PSGP algorithm over several seminal algorithms in image deblurring tasks, highlighting the significance of our improved step-size strategy and momentum-acceleration framework in enhancing convergence properties.
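The building blocks of such a method — the proximal step for the non-negative l1 term and a momentum extrapolation — can be sketched on a toy inverse problem. This is a FISTA-style sketch, not the authors' M-PSGP: the Barzilai-Borwein-like step-size rule is replaced by a fixed 1/L step, and no scaling matrix is used:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy deblurring-style inverse problem: recover a sparse non-negative x
# from b = A x + noise by minimising 0.5*||Ax - b||^2 + lam*||x||_1, x >= 0.
m_, n_ = 40, 80
A = rng.normal(size=(m_, n_))
x_true = np.zeros(n_)
x_true[rng.choice(n_, 5, replace=False)] = rng.uniform(1, 2, 5)
b = A @ x_true + 0.01 * rng.normal(size=m_)
lam = 0.1

def prox_nonneg_l1(v, t):
    """Prox of t*||.||_1 restricted to the non-negative orthant:
    soft-thresholding followed by projection onto x >= 0."""
    return np.maximum(v - t, 0.0)

L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
obj = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()

x = np.zeros(n_); z = x.copy(); tk = 1.0
history = [obj(x)]
for _ in range(200):
    grad = A.T @ (A @ z - b)
    x_new = prox_nonneg_l1(z - grad / L, lam / L)
    t_new = (1 + np.sqrt(1 + 4 * tk ** 2)) / 2    # Nesterov momentum schedule
    z = x_new + (tk - 1) / t_new * (x_new - x)    # momentum extrapolation
    x, tk = x_new, t_new
    history.append(obj(x))
```

The momentum step evaluates the gradient at the extrapolated point z rather than at x, which is what lifts the convergence rate from O(1/k) to O(1/k^2) for this problem class.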

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1686479
Enhanced SQL injection detection using chi-square feature selection and machine learning classifiers
  • Nov 19, 2025
  • Frontiers in Big Data
  • Emanuel Casmiry + 2 more

In the face of increasing cyberattacks, Structured Query Language (SQL) injection remains one of the most common and damaging types of web threats, accounting for over 20% of global cyberattack costs. However, due to its dynamic and variable nature, the current detection methods often suffer from high false positive rates and lower accuracy. This study proposes an enhanced SQL injection detection using Chi-square feature selection (FS) and machine learning models. A combined dataset was assembled by merging a custom dataset with the SQLiV3.csv file from the Kaggle repository. A Jensen–Shannon Divergence (JSD) analysis revealed moderate domain variation (overall JSD = 0.5775), with class-wise divergence of 0.1340 for SQLi and 0.5320 for benign queries. Term Frequency-Inverse Document Frequency (TF-IDF) was used to convert SQL queries into feature vectors, followed by the Chi-square feature selection to retain the most statistically significant features. Five classifiers, namely multinomial Naïve Bayes, support vector machine, logistic regression, decision tree, and K-nearest neighbor, were tested before and after feature selection. The results reveal that Chi-square feature selection improves classification performance across all models by reducing noise and eliminating redundant features. Notably, Decision Tree and K-Nearest Neighbors (KNN) models, which initially performed poorly, showed substantial improvements after feature selection. The Decision Tree improved from being the second-worst performer before feature selection to the best classifier afterward, achieving the highest accuracy of 99.73%, precision of 99.72%, recall of 99.70%, F1-score of 99.71%, a false positive rate (FPR) of 0.25%, and a misclassification rate of 0.27%. These findings highlight the crucial role of feature selection in high-dimensional data environments. 
Future research will investigate how feature selection impacts deep learning architectures, adaptive feature selection, incremental learning approaches, and robustness against adversarial attacks, and will evaluate model transferability across production web environments to ensure real-time detection reliability, further establishing feature selection as a vital step in developing reliable SQL injection detection systems.
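The TF-IDF plus chi-square pipeline can be sketched with scikit-learn on a toy corpus; the example queries, the token pattern, and k=20 are illustrative choices, not the study's dataset or settings:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy corpus standing in for SQL query logs (label 1 = injection).
queries = [
    "SELECT name FROM users WHERE id = 42",
    "SELECT * FROM orders WHERE total > 10",
    "UPDATE items SET price = 5 WHERE id = 7",
    "SELECT * FROM users WHERE name = '' OR '1'='1'",
    "admin' -- ",
    "1; DROP TABLE users --",
] * 10
labels = np.array([0, 0, 0, 1, 1, 1] * 10)

# TF-IDF vectorisation -> chi-square selection of the k most
# class-dependent features -> classifier. chi2 requires non-negative
# features, which TF-IDF satisfies.
pipe = make_pipeline(
    TfidfVectorizer(token_pattern=r"[^\s]+"),
    SelectKBest(chi2, k=20),
    DecisionTreeClassifier(random_state=0),
)
pipe.fit(queries, labels)
train_acc = pipe.score(queries, labels)
```

Wrapping the selector in a pipeline matters in practice: the chi-square scores are computed only on training folds, so feature selection cannot leak information from the evaluation data.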

  • Open Access
  • Research Article
  • 10.3389/fdata.2025.1617978
Robust detection framework for adversarial threats in Autonomous Vehicle Platooning
  • Nov 19, 2025
  • Frontiers in Big Data
  • Stephanie Ness

Introduction: The study addresses adversarial threats in Autonomous Vehicle Platooning (AVP) using machine learning. Methods: A novel method integrating active learning with Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and AdaBoost classifiers was developed. Results: Random Forest with active learning yielded the highest accuracy, 83.91%. Discussion: The proposed framework significantly reduces labeling effort and improves threat detection, enhancing the security of AVP systems.
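The active-learning-plus-Random-Forest combination can be sketched as a pool-based uncertainty-sampling loop; the synthetic data, seed size, and query budget below are illustrative assumptions, not the study's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for platooning telemetry labelled benign/attack.
X, y = make_classification(n_samples=1200, n_features=12,
                           n_informative=6, random_state=0)
X_pool, y_pool = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X_pool), 20, replace=False))  # small seed set

# Pool-based uncertainty sampling: repeatedly label the points the
# current model is least sure about, instead of labelling everything.
for _ in range(10):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = 1 - np.abs(proba - 0.5) * 2   # 1 at p=0.5, 0 at p in {0, 1}
    uncertainty[labeled] = -1                   # never re-query labelled points
    labeled.extend(np.argsort(uncertainty)[-20:])  # query 20 most uncertain

final_acc = clf.score(X_test, y_test)
```

This is where the labeling-effort reduction comes from: the annotator only ever labels the samples nearest the decision boundary, rather than the full pool.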