Articles published on Smirnov Statistic
275 Search results
- Research Article
- 10.1142/s1793962326500194
- Apr 25, 2026
- International Journal of Modeling, Simulation, and Scientific Computing
- Mehdi Shams + 1 more
Modeling of physical phenomena has been introduced by many physicists. In this paper, we explore some particle models with the Bose–Einstein distribution. The statistical inference of these distributions including moments, maximum likelihood estimation, Fisher information, entropy, and Kullback–Leibler divergence is analyzed. Given the maximum likelihood estimator and Fisher information estimate, we can use the Wald test to build confidence intervals and test hypotheses. Random sample generation based on the Metropolis–Hastings algorithm and the integral probability transformation theorem is investigated. We prefer to use the integral probability transformation theorem because of its speed and high accuracy. Also, for two independent random variables from the Bose–Einstein distributions, the distribution of their functions, as well as their moments, is examined. To evaluate the goodness-of-fit of the proposed models, we analyze real-world salary data across different levels. Graphical tools including histograms, empirical density functions, empirical cumulative distribution functions, and Q–Q plots are employed to visually compare the data with the fitted distributions. Bootstrap and permutation methods are used to compute the Kolmogorov–Smirnov statistic and the Kullback–Leibler divergence, while Wald, likelihood ratio, and score tests are applied for hypothesis testing. Comparisons with other common distributions (exponential, Weibull, log-normal, and gamma) are also conducted. The results consistently indicate that the Bose–Einstein type I distribution provides a substantially better fit to the empirical data compared to the Bose–Einstein type II distribution and the other models considered, as confirmed by residual analysis, descriptive measures, and formal test statistics. Finally, modeling with real data is carried out. In the conclusion section, suggestions are made for future research that can be conducted as a continuation of this study. 
The R code is also reported in Appendix A.
- Research Article
- 10.3390/obesities6030025
- Apr 22, 2026
- Obesities
- Pedro Francke + 2 more
Background: School feeding programs aim to improve child nutrition, and they may influence weight outcomes insofar as program modalities and household responses alter children’s total energy intake. This is especially relevant in countries facing the double burden of malnutrition, where undernutrition and micronutrient deficiencies coexist with rising overweight and obesity. This study estimates the effect of Peru’s former National School Feeding Program on obesity and excess weight among children aged 36 to 59 months under a selection-on-observables identification strategy and assesses whether impacts differ across operational modalities, particularly breakfast-only versus breakfast plus lunch and ready-to-eat rations versus foods delivered for preparation. Methods: We use repeated cross-sectional microdata from the Demographic and Health Survey (ENDES) pooled over 2014 to 2018 and link them to administrative information. The sample includes 18,959 children aged 36 to 59 months. To improve comparability, we estimate propensity score weights targeting the average treatment effect on the treated (ATT) using a machine learning generalized boosted model (GBM), and assess covariate balance using standardized mean differences and Kolmogorov–Smirnov statistics. Identification assumes conditional independence given observed covariates and overlap (common support). Main estimates rely on weighted probit models with fixed effects, progressively adding exposure duration, modality indicators, and controls. Distributional effects are examined using quantile regression on the continuous weight-for-height z-score. Results: Without differentiating modalities, beneficiary status is not associated with a statistically significant change in obesity, while pooled baseline estimates indicate a statistically significant higher probability of excess weight. 
Modality-specific results show that obesity declines only when Qali Warma is delivered as breakfast plus lunch through products to be prepared (approximately −1.0 percentage point in parsimonious models and −0.4 percentage points after controls). Evidence for excess weight is directionally consistent by modality but less conclusive once controls are included. Conclusions: Qali Warma’s effects on early-childhood weight outcomes depend on implementation modality. Evaluations of school feeding programs should incorporate operational heterogeneity, particularly during program redesign.
- Research Article
- 10.3390/fire9040161
- Apr 11, 2026
- Fire
- Yuchen Du + 2 more
Identifying small-scale burn scars is critical for global carbon accounting, yet remains computationally challenging due to spectral complexity and ground truth scarcity in heterogeneous landscapes. Conventional deep learning models often fail to generalize in such environments, lacking both domain-specific priors and representative training distributions required for precise segmentation. Here, we show that optimizing the fine-tuning of the Prithvi Earth Foundation Model (EFM) via Multidimensional Latin Hypercube Sampling (LHS) establishes a robust framework for this task. Our comparative analysis reveals that the domain-adapted Prithvi model achieves a Mean Intersection over Union (mIoU) of 0.91, outperforming standard Vision Transformers (ViT) by 31.9% and significantly surpassing reconstruction-based architectures, such as Scale-MAE. We demonstrate that LHS is superior to Simple Random Sampling (SRS) for optimizing foundation models, as it ensures statistical fidelity with a Kolmogorov–Smirnov (KS) statistic below 0.1 and effectively captures the tail distributions of fire weather indices. Furthermore, our framework exhibited exceptional data efficiency, retaining 94.5% of its peak accuracy with only 100 training samples. These findings provide a scalable solution for monitoring small-scale disasters in data-constrained regions and validate the synergy between rigorous sampling strategies and EFMs.
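The sampling comparison in this abstract rests on the two-sample Kolmogorov–Smirnov statistic, the largest gap between two empirical CDFs. The sketch below is only an illustration under stated assumptions: the population is a synthetic exponential variable (not fire weather data), the names `ks_two_sample`, `srs`, and `lhs` are hypothetical, and the "LHS" here is a one-dimensional stratified draw rather than the multidimensional scheme the authors describe.

```python
import random
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


rng = random.Random(0)  # fixed seed, illustration only
# Assumption: a synthetic skewed variable standing in for a real index.
population = [rng.expovariate(1.0) for _ in range(5000)]
pop_sorted = sorted(population)
n = 100

# Simple random sampling: n draws without replacement.
srs = rng.sample(population, n)

# One-dimensional Latin-hypercube-style sampling: one draw from each of
# n equal-probability strata of the sorted population.
lhs = [pop_sorted[int((i + rng.random()) / n * len(pop_sorted))] for i in range(n)]

ks_srs = ks_two_sample(srs, population)
ks_lhs = ks_two_sample(lhs, population)
print(f"KS (SRS vs population): {ks_srs:.3f}")
print(f"KS (LHS vs population): {ks_lhs:.3f}")
```

Because the stratified draw takes exactly one point per stratum, its empirical CDF can never be more than 1/n away from the population CDF in this one-dimensional setting, which is one way to see why stratification tends to keep the KS statistic low.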
- Research Article
- 10.1080/10485252.2026.2638780
- Mar 5, 2026
- Journal of Nonparametric Statistics
- Fang Fang + 2 more
The Kolmogorov–Smirnov (KS) statistic has been widely used in many areas to evaluate the performance of binary classification. However, almost no classification algorithm tries to optimise it directly at the training stage due to the computational and theoretical challenges brought by the special form of KS. In this paper, we propose a novel Kolmogorov–Smirnov neural Network (KSNet) using KS as the optimisation objective. The difficulty of non-smoothness of the empirical KS is overcome by introducing a smooth nonconvex surrogate function. The KSNet has great potential to improve the KS on test data, especially for imbalanced data, and it shows encouraging robustness to data noise. Theoretically, we establish the non-asymptotic excess risk bound of KSNet with a ReLU activated feedforward neural network and show its Bayes-risk consistency. Experiments on a variety of real datasets confirm the advantages of KSNet over many existing methods.
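For readers unfamiliar with KS as a classification metric, it is commonly defined as the largest gap between the true-positive rate and the false-positive rate over all score thresholds. The following is a minimal sketch of that empirical quantity only; it does not reproduce KSNet or its smooth surrogate, and the function name `ks_classifier` is hypothetical.

```python
def ks_classifier(scores, labels):
    """Empirical KS of a binary scorer: max over thresholds of |TPR - FPR|.

    `labels` are 0/1; higher scores are assumed to indicate class 1.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every item tied at this score before evaluating the gap.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


print(ks_classifier([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
print(ks_classifier([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

The value depends on the scores only through their ordering, so it is piecewise constant in the model parameters. That is the non-smoothness the abstract refers to, and it is why a surrogate is needed for gradient-based training.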
- Research Article
- 10.1016/j.mtcomm.2026.115004
- Mar 1, 2026
- Materials Today Communications
- Faseeh Muhammad + 6 more
Designing high-temperature consolidated BaTiO₃ ceramics requires understanding coupled relationships among processing, microstructure, and functional properties. This study proposes an experimental–computational fusion workflow to predict four properties (dielectric constant, tangent loss, density, and ) from 8191 experimental records enriched with density functional theory (DFT)-informed descriptors. Missing experimental values are imputed using K-nearest neighbors (KNN) with domain-consistency checks to preserve physically plausible ranges. To mitigate cross-domain imbalance and expand structural descriptor coverage, the smaller computational descriptor set is augmented using CTGAN, followed by distributional-fidelity assessment using KL divergence, the Wasserstein distance, the Kolmogorov–Smirnov statistic, the overlap area, and correlation preservation. Feature relevance is screened using Pearson correlation and mutual information with redundancy pruning, and a TabTransformer regressor is trained under leakage-free preprocessing with five-fold cross-validation and an independent 10% hold-out test. Compared with experimental-only and computational-only learning (average of 0.7318 and 0.6684), the fused model achieves stable multi-property prediction with average , MAE=0.0206, and RMSE=0.0801. The results further show that applying feature filtering after fusion preserves interaction-dependent predictors, whereas early filtering can degrade tangent loss prediction. Overall, the proposed fusion framework enables accurate screening of processing outcomes using complementary experimental and physics-informed descriptors.
• Integrates experimental and DFT-informed descriptors for BaTiO₃ ceramics.
• Demonstrates that post-fusion feature selection improves multi-property prediction stability.
• Shows interaction-sensitive targets (e.g., tangent loss) degrade under pre-fusion filtering.
• Implements leakage-free 5-fold cross-validation with strict hold-out evaluation.
• Establishes a fusion-first modeling strategy for processing–structure–property prediction.
- Research Article
- 10.3390/app16052232
- Feb 26, 2026
- Applied Sciences
- Konstantin Piryankov + 3 more
Ensuring the consistency of recurring ETL processes is a critical challenge in large-scale financial analytics, where upstream data changes, such as variable redefinitions, unit conversions (e.g., from days past due to number of overdue installments or currency changes), or erroneous submissions following source system updates, can silently degrade model reliability. These risks are amplified in automated modeling environments, where dozens of models are retrained monthly for each financial institution and the number of serviced institutions is expected to grow. This study presents an automated statistical monitoring framework for continuous quality assurance of monthly ETL outputs used in model development. The approach quantifies drift between a reference dataset and successive data deliveries using descriptive univariate and bivariate statistics combined with a normalized Canberra-based drift score, aggregated into interpretable variable-level stability measures. Sensitivity is evaluated through controlled noisification experiments with increasing Gaussian perturbations, demonstrating a monotonic decline in stability scores and consistent directional shifts in complementary metrics such as the Gini coefficient and Kolmogorov–Smirnov statistic. The results show that the framework effectively detects both subtle and large-scale distributional changes, providing scalable, interpretable, and reproducible monitoring diagnostics suitable for fully automated financial data pipelines, with flexibility for extension.
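One plausible reading of a "normalized Canberra-based drift score" is the Canberra distance between two vectors of descriptive statistics, divided by the number of contributing terms so that it lies in [0, 1]. The sketch below follows that reading as an assumption; the authors' exact definition, choice of statistics, and aggregation are not given in the abstract, and `describe` and `normalized_canberra` are hypothetical names.

```python
import statistics


def describe(values):
    """Assumed set of univariate descriptive statistics for one variable."""
    return [
        statistics.mean(values),
        statistics.stdev(values),
        min(values),
        statistics.median(values),
        max(values),
    ]


def normalized_canberra(ref, new):
    """Canberra distance averaged over contributing terms, so 0 <= score <= 1."""
    if len(ref) != len(new):
        raise ValueError("vectors must have the same length")
    total, used = 0.0, 0
    for r, v in zip(ref, new):
        denom = abs(r) + abs(v)
        if denom == 0:
            continue  # both entries are zero: no information, skip the term
        total += abs(r - v) / denom
        used += 1
    return total / used if used else 0.0


# Hypothetical unit change: days past due silently delivered as 30-day installments.
reference = [0, 30, 60, 90, 120]
delivery = [x / 30 for x in reference]
drift = normalized_canberra(describe(reference), describe(delivery))
print(f"drift score: {drift:.3f}")
```

A pure rescaling by a factor of 30 moves every nonzero statistic by the same relative amount, so each term equals 29/31 and the score sits near the top of its range, which is the kind of silent unit change the abstract describes.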
- Research Article
- 10.1002/eng2.70660
- Feb 1, 2026
- Engineering Reports
- Rajitha C S + 5 more
Over the years, hundreds of new statistical distributions have been developed, and the demand for new distributions is rising. In engineering and biomedical applications, lifespan data modeling has extensively utilized the Akash distribution. However, significant skewness, large tails, and non‐monotone hazard patterns, which are frequently seen in survival data, are difficult for it to depict due to its single‐shape construction. To overcome this limitation, this paper proposes the cubic rank transmuted Akash (CRTA) distribution, which modifies the Akash distribution using the cubic rank transmutation technique to create a new, adaptable statistical model for examining survival and death rates associated with different types of cancer. The proposed modification introduces two additional shape parameters, significantly enhancing the flexibility of the baseline Akash model. To guarantee the validity of the CRTA distribution, a corrected admissible parameter space is created. The probability density function, distribution function, hazard rate, and moments are among the essential mathematical characteristics of the CRTA distribution that are derived. Additionally, moments, order statistics, stochastic ordering, moment generating function, mode, and median of the new distribution are also discussed. Maximum likelihood estimation is used to estimate model parameters. To ensure the reliability of the parameter estimation method, a simulation study was implemented to assess its consistency. Using actual cancer remission‐time statistics, the CRTA distribution's performance is assessed and contrasted with that of the Akash distribution and other popular competing models. The suggested distribution offers a better fit to the cancer data sets, as indicated by log‐likelihood, AIC, BIC, and Kolmogorov–Smirnov statistics. These findings demonstrate that the cubic rank transmutation method yields substantial improvements in survival modeling. 
The proposed model provides a solid foundation for future extensions utilizing Bayesian and machine learning‐based estimation techniques, as well as a dependable statistical tool for biomedical survival analysis and engineering reliability.
- Research Article
- 10.3390/soilsystems10010017
- Jan 20, 2026
- Soil Systems
- Yacine Benhalima + 2 more
This study examined long-term changes in soil carbon stock dynamics 11 and 19 years after fire under different severities at 0–5 and 0–25 cm depths with a digital soil mapping approach. Linear (MLR) and non-linear models (RF, SVR, XGBoost) combined with feature selection methods (r < 0.8, FFS, Boruta) were used to predict bulk density (BD), total C, and C stock. Distributional biases were evaluated with Kolmogorov–Smirnov statistics and corrected by Quantile Mapping (QM). RF-FFS performed best for BD and total C at 0–5, while RF-SVR outperformed for C stock and all properties at 0–25. Total C was 49% higher at 0–5, whereas C stock was 7.57 times greater at 0–25. Both models underestimated variability, especially for C stock. At 0–25, bulk density decreased after fire, particularly under conditions of medium severity, while total C increased following the same tendency. The results showed that fire’s legacy is still present in the ecosystem after one and two decades. This is particularly evident at greater depths, where long-term C stock is lower.
- Research Article
- 10.63282/3050-9262.ijaidsml-v7i2p112
- Jan 1, 2026
- International Journal of Artificial Intelligence, Data Science, and Machine Learning
- Sai Prashanth Pathi
Credit risk models are central to lending decisions, capital allocation, and regulatory compliance at financial institutions worldwide. While model development and validation have been extensively studied, comparatively fewer works provide integrated frameworks for ongoing model monitoring that combine statistical metrics with governance structures. This paper presents a unified, hierarchically structured framework for model monitoring in credit risk, synthesising metrics across four dimensions: population stability, discriminatory power, calibration accuracy, and input variable stability. We formalise the Population Stability Index (PSI), Characteristic Stability Index (CSI), the Gini coefficient, Kolmogorov–Smirnov (KS) statistic, Area under the Receiver Operating Characteristic Curve (AUROC), and calibration-based metrics within a consistent mathematical notation. We further introduce a traffic-light governance overlay that maps metric thresholds to actionable escalation protocols, aligned with SR 11-7 and Basel II/III supervisory expectations. Empirical validation is conducted on a synthetic retail loan portfolio of 10,000 development observations and six quarterly production cohorts with programmatically controlled covariate and default rate drift. The logistic regression scorecard achieves a development AUROC of 0.9359 (Gini = 0.8717, KS = 0.7408), and the multi-dimensional monitoring dashboard correctly flags early calibration deterioration (Calibration Ratio reaching 0.70 at Q1) and sustained CSI drift (debt-to-income CSI = 0.946, num_inquiries CSI = 1.036 by Q6) while discriminatory power remains robust throughout. Our results demonstrate the non-redundancy of the four monitoring dimensions and support the adoption of multi-metric dashboards over single-indicator approaches. 
The proposed Integrated Credit Risk Monitoring Architecture (ICRMA) is designed to be accessible to practitioners at smaller institutions while remaining technically rigorous for model risk management professionals.
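Two of the quantities named in this abstract have short closed forms: the Population Stability Index over binned proportions, and the Gini coefficient as 2·AUROC − 1 (with the reported AUROC of 0.9359 this gives about 0.872, consistent with the reported Gini up to rounding). The sketch below is illustrative only. The amber and red cut-offs of 0.10 and 0.25 are a widely quoted rule of thumb and are an assumption here, not the thresholds of the paper's traffic-light overlay, and all function names are hypothetical.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the proportions so empty bins do not produce log(0).
        e_p = max(e / e_total, eps)
        a_p = max(a / a_total, eps)
        value += (a_p - e_p) * math.log(a_p / e_p)
    return value


def traffic_light(psi_value, amber=0.10, red=0.25):
    """Map a PSI value to a status; thresholds are assumed rule-of-thumb values."""
    if psi_value < amber:
        return "green"
    if psi_value < red:
        return "amber"
    return "red"


def gini_from_auroc(auroc):
    """Gini coefficient of a scorecard from its AUROC."""
    return 2.0 * auroc - 1.0


development = [50, 50]  # hypothetical score-band counts at development
production = [25, 75]   # hypothetical counts in a later cohort
value = psi(development, production)
print(f"PSI = {value:.4f} -> {traffic_light(value)}")
print(f"Gini = {gini_from_auroc(0.9359):.4f}")
```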
- Research Article
- 10.3389/fmed.2026.1833776
- Jan 1, 2026
- Frontiers in Medicine
- Wenyi Du + 7 more
Background: Postoperative heart failure (HF) represents a prevalent and serious complication among elderly surgical patients, markedly increasing perioperative morbidity, prolonging hospitalization, and elevating mortality risk. Early identification of high-risk individuals is therefore of substantial clinical importance for optimizing perioperative management. Conventional statistical models are inherently limited in capturing complex, nonlinear interactions among variables, whereas machine learning (ML) approaches offer distinct advantages in predictive performance and individualized risk stratification. Methods: In this retrospective multicenter study, 1,562 elderly patients were consecutively enrolled from six hospitals, comprising 757 individuals in the internal cohort and 805 in the external validation cohort. Independent risk factors for postoperative HF were identified through univariate and multivariate logistic regression analyses. Five machine learning algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), k-nearest neighbours (KNN), and multilayer perceptron (MLP), were applied to rank feature importance. Variables consistently identified by both statistical and machine learning approaches were subsequently integrated into model development. The generalizability of the internal model was assessed using tenfold cross-validation. Model performance was comprehensively evaluated using receiver operating characteristic (ROC) curves, area under the curve (AUC), calibration plots, decision curve analysis (DCA), learning curves, Kolmogorov–Smirnov (KS) statistics, and confusion matrices. Model interpretability was further interrogated using SHapley Additive exPlanations (SHAP). Results: Postoperative HF occurred in 66 patients (8.72%) within the internal cohort. 
Multivariate analysis and ML-based feature selection consistently identified sex, body mass index (BMI), diabetes mellitus, hypertension, hyperlipidaemia, history of malignancy, intraoperative tachycardia, and postoperative inflammatory markers (neutrophil-to-lymphocyte ratio (NLR) and C-reactive protein (CRP)) as key predictors. Among the models, XGBoost demonstrated superior performance, achieving an AUC of 0.979 in the training set, 0.937 in the internal validation set, and 0.92 in the external validation cohort. Tenfold cross-validation further confirmed robust generalizability (AUC = 0.933; accuracy = 0.908). SHAP analysis indicated that postoperative NLR and CRP made substantial contributions to model predictions, while individual-level SHAP visualizations further delineated the specific contributions of each variable to patient-level risk estimation. Conclusion: We developed and externally validated a machine learning–based predictive model for postoperative HF in elderly patients. The XGBoost model exhibited excellent discrimination, robust calibration, and promising clinical utility. SHAP-based interpretability analyses highlighted the pivotal contributions of inflammatory markers, metabolic comorbidities, and perioperative tachycardia, providing a reliable tool for individualized perioperative risk assessment in the elderly population.
- Research Article
- 10.3390/app152111652
- Oct 31, 2025
- Applied Sciences
- Cheng-Xi Li + 1 more
This study develops an integrated bi-level operations–assignment model to optimise express service on the Gyeongin Line, a core corridor connecting Seoul and Incheon. The upper level jointly selects express stops and time-of-day headways under coverage constraints—a minimum share of key stations and a maximum inter-stop spacing—while the lower level assigns passengers under user equilibrium using a generalised time function that incorporates in-vehicle time, 0.5× headway wait, walking and transfers, and crowding-sensitive dwell times. Undergrounding and alignment straightening are incorporated into segment run-time functions, enabling the co-design of infrastructure and operations. Using automatic-fare-collection-calibrated origin–destination matrices, seat-occupancy records, and station-area population grids, we evaluate five rail scenarios and one intermodal extension. The results indicate substantial system-wide gains: peak average door-to-door times fall by approximately 44–46% in the AM (07:00–09:00) and 30–38% in the PM (17:30–19:30) for rail-only options, and by up to 55% with the intermodal extension. Kernel density estimation (KDE) and cumulative distribution function (CDF) analyses show a leftward shift and tail compression (median −8.7 min; 90th percentile (P90) −11.2 min; ≤45 min share: 0.0% → 47.2%; ≤60 min: 59.7% → 87.9%). The 45-min isochrone expands by ≈12% (an additional 0.21 million residents), while the 60-min reach newly covers Incheon Jung-gu and Songdo. Backcasting against observed express/local ratios yields deviations near the ±10% band (PM one comparator within and one slightly above), and the Kolmogorov–Smirnov (KS) statistic and Mann–Whitney (MW) test results confirm significant post-implementation shifts. The most cost-effective near-term package combines mixed stopping with modest alignment and capacity upgrades and time-differentiated headways; the intermodal express–transfer scheme offers a feasible long-term upper bound. 
The methodology is fully transparent through provision of pseudocode, explicit convergence criteria, and all hyperparameter settings. We also report SDG-aligned indicators—traction energy and CO2-equivalent (CO2-eq) per passenger-kilometre, and jobs reachable within 45- and 60-min isochrones—providing indicative yet robust evidence consistent with SDG 9, 11, and 13.
- Research Article
- 10.3390/ai6110279
- Oct 23, 2025
- AI
- Khrystyna Shakhovska + 1 more
Objectives: This paper introduces an adaptive learning framework for handling concept drift in data by dynamically adjusting model updates based on the severity of detected drift. Methods: The proposed method combines multiple statistical measures to quantify distributional changes between recent and historical data windows. The resulting severity score drives a three-tier adaptation policy: minor drift is ignored, moderate drift triggers incremental model updates, and severe drift initiates full model retraining. Results: This approach balances stability and adaptability, reducing unnecessary computation while preserving model accuracy. The framework is applicable to both single-model and ensemble-based systems, offering a flexible and efficient solution for real-time drift management. Also, different transformation methods were reviewed, and quantile transformation was tested. By applying a quantile transformation, the Kolmogorov–Smirnov (KS) statistic decreased from 0.0559 to 0.0072, demonstrating effective drift adaptation.
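The three-tier policy described above can be sketched as a small decision function. This is a minimal illustration under assumptions: the severity score here is the two-sample KS statistic alone (the paper combines several measures), the cut-offs `minor` and `severe` are hypothetical placeholders, and none of the names come from the paper.

```python
from bisect import bisect_right


def drift_severity(history, recent):
    """Assumed severity score: two-sample KS statistic between the two windows."""
    h, r = sorted(history), sorted(recent)
    gap = 0.0
    for x in h + r:
        gap = max(gap, abs(bisect_right(h, x) / len(h) - bisect_right(r, x) / len(r)))
    return gap


def adaptation_action(severity, minor=0.05, severe=0.20):
    """Three-tier policy; the two thresholds are illustrative, not from the paper."""
    if severity < minor:
        return "ignore"
    if severity < severe:
        return "incremental_update"
    return "full_retrain"


history = [0.1 * i for i in range(100)]  # hypothetical historical window
shifted = [x + 5.0 for x in history]     # hypothetical recent window with a shift

for name, window in [("unchanged", history), ("shifted", shifted)]:
    s = drift_severity(history, window)
    print(f"{name}: severity={s:.2f} -> {adaptation_action(s)}")
```

Keeping the score and the policy as separate functions means the thresholds can be tuned, or the score replaced by a weighted combination of measures, without touching the update logic.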
- Research Article
- 10.3390/sym17071153
- Jul 18, 2025
- Symmetry
- Kenechukwu F Aforka + 5 more
This paper presents the exponentiated power Shanker (EPS) distribution, a new three-parameter extension of the standard Shanker distribution that can accommodate a wider class of data behaviors, including right-skewed and heavy-tailed phenomena. The structural properties of the distribution, namely complete and incomplete moments, entropy, and the moment generating function, are derived and examined formally. Parameters are estimated by maximum likelihood estimation (MLE), and a Monte Carlo simulation study assesses estimator performance across varying sample sizes and parameter values. The EPS model is also generalized to a regression framework to include covariate data, with estimation again conducted via MLE. The practical utility and flexibility of the EPS distribution are demonstrated through two real examples: one for the duration of repairs and another for HIV/AIDS mortality in Germany. Comparisons with several existing distributions, i.e., power Zeghdoudi, power Ishita, power Prakaamy, and logistic-Weibull, are made through goodness-of-fit statistics such as log-likelihood, AIC, BIC, and the Kolmogorov–Smirnov statistic. Graphical plots, including PP plots, QQ plots, TTT plots, and empirical CDFs, further confirm the high modeling capacity of the EPS distribution. Results confirm the high goodness-of-fit and flexibility of the EPS model, making it a very good tool for reliability and biomedical modeling.
- Research Article
- 10.3390/sym17071034
- Jul 1, 2025
- Symmetry
- Faton Merovci
This paper introduces the record-based transmuted Rayleigh distribution of order 3 (rbt-R), a three-parameter extension of the classical Rayleigh model designed to address data characterized by high skewness and heavy tails. While traditional generalizations of the Rayleigh distribution enhance model flexibility, they often lack sufficient adaptability to capture the complexity of empirical distributions encountered in applied statistics. The rbt-R model incorporates two additional shape parameters, a and b, enabling it to represent a wider range of distributional shapes. Parameter estimation for the rbt-R model is performed using the maximum likelihood method. Simulation studies are conducted to evaluate the asymptotic properties of the estimators, including bias and mean squared error. The performance of the rbt-R model is assessed through empirical applications to four datasets: nicotine yields and carbon monoxide emissions from cigarette data, as well as breaking stress measurements from carbon-fiber materials. Model fit is evaluated using standard goodness-of-fit criteria, including AIC, AICc, BIC, and the Kolmogorov–Smirnov statistic. In all cases, the rbt-R model demonstrates a superior fit compared to existing Rayleigh-based models, indicating its effectiveness in modeling highly skewed and heavy-tailed data.
- Research Article
- 10.3390/math13091473
- Apr 30, 2025
- Mathematics
- Shijie Wang + 1 more
Under the rapid evolution of financial technology, traditional credit risk management paradigms relying on expert experience and singular algorithmic architectures have proven inadequate in addressing complex decision-making demands arising from dynamically correlated multidimensional risk factors and heterogeneous data fusion. This manuscript proposes an enhanced credit rating model based on an improved TabNet framework. First, the Kaggle “Give Me Some Credit” dataset undergoes preprocessing, including data balancing and partitioning into training, testing, and validation sets. Subsequently, the model architecture is refined through the integration of a multi-head attention mechanism to extract both global and local feature representations. Bayesian optimization is then employed to accelerate hyperparameter selection and automate a parameter search for TabNet. To further enhance classification and predictive performance, a stacked ensemble learning approach is implemented: the improved TabNet serves as the feature extractor, while XGBoost (Extreme Gradient Boosting), LightGBM (Light Gradient Boosting Machine), CatBoost (Categorical Boosting), KNN (K-Nearest Neighbors), and SVM (Support Vector Machine) are selected as base learners in the first layer, with XGBoost acting as the meta-learner in the second layer. The experimental results demonstrate that the proposed TabNet-based credit rating model outperforms benchmark models across multiple metrics, including accuracy, precision, recall, F1-score, AUC (Area Under the Curve), and KS (Kolmogorov–Smirnov statistic).
- Research Article
- 10.3847/1538-4357/adb420
- Mar 4, 2025
- The Astrophysical Journal
- Shuyi Meng + 1 more
This work quantitatively studies the differences of heavy ions among non-Alfvénic slow wind (N-ASSW), classical fast wind (FSW), and Alfvénic slow and hot wind (ASSW and AHSW) by effect size and Kolmogorov–Smirnov statistic. Statistics of ACE measurements in solar cycle 23 show that He/O and Fe/O in ASSW and AHSW are significantly similar to those in N-ASSW, but are distinct from FSW. The mean Fe¹⁶⁺/Fe¹³⁺ in ASSW (AHSW) is in the middle of the two mean values of FSW and N-ASSW. However, Fe¹⁶⁺/Fe¹³⁺ in the three categories of solar wind are similarly low during solar minimum. Charges of C and O ions in ASSW and AHSW are significantly different from those in FSW during solar minimum, but have obvious overlaps to those in FSW during solar maximum. Besides, the speed ratio of He²⁺, C⁵⁺, O⁶⁺, and Fe¹⁰⁺ against protons is studied, and that of O⁶⁺ in ASSW and AHSW is most like that in FSW. AHSW has two peaks of proton specific entropy showing negative and positive relations to electron temperature. The results support that ASSW, AHSW, and N-ASSW experience the same fractionation in the chromosphere. The heating in ASSW (AHSW) differs from N-ASSW between the freeze-in heights of O and Fe during solar maximum, possibly caused by interchange reconnection. The preferential acceleration of heavy ions may be proportional to the gyroradius scale. Dissipation of Alfvén waves to electrons may happen in AHSW.
- Research Article
- 10.1088/2632-2153/adb3ee
- Feb 27, 2025
- Machine Learning: Science and Technology
- Samuele Grossi + 2 more
We propose a robust methodology to evaluate the performance and computational efficiency of non-parametric two-sample tests, specifically designed for high-dimensional generative models in scientific applications such as in particle physics. The study focuses on tests built from univariate integral probability measures: the sliced Wasserstein distance and the mean of the Kolmogorov–Smirnov (KS) statistics, already discussed in the literature, and the novel sliced KS statistic. These metrics can be evaluated in parallel, allowing for fast and reliable estimates of their distribution under the null hypothesis. We also compare these metrics with the recently proposed unbiased Fréchet Gaussian distance and the unbiased quadratic Maximum Mean Discrepancy, computed with a quartic polynomial kernel. We evaluate the proposed tests on various distributions, focusing on their sensitivity to deformations parameterized by a single parameter ε. Our experiments include correlated Gaussians and mixtures of Gaussians in 5, 20, and 100 dimensions, and a particle physics dataset of gluon jets from the JetNet dataset, considering both jet- and particle-level features. Our results demonstrate that one-dimensional-based tests provide a level of sensitivity comparable to other multivariate metrics, but with significantly lower computational cost, making them ideal for evaluating generative models in high-dimensional settings. This methodology offers an efficient, standardized tool for model comparison and can serve as a benchmark for more advanced tests, including machine-learning-based approaches.
- Research Article
- 10.3390/sym17030341
- Feb 24, 2025
- Symmetry
- Xu Han + 4 more
With the proliferation of mobile devices and payment systems in modern financial services, there is an increasing need to process and analyze continuous streams of transaction data for credit risk assessment. Leveraging the inherent symmetries in financial markets and data structures, this paper introduces DeepCreditRisk, a symmetry-aware deep learning framework that addresses key challenges while maintaining critical invariance properties in financial data representation. The framework incorporates three main components: an adaptive temporal fusion mechanism, a heterogeneous graph neural network, and an attention-based interpretable output layer. The temporal fusion mechanism effectively models both short-term fluctuations and long-term trends in financial time series, while the heterogeneous graph neural network captures intricate relationships within the financial ecosystem. The framework maintains important symmetrical properties in both temporal and structural representations, ensuring balanced feature learning and invariant risk assessment. The attention-based output layer preserves representation symmetry while enhancing model interpretability. Extensive experiments on a large-scale credit risk dataset demonstrate DeepCreditRisk’s superior performance, achieving a 7.2% improvement in the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and an 18.6% improvement in the Kolmogorov–Smirnov (KS) statistic over state-of-the-art baseline models. The framework maintains high predictive power across various time horizons and provides interpretable insights into feature importance. DeepCreditRisk represents a significant advancement in applying deep learning to credit risk assessment, offering financial institutions a more accurate, robust, and transparent approach for evaluating creditworthiness and managing risk.
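In credit scoring, the KS statistic is commonly taken to be the largest gap between the cumulative score distributions of defaulting and non-defaulting accounts. The abstract does not spell out its definition, so the sketch below assumes that common convention. The function name and the 0/1 label encoding are illustrative assumptions.

```python
def ks_for_scores(scores, labels):
    """Max gap between cumulative score distributions of the two classes.

    labels: 1 for the positive class (e.g. default), 0 otherwise (assumed encoding).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every record that shares this score before comparing,
        # so tied scores are treated as a single threshold.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best
```

A value of 0 means the scores do not separate the two classes at any threshold, and a value of 1 means some threshold separates them perfectly.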
- Research Article
1
- 10.1785/0120240153
- Feb 19, 2025
- Bulletin of the Seismological Society of America
- Soung Eil Houng + 1 more
ABSTRACT Probabilistic seismic hazard analysis (PSHA) traditionally relies on two computationally intensive approaches: (a) Riemann Sum and (b) conventional Monte Carlo (MC) integration. The former requires fine slices across magnitude, distance, and ground motion, and the latter demands extensive synthetic earthquake catalogs. Both approaches become notably resource-intensive for low-probability seismic hazards, for which achieving a coefficient of variation (COV) of 1% for a 10⁻⁴ annual hazard probability may require 10⁸ MC samples. We introduce adaptive importance sampling (AIS) PSHA, a novel framework to approximate optimal importance sampling (IS) distributions and dramatically reduce the number of MC samples to estimate hazards. We evaluate the efficiency and accuracy of our proposed framework using Pacific Earthquake Engineering Research Center PSHA benchmarks that cover various seismic sources, including areal, vertical, and dipping faults, as well as combined types. Our approach computes seismic hazard up to 3.7×10⁴ and 7.1×10³ times faster than Riemann Sum and traditional MC methods, respectively, maintaining COVs below 1%. We also propose an enhanced approach with a “smart” AIS PSHA variant that leverages the sampling densities from similar ground-motion intensities. This variant outperforms even “smart” implementations of Riemann Sum with enhanced grid discretizations by a factor of up to 130. Moreover, we demonstrate theoretically that optimal IS distributions are equivalent to hazard disaggregation distributions. Empirically, we show the approximated optimal IS and the disaggregation distributions are closely alike, for example, with a Kolmogorov–Smirnov statistic between 0.017 and 0.113. This approach is broadly applicable, especially for PSHA cases requiring extensive logic trees and epistemic uncertainty.
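The sample count quoted for conventional MC can be checked with the standard binomial-variance argument: a crude MC estimate of a probability p from N independent samples has COV = sqrt((1 − p) / (N p)), so N = (1 − p) / (p · COV²). The snippet below is a minimal sketch of that arithmetic. The function name is hypothetical, and the formula applies to plain MC only, not to the AIS scheme the paper proposes.

```python
import math


def crude_mc_samples(p, cov):
    """Samples needed for a crude Monte Carlo estimate of probability p
    to reach the target coefficient of variation (binomial variance)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if cov <= 0.0:
        raise ValueError("cov must be positive")
    return math.ceil((1.0 - p) / (p * cov ** 2))


# Annual hazard probability of 1e-4 with a 1% COV target:
# the result is on the order of 1e8 samples.
n_required = crude_mc_samples(1e-4, 0.01)
```

Because N scales as 1/p for a fixed COV, each tenfold drop in the target probability costs roughly ten times more samples, which is the cost that importance sampling is meant to reduce.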
- Research Article
2
- 10.3847/1538-4357/ad93ba
- Jan 22, 2025
- The Astrophysical Journal
- Sangita Kumari + 5 more
Eclipses of radio emission have been reported for ∼58 spider millisecond pulsars (MSPs), of which only around 19% have been extensively studied. Such studies at low frequencies are crucial for probing the properties of the eclipse medium. This study investigates eclipses in 10 MSPs in compact orbit using wide-bandwidth observations with the Giant Metrewave Radio Telescope. We report the first evidence of eclipsing for PSR J2234+0944 and J2214+3000 in one epoch, while no evidence of eclipsing was observed in the subsequent two epochs, indicating temporal evolution of the eclipse cutoff frequency in these systems. Constraints on the eclipse cutoff frequency were obtained for PSR J1555–2908, J1810+1744, and J2051–0827. Moreover, for the first time, we detected an eclipse at a nonstandard orbital phase (∼0.5) for PSR J1810+1744, with a duration longer than the eclipse observed at superior conjunction. No eclipses were detected for PSR J0751+1807, J1738+0333, and J1807–2459A at 300–500 MHz and 550–750 MHz. We calculated the mass-loss rate of the companion for PSR J1555–2908 and PSR J1810+1744 and found that these rates are insufficient to ablate the companion stars. We cataloged the Ė/a², mass function, Roche lobe filling factor, and inclination angle for compact MSP binaries with low-mass companions and found that higher spin-down flux does not guarantee eclipses. Our analysis, supported by the Kolmogorov–Smirnov statistic, confirms that eclipsing black widow binaries generally exhibit a higher mass function compared to noneclipsing black widow binaries, consistent with previous studies.