- Research Article
- 10.1007/s44199-025-00140-z
- Oct 10, 2025
- Journal of Statistical Theory and Applications
- R L Manogna + 1 more
- Research Article
- 10.1007/s44199-025-00144-9
- Oct 10, 2025
- Journal of Statistical Theory and Applications
- Helder Rojas + 2 more
Abstract Information Value (IV) is a widely used technique for feature selection prior to the modeling phase, particularly in credit scoring and related domains. However, conventional IV-based practices rely on fixed empirical thresholds, which lack statistical justification and may be sensitive to characteristics such as class imbalance. In this work, we develop a formal statistical framework for IV by establishing its connection with Jeffreys divergence and propose a novel nonparametric hypothesis test, referred to as the J-Divergence test. Our method provides rigorous asymptotic guarantees and enables interpretable decisions based on $p$-values. Numerical experiments, including synthetic and real-world data, demonstrate that the proposed test is more reliable than traditional IV thresholding, particularly under strong imbalance. The test is model-agnostic, computationally efficient, and well-suited for the pre-modeling phase in high-dimensional or imbalanced settings. An open-source Python library is provided for reproducibility and practical adoption.
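The connection between IV and Jeffreys divergence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' library or their J-Divergence test: the function name `information_value`, the quantile binning, and the smoothing constant `eps` are assumptions made here for illustration. The sketch computes the usual binned IV and checks that it equals the symmetrized Kullback-Leibler divergence between the binned event and non-event distributions.

```python
import numpy as np

def information_value(x, y, bins=10, eps=1e-6):
    """Binned Information Value of feature x for a binary target y.

    Hypothetical helper for illustration; binning and smoothing are assumptions.
    Returns (iv, jeffreys), where jeffreys = KL(P||Q) + KL(Q||P).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).astype(bool)

    # Quantile bin edges; each observation is mapped to a bin index in 0..bins-1.
    edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)

    # Per-bin counts for events (y = 1) and non-events (y = 0).
    events = np.bincount(idx[y], minlength=bins).astype(float)
    nonevents = np.bincount(idx[~y], minlength=bins).astype(float)

    # Smoothed bin proportions, so empty bins do not produce log(0).
    p = (events + eps) / (events + eps).sum()
    q = (nonevents + eps) / (nonevents + eps).sum()

    iv = np.sum((p - q) * np.log(p / q))
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return iv, kl_pq + kl_qp
```

On simulated data, a feature shifted by the target should receive a larger IV than pure noise, and the two returned values should agree up to floating-point error. A fixed cut-off applied to `iv` is the thresholding practice the abstract argues against.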
- Research Article
- 10.1007/s44199-025-00130-1
- Oct 6, 2025
- Journal of Statistical Theory and Applications
- Abouzar Hemmati + 3 more
- Research Article
- 10.1007/s44199-025-00137-8
- Oct 6, 2025
- Journal of Statistical Theory and Applications
- Sukanta Dash + 3 more
- Research Article
- 10.1007/s44199-025-00139-6
- Sep 9, 2025
- Journal of Statistical Theory and Applications
- Lars Lindhagen + 2 more
Abstract Latent subgroups arise when patients are randomized to an intended treatment that can be given only to certain, treatable patients. For biological efficacy, the relevant estimand is then the treatment effect in the subgroup of treatable patients, with the obvious issue that this subgroup is latent, identified only in the intervention arm. We present a modular framework for effect estimation in such latent subgroups. The framework consists of a core and three plug-in models: one for subgroup membership, and one each for outcomes among treatable and non-treatable patients. The core computes maximum likelihood estimates using the EM algorithm, together with standard errors. It does so without any knowledge of the details of the plug-in models, giving the user great flexibility. The methods are implemented in a software package. The framework is validated in a simulation, where we also explore the use of predictors. Particularly intriguing are predictors of treatability, which partly identify the latent subgroup from baseline data. The results suggest that this can dramatically increase power while remaining robust to model misspecification. Finally, the methods are applied to a prostate cancer trial.
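To make the role of the EM algorithm concrete, here is a toy sketch under strong assumptions that are not taken from the paper: Gaussian outcomes with a common standard deviation, no covariates, and non-treatable patients having the same outcome distribution in both arms. The function name `em_latent_subgroup` and its arguments are hypothetical. Treatability is observed in the intervention arm and latent in the control arm, so the control outcomes form a two-component mixture. The sketch returns point estimates only, not the standard errors described in the abstract.

```python
import numpy as np

def em_latent_subgroup(y_trt, treatable_trt, y_ctl, n_iter=200):
    """Toy EM for the treatment effect among treatable patients (illustrative only)."""
    y_trt = np.asarray(y_trt, dtype=float)
    t = np.asarray(treatable_trt, dtype=bool)
    y_ctl = np.asarray(y_ctl, dtype=float)
    n = len(y_trt) + len(y_ctl)

    # Treated, treatable patients are fully observed: their mean is a plain average.
    mu_t1 = y_trt[t].mean()

    # Starting values taken from the observed data.
    pi = t.mean()                # P(treatable)
    mu_n = y_trt[~t].mean()      # mean outcome, non-treatable
    mu_t0 = y_ctl.mean()         # mean outcome, treatable but untreated
    sigma = np.concatenate([y_trt, y_ctl]).std()

    for _ in range(n_iter):
        # E-step: posterior probability that each control patient is treatable.
        log_a = np.log(pi) - 0.5 * ((y_ctl - mu_t0) / sigma) ** 2
        log_b = np.log1p(-pi) - 0.5 * ((y_ctl - mu_n) / sigma) ** 2
        r = 1.0 / (1.0 + np.exp(log_b - log_a))

        # M-step: weighted updates, pooling observed and latent memberships.
        pi = (t.sum() + r.sum()) / n
        mu_t0 = (r * y_ctl).sum() / r.sum()
        mu_n = (y_trt[~t].sum() + ((1 - r) * y_ctl).sum()) / ((~t).sum() + (1 - r).sum())
        ss = (((y_trt[t] - mu_t1) ** 2).sum()
              + ((y_trt[~t] - mu_n) ** 2).sum()
              + (r * (y_ctl - mu_t0) ** 2).sum()
              + ((1 - r) * (y_ctl - mu_n) ** 2).sum())
        sigma = np.sqrt(ss / n)

    return {"effect": mu_t1 - mu_t0, "pi": pi, "mu_n": mu_n, "sigma": sigma}
```

In this sketch the three roles named in the abstract (subgroup membership, outcome among treatable patients, outcome among non-treatable patients) are hard-coded as a Bernoulli model and two Gaussians. A modular design would instead let each be supplied separately.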
- Research Article
- 10.1007/s44199-025-00128-9
- Jul 7, 2025
- Journal of Statistical Theory and Applications
- Ayman Baklizi
- Research Article
- 10.1007/s44199-025-00129-8
- Jul 3, 2025
- Journal of Statistical Theory and Applications
- Song-Kyoo Kim
- Research Article
- 10.1007/s44199-025-00126-x
- Jun 30, 2025
- Journal of Statistical Theory and Applications
- Muhammad Aslam
- Research Article
- 10.1007/s44199-025-00122-1
- Jun 23, 2025
- Journal of Statistical Theory and Applications
- Indira Puteri Kinasih + 2 more
Abstract The modelling of property prices has been extensively studied in econometrics, with widely used approaches including generalised linear regression and geographically weighted regression. These models commonly address local spatial correlations observed in property price data. However, despite its potential to capture spatial effects, the conditional autoregressive (CAR) model remains underutilised for this purpose. This study examines the robustness and predictive power of the CAR model, comparing it with established spatial models across three different data-generation scenarios. An illustrative case study on Lombok house price data is also included. Simulation results show that the CAR model has a distinct advantage: it achieves lower bias and variability than other spatial regression models, effectively captures neighbourhood-based spatial relationships, and exhibits strong predictive power across various scenarios. In the Lombok case study, the CAR model outperformed other models, providing more precise estimates for property-related factors such as land size and built-up area. The results confirm that CAR’s spatial framework enables a nuanced analysis of property values across regions, enhancing accuracy in predictive models. This study also reveals the distinct strengths and limitations of each model, offering insights into their predictive accuracy and applicability across diverse real estate contexts.
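The neighbourhood-based structure of a CAR model can be sketched as follows. The abstract does not say which CAR variant the study uses, so this example assumes the common "proper CAR" form with precision matrix $Q = (D - \rho W)/\tau^2$, where $W$ is a binary adjacency matrix and $D$ holds the neighbour counts. The helper names, the rook-adjacency grid, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def rook_adjacency(nrow, ncol):
    """Binary adjacency matrix for a regular grid (cells sharing an edge are neighbours)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:                  # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrow:                  # neighbour below
                W[i, i + ncol] = W[i + ncol, i] = 1
    return W

def car_precision(W, rho=0.9, tau2=1.0):
    """Precision matrix of a proper CAR prior: Q = (D - rho * W) / tau2."""
    D = np.diag(W.sum(axis=1))
    return (D - rho * W) / tau2

def sample_car(Q, rng):
    """Draw one spatial effect vector with covariance Q^{-1} via Cholesky."""
    L = np.linalg.cholesky(Q)                 # Q = L L^T
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)            # cov = (L L^T)^{-1}
```

Under this form, the conditional mean of one area's effect given all others is `rho` times the average of its neighbours' effects, which is the sense in which the model borrows strength from adjacent areas.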
- Research Article
- 10.1007/s44199-025-00124-z
- Jun 1, 2025
- Journal of Statistical Theory and Applications
- Michael R Powers + 1 more
Mixture distributions provide a versatile and widely used framework for modeling random phenomena, and are particularly well-suited to the analysis of geoscientific processes and their attendant risks to society. For continuous mixtures of random variables, we specify a simple criterion—generating-function accessibility—to extend previously known kernel-based identifiability (or unidentifiability) results to new kernel distributions. This criterion, based on functional relationships between the relevant kernels’ moment-generating functions or Laplace transforms, may be applied to continuous mixtures of both discrete and continuous random variables. To illustrate the proposed approach, we present results for several specific kernels, in each case briefly noting its relevance to research in the geosciences and/or related risk analysis.
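As a standard textbook illustration of how generating functions bear on identifiability (it is not a statement of the paper's generating-function accessibility criterion), consider a Poisson kernel with mixing distribution $G$ on $(0,\infty)$. The symbols $G$, $\lambda$, $P_X$, and $\mathcal{L}_G$ are notation assumed here.

```latex
% Continuous mixture of Poisson kernels with mixing distribution G
\Pr(X = x) = \int_0^\infty \frac{e^{-\lambda}\lambda^{x}}{x!}\, dG(\lambda),
\qquad x = 0, 1, 2, \dots

% Probability generating function of the mixture
P_X(s) = \mathbb{E}\!\left[s^{X}\right]
       = \int_0^\infty e^{\lambda(s-1)}\, dG(\lambda)
       = \mathcal{L}_G(1 - s),
\qquad 0 \le s \le 1,

% where \mathcal{L}_G(t) = \int_0^\infty e^{-\lambda t}\, dG(\lambda)
% is the Laplace transform of G.
```

Because the Laplace transform of $G$ is analytic on $t > 0$, its values on $[0, 1]$ determine it everywhere, and hence determine $G$. Two mixing distributions that yield the same mixed Poisson law must therefore coincide, which is the identifiability property in this case.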