Articles published on Nonparametric density estimation
1304 Search results
- Research Article
- 10.1016/j.csda.2025.108335
- May 1, 2026
- Computational Statistics & Data Analysis
- Isaac Diaz-Ray + 3 more
Density or intensity function estimation for point pattern data observed on complex domains finds wide applications in spatial data analysis. However, many existing popular density estimation methods face challenges when domains have irregular boundaries, line network structures, sharp concavities, or interior holes. A nonparametric Bayesian additive ensemble of spanning trees model is developed to model the distribution of event occurrences on complex domains. This model uses a random spanning tree weak learner, which can produce flexible and contiguous domain partitions while respecting its geometry and constraints. The method has the advantage of capturing both varying smoothness and sharp changes in density functions. An efficient exact likelihood-based Bayesian inference algorithm is proposed to estimate the density function with uncertainty measures, leveraging a data thinning strategy combined with Poisson-Gamma conjugacy. Simulation studies on various complex domains demonstrate the advantages of the proposed model over competing methods. The method is further applied to the analysis of basketball shot data and crime locations on a road network.
- Research Article
- 10.3390/su18073626
- Apr 7, 2026
- Sustainability
- Mingyue Zhang + 3 more
Promoting pesticide reduction is a key step toward green vegetable production and ecological safety. Based on survey data collected from 356 leek growers in Weifang City—the largest facility-based vegetable production base in Shandong Province—this study empirically estimates the ecological compensation standard associated with pesticide-reduction behavior. The estimation employs a contingent valuation method (CVM) using non-parametric kernel density estimation for conditional value assessment, combined with the Heckman two-step model to address potential sample selection bias. The results show that 79.3% of respondents are willing to participate in an eco-compensation program for pesticide reduction; the main reason for refusal is “the higher reduction costs and lower profits”. The expected compensation level ranges from 614.94 to 620.57 yuan per mu (1 mu is approximately 0.165 acres) per year. Gender, share of Chinese chives (Allium tuberosum) income, trust in extension agents, and government penalties for excessive spraying significantly raise the required compensation, whereas age and knowledge of eco-compensation significantly lower it. Therefore, a sustainable compensation scheme co-driven by government and market should be established, combining cash, technical and in-kind support, and adopting tiered compensation schemes that reflect different reduction intensities.
- Research Article
- 10.3390/pr14060984
- Mar 19, 2026
- Processes
- Yuejiao Wang + 6 more
The large-scale grid integration of distributed renewable energy enhances the flexible regulation capacity of the power system. However, the inherent randomness and volatility of its output, coupled with weak coupling access characteristics, pose severe challenges to the safe and stable operation of the power system. To address these issues, this paper proposes a power system planning method suitable for urban power grids. To accurately characterize the uncertainty of renewable energy output, the method incorporates the concept of multi-scenario stochastic optimization and introduces a dynamic scenario generation method for wind and solar power based on nonparametric kernel density estimation and standard multivariate normal distribution sequence sampling. This method generates a set of typical daily dynamic output scenarios for wind and solar power that closely match actual output characteristics. Considering the spatiotemporal response characteristics of flexible resources, the Soft Open Point (SOP) DC link enables flexible cross-node power transmission and spatiotemporal coupling regulation of flexible resources. Therefore, this paper constructs a mathematical model for the grid integration of flexible resources based on the SOP DC link. By integrating operational constraints such as power flow constraints in the power grid and source-load uncertainty constraints, a power system planning model is established. However, traditional convex optimization methods require approximate simplifications of the model, which can easily lead to a loss of accuracy. Although the Particle Swarm Optimization (PSO) algorithm is suitable for nonlinear optimization, it is prone to getting trapped in local optima. Therefore, this paper introduces an improved PSO algorithm based on refraction opposite learning, which enhances the algorithm’s global optimization capability by expanding the particle search space and increasing population diversity. 
Finally, simulation verification is conducted based on an improved IEEE-39 bus test system, and the results show that the proposed scenario generation method achieves a sum of squared errors of only 4.82% and a silhouette coefficient of 0.94, significantly improving accuracy compared to traditional methods such as Monte Carlo sampling.
- Research Article
- 10.1080/10618600.2026.2648594
- Mar 18, 2026
- Journal of Computational and Graphical Statistics
- Martin Burda + 1 more
Bayesian nonparametric density estimation procedures are typically based on single-scale priors, such as Dirichlet process mixtures. Alternative multiscale density priors built on decision trees have many well-known advantages, including the ability to characterize abrupt local changes and to provide an estimate with a desired level of resolution. Despite their theoretical appeal, multiscale methods have typically been developed in the literature as univariate. Their multivariate versions are generally costly to implement in applications due to the rapidly increasing number of mixture components. We propose a random Bernstein polynomial prior on the unit hypercube of arbitrary dimension with a spike-and-slab shrinkage structure. The prior induces posterior sparsity of the multiscale decision tree, alleviating the curse of dimensionality. We embed the proposed model in the form of a copula link function along with nonparametric marginals in a composite prior over general spaces of densities. We provide conditions for posterior consistency under the weak topology and assess the finite-sample properties in a simulation study. We further illustrate the practical use of the model in an application to forecasting the Value at Risk and Expected Shortfall of a financial portfolio in a scenario where sampling from the non-sparse posterior would be infeasible. Supplemental materials for this article are available online.
- Research Article
- 10.1088/1742-6596/3198/1/012001
- Mar 1, 2026
- Journal of Physics: Conference Series
- Han Gong + 5 more
Capturing only the general trend of load changes leads to large prediction errors. To address this, a power system power prediction error optimization model based on a self-supervised learning temporal convolutional network is proposed. A feature extraction unit containing modules such as self-attention is constructed to implicitly represent historical power interaction information through multi-dimensional transformation and calculation. Various operations are combined, and a mutual information maximization unit is constructed. After preprocessing and training, latent temporal features of power are mined. Based on a temporal convolutional network integrating multiple convolutions, an attention mechanism and autoregressive components are introduced, integrating linear and nonlinear components. The Dropout method is used to reduce overfitting, thus constructing a power prediction model. The nonparametric statistical kernel density estimation method is used to model the deviation probability distribution, calculate the deviation interval probability, analyze the effective deviation, and then adjust the model parameters accordingly, completing the construction of the power system power prediction error optimization model. Experiments show that the predicted curve of the proposed method has a high degree of fit with the actual curve. Key indicators such as MAE, RMSE, and MAPE are all controlled within 2%. Compared with the comparison methods, its residual curve fluctuates more closely around zero with small fluctuations, and the R² value is significantly higher, indicating a substantially reduced prediction error.
- Research Article
- 10.1109/tits.2026.3656019
- Mar 1, 2026
- IEEE Transactions on Intelligent Transportation Systems
- Zhanru Liu + 5 more
Understanding the fine-grained trajectories of metro passengers, especially at the train and route levels, is essential for analyzing system-level dynamics and individual behavior. However, existing approaches often rely on strong behavioral priors or simplified boarding assumptions, limiting their generality and realism. This study proposes a fully data-driven framework for passenger trajectory inference that explicitly incorporates physical capacity constraints and crowding effects. Entry, transfer, and egress walking durations are modeled using non-parametric Kernel Density Estimation (KDE) at the platform level. Based on these distributions, we construct a confidence-based model to estimate the probability of each feasible itinerary. A congestion-aware penalty function is introduced to reduce the confidence of infeasible itineraries involving overloaded in-vehicle links. To balance inference accuracy and computational efficiency, we develop a dynamic batch-size adjustment algorithm that iteratively updates train loads and refines probabilities. The framework is validated using large-scale AFC and timetable data from Chengdu Metro. Results demonstrate that the proposed method effectively suppresses violations of physical capacity constraints, improves behavioral plausibility, and provides reliable inputs for downstream applications such as resilience analysis and passenger behavior modeling.
- Research Article
- 10.32996/jmss.2026.7.2.2
- Feb 16, 2026
- Journal of Mathematics and Statistics Studies
- Reuben Lang'At
Nonparametric methods for estimating probability densities are popular because they provide flexible tools for exploratory analysis, model checking, and inference when little is known about the underlying distributional form. In the context of sample surveys where data arise from complex designs involving stratification, clustering, and unequal inclusion probabilities, naive application of standard nonparametric estimators can, however, produce biased and inconsistent results. This paper reviews the foundations of nonparametric density estimation and the use of kernel and local polynomial methods, and discusses their adaptation to design-based and model-based survey frameworks. Practical implementation issues involving bandwidth selection, boundary correction, and computational considerations are discussed. Throughout, emphasis is placed on methods that respect survey design information, and on trade-offs between design-based validity and model-based efficiency. The paper concludes with recommendations for practice and directions for future research.
- Research Article
- 10.1007/s00357-025-09535-0
- Jan 27, 2026
- Journal of Classification
- Pierpaolo D’Urso + 3 more
This paper proposes clustering methods for large-scale stationary time series using a fuzzy approach. Adopting partitioning around centroids (PAC) and partitioning around medoids (PAM), and focusing on distributional properties of individual series, we classify a large set of time series by transforming the series into probability density functions via nonparametric density estimation, such as the kernel estimation, and using a proper distance measure, such as the Hellinger distance, between density functions. We use simulations and two real applications to demonstrate the good performance and effectiveness of the proposed clustering methods in finite samples. The proposed methods are also applicable to the spectral density functions if one focuses on the serial dependence of individual series.
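To make the idea of comparing series through their estimated densities concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' method: the fixed bandwidth of 0.4, the simulated Gaussian series, the integration range, and the helper names `gaussian_kde` and `hellinger` are all assumptions chosen for the example.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def hellinger(f, g, lo, hi, steps=400):
    """Hellinger distance between densities f and g (midpoint rule on [lo, hi])."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2 * width
    return math.sqrt(0.5 * total)


# Hypothetical data: two series with the same marginal law, one shifted series.
rng = random.Random(0)
series_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
series_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
series_c = [rng.gauss(3.0, 1.0) for _ in range(200)]

# Bandwidth 0.4 is an arbitrary illustrative choice, not a recommendation.
f_a, f_b, f_c = (gaussian_kde(s, 0.4) for s in (series_a, series_b, series_c))

d_ab = hellinger(f_a, f_b, -6.0, 9.0)  # same distribution: small distance
d_ac = hellinger(f_a, f_c, -6.0, 9.0)  # shifted distribution: larger distance
```

A matrix of such pairwise distances could then be passed to any medoid-based clustering routine; the fuzzy PAC and PAM procedures described in the abstract are not reproduced here.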
- Research Article
- 10.3390/math14020315
- Jan 16, 2026
- Mathematics
- Meng Han + 2 more
We explore the role of carbon convenience yields in forecasting the probability density of carbon returns. While theory suggests that convenience yields contain forward-looking information, their predictive content for carbon returns—especially in a density forecasting framework—remains underexplored. We propose a probability density forecasting approach that combines a mixed data sampling (MIDAS) regression with a non-parametric bootstrap and kernel density estimation. Using data from the European carbon market, we find that convenience yields significantly predict carbon returns. It takes approximately 19 days for a disturbance in carbon convenience yields to affect carbon returns, with the impact persisting for around 27 days. Moreover, our approach outperforms existing benchmark models in predicting the probability density of carbon returns, showing superior predictive accuracy and robustness.
- Research Article
- 10.1111/jfr3.70165
- Jan 4, 2026
- Journal of Flood Risk Management
- Jayesh Parmar + 4 more
China has complex topography, diverse flood mechanisms, and high population exposure, making it highly vulnerable to flooding, highlighting the need for robust national‐scale hazard assessments to identify flood‐prone regions. However, most existing hazard studies are limited to regional scales or rely on empirical indicator‐based methods that overlook flood dynamics. While some global‐scale studies use physics‐based modeling, they offer little insight into China and rarely consider reservoir operations. This study advances national‐scale flood hazard mapping for China using the hydrodynamic Global Flood Model, CaMa‐Flood (v4.2). Simulations driven by ERA5‐Reanalysis runoff showed stronger agreement with observed streamflow than ERA5‐Land. Flood frequency analysis identified the nonparametric Kernel Density Estimator as the most suitable approach. The resulting 0.05° flood hazard maps reveal that nearly half of mainland China faces some level of 1‐in‐100‐year flood hazard, with 26% in the high to very high category. Incorporating reservoir operations reduced the number of national high hazard areas by up to 31%, underscoring their vital role in mitigation. The derived hazard, population exposure, and GDP‐based analysis provide a data‐driven foundation for national and provincial flood risk management, offering a scalable framework for robust hazard assessment and improved exposure and flood risk evaluation.
- Research Article
- 10.1214/26-ejs2501
- Jan 1, 2026
- Electronic Journal of Statistics
- Yuki Takazawa + 1 more
The inference of evolutionary histories is a central problem in evolutionary biology. The analysis of a sample of phylogenetic trees can be conducted in Billera–Holmes–Vogtmann tree space, which is a CAT(0) metric space of phylogenetic trees. The globally non-positively curved (CAT(0)) property of this space enables the extension of various statistical techniques. In the problem of nonparametric density estimation, two primary methods, kernel density estimation and log-concave maximum likelihood estimation, have been proposed, yet their theoretical properties remain largely unexplored. In this paper, we address this gap by proving the consistency of these estimators in a more general setting—CAT(0) orthant spaces, which include BHV tree space. We extend log-concave approximation techniques to this setting and establish consistency via the continuity of the log-concave projection map. We also modify the kernel density estimator to correct boundary bias and establish uniform consistency using empirical process theory.
- Research Article
- 10.63561/jmsc.v2i4.1054
- Dec 30, 2025
- Faculty of Natural and Applied Sciences Journal of Mathematical and Statistical Computing
- Simeon Uyovwieyovwe Ejakpovi + 3 more
Maritime piracy is a global phenomenon that, in Nigerian waters, centers mostly on the illegal lifting of crude oil. It is an illegal act of depredation, committed for isolated ends by the crews of private sea vessels, and it has degraded the growth of the Nigerian economy. This research aims to examine the hazardous effects of piracy in the Niger Delta region (NDR) on the nation’s economy using novel nonparametric density estimation methods. The maritime piracy data record five hundred and ninety-four (594) incidents in the NDR that negatively affected the annual gross domestic product (GDP) and annual incomes, in millions of dollars, in the Nigerian economy over the period 2002-2024. Both model estimators recorded a positive economic growth index in 2002 and negative growth rates in 2016 and 2020. The visualizations of economic growth rates and maritime piracy incidence produced by the kernel density estimator (KDE) and the Hermite series kernel density estimator (HSeKDE) hinged on each estimator’s smoothing parameter and kernel function. In contrast to the KDE visualizations, the HSeKDE estimator, with the Gaussian kernel and others, captured the multiple undulating patterns of persistent piracy incidence and rippled economic growth rates. This was possible with the aid of the smoothing parameter and kernel functions applied in the model estimator to ascertain the influx in the Nigerian economy. The error criterion is the asymptotic mean integrated squared error (AMISE), whose least values were 0.00000125586 and 0.0000460329 for maritime piracy incidence and 0.0000010743 and 0.000039378 for the economic growth rate, using the Epanechnikov and Gaussian kernels.
The AMISE values obtained for sea theft incidence affirmed its occurrence in Nigerian seas, while those for the economic growth rates affirmed that the indemnities of the incessant sea theft in Nigerian waters have adversely affected the stability of the country’s economic growth rate. Therefore, we recommend that funds be appropriated for training personnel and purchasing modern robotic security equipment to halt all forms of sea theft in the Niger Delta region (NDR).
- Research Article
- 10.1080/00401706.2025.2582628
- Dec 22, 2025
- Technometrics
- Qianhan Zeng + 4 more
We study anomaly detection in images under a fixed-camera environment and propose a doubly smoothed (DS) density estimator that exploits spatial structure to improve estimation accuracy. The DS estimator applies kernel smoothing twice: first over the value domain to obtain location-wise classical nonparametric density (CD) estimates, and then over the spatial domain to borrow information from neighboring locations. Under appropriate regularity conditions, we show that the DS estimator achieves smaller asymptotic bias, variance, and mean squared error than the CD estimator. To address the increased computational cost of the DS estimator, we introduce a grid point approximation (GPA) technique that reduces the computation cost of inference without sacrificing the estimation accuracy. A rule-of-thumb bandwidth is derived for practical use. Extensive simulations show that GPA-DS achieves the lowest MSE with near real-time speed. In a large-scale case study on underground mine surveillance, GPA-DS enables remarkable sub-image extraction of anomalous regions after which a lightweight MobileNet classifier achieves ≈ 99% out-of-sample accuracy for unsafe act detection.
- Research Article
- 10.1007/s11600-025-01762-8
- Dec 19, 2025
- Acta Geophysica
- Francis Tong + 2 more
Frequent significant deviations of the observed magnitude distribution of anthropogenic seismicity from the Gutenberg–Richter relation require alternative magnitude–frequency models for probabilistic seismic hazard assessments. Five nonparametric kernel density estimation (KDE) methods are evaluated on simulated samples drawn from four magnitude distribution models: the exponential, concave and convex bi-exponential, and exponential-Gaussian distributions. The studied KDE methods include Silverman’s and Scott’s rules with Abramson’s bandwidth adaptation, two diffusion-based methods (ISJ and diffKDE), and adaptiveKDE, which formulates the bandwidth estimation as an optimization problem. Their performance is assessed for magnitudes from 2 to 6 with sample sizes of 400 to 5000, using the mean integrated square error of the cumulative distribution (MISE_F) over 100,000 simulations. Their suitability in hazard assessments is illustrated by the mean of the mean return period (MRP) for a sample size of 1000. Among the tested methods, diffKDE provides the most accurate cumulative distribution function estimates for larger magnitudes. Even when the data are drawn from an exponential distribution, diffKDE performs comparably to maximum likelihood estimation when the sample size is at least 1000. Given that anthropogenic seismicity often deviates from the exponential model, using diffKDE for probabilistic seismic hazard assessments is recommended whenever a sufficient sample size is available.
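For readers unfamiliar with the rule-of-thumb bandwidths named in this abstract, the sketch below shows one common textbook form of Silverman's and Scott's rules followed by Abramson's square-root adaptation. It is not the paper's implementation: the constants 0.9 and 1.06, the Gaussian pilot estimate, the simulated exponential magnitudes (assuming a b-value of 1 and a lower cutoff of 2), and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def scott_bandwidth(sample):
    """Normal-reference rule often attributed to Scott: 1.06 * sd * n^(-1/5)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def abramson_bandwidths(sample, h):
    """Per-point bandwidths h_i = h * sqrt(g / pilot(x_i)).

    pilot is a fixed-bandwidth Gaussian KDE and g is the geometric mean of
    the pilot values, so sparse regions receive wider kernels.
    """
    n = len(sample)
    pilot = [
        sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)
        for x in sample
    ]
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    return [h * math.sqrt(g / p) for p in pilot]


# Hypothetical magnitudes: exponential above 2 with an assumed b-value of 1.
rng = random.Random(1)
magnitudes = [2.0 + rng.expovariate(math.log(10.0)) for _ in range(400)]

h_silverman = silverman_bandwidth(magnitudes)
h_scott = scott_bandwidth(magnitudes)
local_h = abramson_bandwidths(magnitudes, h_silverman)
```

With this normalization the local bandwidths have geometric mean equal to the global bandwidth, so the adaptation redistributes smoothing toward the sparse large-magnitude tail rather than changing its overall amount.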
- Research Article
- 10.1080/07474938.2025.2591339
- Dec 10, 2025
- Econometric Reviews
- Federico Zincenko
Considering a continuous random variable $Y$ together with a continuous random vector $X$, I propose a nonparametric estimator $\hat{f}(\cdot \mid x)$ for the conditional density of $Y$ given $X = x$. This estimator takes the form of an exponential series whose coefficients $\hat{\theta}_x = (\hat{\theta}_{x,1}, \ldots, \hat{\theta}_{x,J})$ are the solution of a system of nonlinear equations that depends on an estimator of the conditional expectation $E[\phi(Y) \mid X = x]$, where $\phi$ is a $J$-dimensional vector of basis functions. The distinguishing feature of the proposed estimator is that $E[\phi(Y) \mid X = x]$ is estimated by generalized random forest (Athey, Tibshirani, and Wager, Annals of Statistics, 2019), targeting the heterogeneity of $\hat{\theta}_x$ across $x$. I show that $\hat{f}(\cdot \mid x)$ is uniformly consistent and asymptotically normal, allowing $J \to \infty$. I also provide a standard error formula to construct asymptotically valid confidence intervals. Results from Monte Carlo experiments are provided, and an empirical application to U.S. timber auction data illustrates how the proposed estimator can be used to estimate the conditional density of bids given auctioned object characteristics.
- Research Article
- 10.1038/s41598-025-30090-6
- Dec 5, 2025
- Scientific Reports
- Jiaqi Li + 6 more
Optimizing the scheduling of integrated electric-heat systems (IEHS) is complex due to fluctuating user-side loads and their associated uncertainties. To address this, this paper proposes an integrated demand response (DR) optimization strategy for IEHS that accounts for load uncertainty. First, a probabilistic model leveraging Copula functions was formulated to capture the temporal correlation of load uncertainties. A non-parametric Kernel Density Estimation method was then employed to fit the load distribution, and randomized load fluctuation data were generated using Monte Carlo sampling to simulate uncertainty. Second, a DR model that incorporates the characteristics of the electric-heat system is introduced. The electrical and heating load are coordinated through distinct energy storage devices. Finally, the effectiveness of the strategy is validated through the application of an improved column-and-constraint generation algorithm. Simulation outcomes indicate that the presented optimization approach substantially improves the operational flexibility and performance of IEHS.
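The step of fitting a kernel density estimate to load data and then drawing Monte Carlo samples from it can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's model: it ignores the Copula-based temporal correlation, uses a Gaussian kernel with a normal-reference bandwidth, and the data, the constant 1.06, and the function name `sample_from_gaussian_kde` are hypothetical.

```python
import random
import statistics


def sample_from_gaussian_kde(data, bandwidth, size, rng):
    """Draw from a Gaussian KDE: pick an observation, then add kernel noise.

    A Gaussian KDE is an equal-weight mixture of normals centered at the
    observations, so sampling needs no explicit density evaluation.
    """
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(size)]


# Hypothetical observed loads (arbitrary units), for illustration only.
rng = random.Random(42)
observed_load = [rng.gauss(100.0, 10.0) for _ in range(500)]

# Normal-reference bandwidth; an assumed choice, not taken from the paper.
h = 1.06 * statistics.stdev(observed_load) * len(observed_load) ** (-0.2)

# 2000 simulated load values that follow the fitted density.
scenarios = sample_from_gaussian_kde(observed_load, h, 2000, rng)
```

Because each draw is an observation plus kernel noise, the simulated values keep the mean of the data while their variance is inflated by roughly the squared bandwidth, which is worth remembering when the samples feed a downstream optimization.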
- Research Article
- 10.1016/j.ijtst.2024.11.005
- Dec 1, 2025
- International Journal of Transportation Science and Technology
- Lubna Obaid + 2 more
Incident Duration Reliability Assessment Using Monte-Carlo Simulation and Kernel Density Estimation of Machine Learning-Based Models
- Research Article
- 10.1038/s41598-025-30135-w
- Nov 25, 2025
- Scientific Reports
- Jie Zhang + 2 more
This paper proposes a precise line loss rate probability density estimation method using the Bilateral Total Variation (BTV) filtering algorithm to suppress noise while preserving edge information in power data. The BTV algorithm smooths noise by considering spatial distribution, maintaining edge gradients, and improving data accuracy. The line loss rate is calculated using a combined improved equivalent resistance method with the filtered data, followed by non-parametric kernel density estimation for precise probability density results. Experiments show the method effectively filters power data, enabling accurate line loss rate calculation and reliable density estimation. Under varying conditions, the method achieves a high maximum Kendall correlation coefficient (lowest ≈ 0.88), confirming its accuracy in reflecting the true line loss rate distribution.
- Research Article
- 10.3390/pr13113635
- Nov 10, 2025
- Processes
- Guifen Jiang + 9 more
The high proportion of renewable energy introduces significant operational risks to the system’s flexibility balance due to its volatility and randomness. Traditional regulation methods struggle to meet the urgent demand for flexible resources. Utilizing wind turbines (WTs) under load shedding operation can provide additional reserve capacity, thereby reducing the risk of insufficient system flexibility. However, since wind speed and turbine output exhibit a cubic relationship, minor fluctuations in wind speed can lead to significant variations in output and reserve capacity. This increases the uncertainty in the supply of flexible resources from WTs, posing challenges to power system flexibility assessment. This paper investigates a method for assessing power system flexibility considering the uncertainty of flexible resources supported by WTs under load shedding operation. First, according to the flexibility supply control model of WTs under load shedding operation, the analytical relationship between output, flexible resources, and wind speed under a specific wind energy conversion coefficient is constructed. Second, combined with a probabilistic model of wind speed based on nonparametric kernel density estimation, the wind turbine flexible resource uncertainty model is constructed. Third, Monte Carlo simulation is used to obtain the sampled wind speed data, and an operational flexibility assessment method for the power system considering the flexibility uncertainty of WTs under load shedding operation is proposed. Finally, through case studies, the validity of the proposed model and method is verified. The analysis concludes that load shedding operation of WTs can enhance the system’s flexible resources to a certain extent but cannot provide stable bi-directional regulation capabilities.
- Research Article
- 10.1016/j.jbi.2025.104937
- Nov 1, 2025
- Journal of Biomedical Informatics
- Hugo Álvarez-Chaves + 1 more
Interpretable statistical modeling of patient flow in emergency departments.