Abstract
We propose a parametric kernel mode-based regression built on the mode value, which provides robust and efficient estimators for datasets containing outliers or heavy-tailed distributions. To address the challenges posed by massive datasets, we integrate this regression method with distributed statistical learning techniques, which greatly reduce the required amount of primary memory and simultaneously accommodate heterogeneity in the estimation process. By approximating the local kernel objective function with a least squares format, we can preserve compact statistics for each worker machine, facilitating the reconstruction of estimates for the entire dataset with minimal asymptotic approximation error. Additionally, we explore shrinkage estimation through local quadratic approximation, showing that the resulting estimator possesses the oracle property via an adaptive LASSO approach. The finite-sample performance of the developed method is illustrated using simulations and real data analysis.
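To make the two key ideas concrete, the sketch below illustrates one common way they can fit together: kernel mode regression fitted by an EM/IRLS-type update (with a Gaussian kernel, each iteration reduces to a weighted least squares step), where each worker only ships the compact summaries X'WX and X'Wy to the central machine. This is a minimal illustration, not the authors' exact algorithm; the kernel choice, the one-pass shard loop, and all function names (`worker_summaries`, `distributed_modal_regression`) are assumptions for the example.

```python
import numpy as np

def worker_summaries(X, y, beta, h):
    """One worker's contribution: kernel-weighted least-squares
    summaries (X'WX, X'Wy) around the current estimate."""
    r = y - X @ beta                       # residuals at current fit
    w = np.exp(-0.5 * (r / h) ** 2)        # Gaussian kernel weights
    return X.T @ (w[:, None] * X), X.T @ (w * y)

def distributed_modal_regression(shards, h, n_iter=100, tol=1e-10):
    """Mode regression via iteratively reweighted least squares,
    aggregating only p*p + p numbers per worker per round."""
    X0, y0 = shards[0]
    beta = np.linalg.lstsq(X0, y0, rcond=None)[0]  # OLS warm start
    p = beta.shape[0]
    for _ in range(n_iter):
        S, v = np.zeros((p, p)), np.zeros(p)
        for X, y in shards:                # in practice: one round trip
            Sk, vk = worker_summaries(X, y, beta, h)
            S += Sk
            v += vk
        beta_new = np.linalg.solve(S, v)   # pooled weighted LS update
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy usage (hypothetical data): heavy-tailed noise, two shards
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=1.5, size=n)
shards = [(X[:n // 2], y[:n // 2]), (X[n // 2:], y[n // 2:])]
print(distributed_modal_regression(shards, h=1.0))
```

Because the per-shard summaries sum to the pooled X'WX and X'Wy, each distributed iteration coincides exactly with the corresponding full-data weighted least squares step, which is the intuition behind reconstructing the full-sample estimate from compact worker statistics.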