Hierarchical nonparametric survival modeling for demand forecasting with fragmented categorical covariates

Ta‐Hsin Li

doi:10.1002/asmb.2459

Abstract

This paper addresses the problem of data fragmentation when incorporating imbalanced categorical covariates in nonparametric survival models. The problem arises in an application of demand forecasting where certain categorical covariates are important explanatory factors for the diversity of survival patterns but are severely imbalanced, in the sense that a large percentage of the data segments defined by these covariates have very small sample sizes. Two general approaches, called the class‐based approach and the fusion‐based approach, are proposed to handle the problem. Both rely on judicious use of a data segment hierarchy defined by the covariates. The class‐based approach allows certain segments in the hierarchy to have their own private survival functions and aggregates the others to share a common survival function. The fusion‐based approach allows all survival functions to borrow and share information from all segments based on their positions in the hierarchy. A nonparametric Bayesian estimator with Dirichlet process priors provides the data‐sharing mechanism in the fusion‐based approach. The hyperparameters in the priors are treated as fixed quantities and learned from data by taking advantage of the data segment hierarchy. The proposed methods are motivated and validated by a case study with real‐world data from a software development service operation.
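
To make the data-sharing idea concrete, the following is a minimal sketch of how a Dirichlet process prior can shrink a small segment's survival estimate toward its parent in the hierarchy. It is not the paper's estimator: it assumes fully observed (uncensored) durations, uses the parent segment's empirical survival function as the prior mean, and takes the concentration parameter `alpha` as a hand-picked constant rather than learning it from data. The function names and the toy data are hypothetical.

```python
def empirical_survival(durations, t):
    """Fraction of observed durations strictly greater than t."""
    return sum(1 for x in durations if x > t) / len(durations)


def dp_posterior_survival(child, prior_survival, alpha, t):
    """Posterior mean of S(t) under a Dirichlet process prior.

    For uncensored data this is the standard weighted average
        (alpha * S0(t) + n * S_n(t)) / (alpha + n),
    where S0 is the prior mean and S_n is the empirical survival
    function of the n child observations.
    """
    n = len(child)
    exceed = sum(1 for x in child if x > t)  # equals n * S_n(t)
    return (alpha * prior_survival(t) + exceed) / (alpha + n)


# Toy data (assumed for illustration): a well-populated parent segment
# and a fragmented child segment with only two observations.
parent = [2, 3, 5, 8, 13, 21, 34, 55]
child = [3, 40]
alpha = 4.0  # assumed prior weight, in units of pseudo-observations


def prior(t):
    return empirical_survival(parent, t)


child_only = empirical_survival(child, 5)
parent_only = prior(5)
estimate = dp_posterior_survival(child, prior, alpha, 5)
```

With these toy numbers the fused estimate at `t = 5` falls between the child-only and parent-only values, and a larger `alpha` pulls it closer to the parent. Handling censored observations would require a different estimator than this sketch.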

Full Text