Abstract

AbstractThis paper addresses the problem of data fragmentation when incorporating imbalanced categorical covariates in nonparametric survival models. The problem arises in an application of demand forecasting where certain categorical covariates are important explanatory factors for the diversity of survival patterns but are severely imbalanced in the sense that a large percentage of data segments defined by these covariates have very small sample sizes. Two general approaches, called the class‐based approach and the fusion‐based approach, are proposed to handle the problem. Both reply on judicious utilization of a data segment hierarchy defined by the covariates. The class‐based approach allows certain segments in the hierarchy to have their private survival functions and aggregates the others to share a common survival function. The fusion‐based approach allows all survival functions to borrow and share information from all segments based on their positions in the hierarchy. A nonparametric Bayesian estimator with Dirichlet process priors provides the data‐sharing mechanism in the fusion‐based approach. The hyperparameters in the priors are treated as fixed quantities and learned from data by taking advantage of the data segment hierarchy. The proposed methods are motivated and validated by a case study with real‐world data from an operation of software development service.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.