In many semiparametric models, the infinite-dimensional parameter of direct interest is a probability density, but estimating it nonparametrically is usually difficult in the presence of incomplete data. To address this issue, this paper promotes phase-type distributions as a sieve method. Phase-type distributions are dense in the space of distributions of nonnegative random variables, closed under minimum, maximum, and convolution, and compatible with the accelerated failure time model. These properties make them attractive for sieve density estimation in problems with complex missing-data structures. However, the class of phase-type distributions is over-parameterized, and its approximation error rate for a given density as the number of phases increases remains unknown. These limitations have hindered its theoretical development as a sieve method. In this paper, we design a sieve class of identifiable phase-type densities and establish its approximation error rate for a given density, which is the first error rate result known for phase-type distributions. The proposed sieve is then used for semiparametric M-estimation in which the nonparametric component is a density. Building on the approximation error rate results, we establish general asymptotic properties of the phase-type sieve estimators and apply them to models whose complicated missing-data patterns cannot be easily handled by existing methods. Our simulation and real data analysis focus on right-censored data with missing indicators, for which we demonstrate that our estimators are more efficient than existing estimators.