Abstract

A new estimation method for the two-component mixture model introduced in [29] is proposed. This model consists of a two-component mixture of linear regressions in which one component is entirely known while the proportion, the slope, the intercept and the error distribution of the other component are unknown. In spite of good performance for datasets of reasonable size, the method proposed in [29] suffers from a serious drawback when the sample size becomes large as it is based on the optimization of a contrast function whose pointwise computation requires $O(n^{2})$ operations. The range of applicability of the method derived in this work is substantially larger as it relies on a method-of-moments estimator free of tuning parameters whose computation requires $O(n)$ operations. From a theoretical perspective, the asymptotic normality of both the estimator of the Euclidean parameter vector and of the semiparametric estimator of the c.d.f. of the error is proved under weak conditions not involving zero-symmetry assumptions. In addition, an approximate confidence band for the c.d.f. of the error can be computed using a weighted bootstrap whose asymptotic validity is proved. The finite-sample performance of the resulting estimation procedure is studied under various scenarios through Monte Carlo experiments. The proposed method is illustrated on three real datasets of size $n=150$, 51 and 176,343, respectively. Two extensions of the considered model are discussed in the final section: a model with an additional scale parameter for the first component, and a model with more than one explanatory variable.

Highlights

  • MSC 2010 subject classifications: Primary 62J05; secondary 62G08

  • A new estimation method for the two-component mixture model introduced in [29] is proposed. This model consists of a two-component mixture of linear regressions in which one component is entirely known while the proportion, the slope, the intercept and the error distribution of the other component are unknown

  • In spite of good performance for datasets of reasonable size, the method proposed in [29] suffers from a serious drawback when the sample size becomes large as it is based on the optimization of a contrast function whose pointwise computation requires $O(n^{2})$ operations

Summary

Problem and notation

Let $Z$ be a Bernoulli random variable with unknown parameter $\pi_0 \in [0, 1]$, let $X$ be an $\mathcal{X}$-valued random variable with $\mathcal{X} \subset \mathbb{R}$, and let $\varepsilon^{*}$ and $\varepsilon^{**}$ be two absolutely continuous, centered, real-valued random variables with finite variances that are independent of $X$. As discussed later, it is possible to consider a slightly more general version of the model stated in (2) that involves an unknown scale parameter for the first component. This more elaborate model remains identifiable, and estimation through the method of moments is theoretically possible. From a practical perspective, however, estimating this scale parameter through the method of moments seems unstable enough that an alternative estimation method appears to be required. A more straightforward extension of the model, which handles more than one explanatory variable, is considered in Section 7.
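To make the setting concrete, the following is a minimal sketch of how data from such a mixture could be simulated and how a moment-based estimator with $O(n)$ cost might be built. It rests on several assumptions that do not appear in the text above: the model is taken to be $Y = (1-Z)\varepsilon^{*} + Z(\alpha_0 + \beta_0 X + \varepsilon^{**})$ with $Z$ independent of $(X, \varepsilon^{*}, \varepsilon^{**})$, the known component has a standard normal error, $X$ is uniform on $[-2, 2]$, and $\beta_0 \neq 0$. Under these assumptions $E(Y \mid X) = \pi_0\alpha_0 + \pi_0\beta_0 X$ and the coefficient of $X^{2}$ in $E(Y^{2} \mid X)$ is $\pi_0\beta_0^{2}$, so two least-squares fits are enough to recover $(\pi_0, \alpha_0, \beta_0)$. The function names `simulate` and `moment_estimate` are hypothetical, and this is not claimed to be the exact estimator studied in the paper.

```python
import numpy as np


def simulate(n, pi0, alpha0, beta0, rng):
    """Draw n pairs (X, Y) from the assumed two-component mixture."""
    x = rng.uniform(-2.0, 2.0, size=n)      # assumed covariate distribution
    z = rng.random(n) < pi0                 # Z ~ Bernoulli(pi0)
    eps_known = rng.normal(0.0, 1.0, n)     # assumed error of the known component
    eps_unknown = rng.normal(0.0, 0.5, n)   # centered error of the unknown component
    y = np.where(z, alpha0 + beta0 * x + eps_unknown, eps_known)
    return x, y


def moment_estimate(x, y):
    """Moment-based estimates of (pi0, alpha0, beta0); requires beta0 != 0."""
    ones = np.ones_like(x)
    # Regress Y on (1, X): coefficients estimate (pi0*alpha0, pi0*beta0).
    g, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
    # Regress Y^2 on (1, X, X^2): the X^2 coefficient estimates pi0*beta0^2.
    h, *_ = np.linalg.lstsq(np.column_stack([ones, x, x**2]), y**2, rcond=None)
    pi_alpha, pi_beta = g
    pi_beta_sq = h[2]
    beta = pi_beta_sq / pi_beta
    pi = pi_beta**2 / pi_beta_sq
    alpha = pi_alpha / pi
    return pi, alpha, beta


rng = np.random.default_rng(0)
x, y = simulate(200_000, pi0=0.7, alpha0=1.0, beta0=2.0, rng=rng)
pi_hat, alpha_hat, beta_hat = moment_estimate(x, y)
print(pi_hat, alpha_hat, beta_hat)
```

Both fits use design matrices with a fixed number of columns, so the cost grows linearly with the sample size, in contrast with a contrast function whose pointwise evaluation needs $O(n^{2})$ operations. The ratio-based formulas become unstable when $\beta_0$ is close to zero, which is why the sketch assumes a clearly nonzero slope.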

Identifiability
Estimation
Estimation of the Euclidean parameter vector
Estimation of the functional parameter
A weighted bootstrap with application to confidence bands for F
Monte Carlo experiments
Illustrations
The tone dataset
The aphids dataset
The NimbleGen high density array dataset
Conclusion and possible extensions of the model
An additional unknown scale parameter for the first component
More than one explanatory variable