Abstract

We study efficient nonparametric estimation of the distribution functions of several scientifically meaningful sub-populations from data consisting of mixed samples in which the sub-population identifiers are missing. Only the probabilities of each observation belonging to each sub-population are available. The problem arises in several biomedical studies, such as quantitative trait locus (QTL) analysis and genetic studies with ungenotyped relatives, where the scientific interest lies in estimating the cumulative distribution function of a trait given a specific genotype. However, in these studies subjects' genotypes may not be directly observed, so the distribution of the trait outcome is a mixture of several genotype-specific distributions. We characterize the complete class of consistent estimators, which includes one type of nonparametric maximum likelihood estimator (NPMLE) as well as least squares and weighted least squares estimators. We identify the efficient estimator in this class that reaches the semiparametric efficiency bound, and we implement it using a simple procedure that remains consistent even if several components of the estimator are misspecified. In addition, a close inspection of two commonly used NPMLEs for these problems yields the surprising result that the NPMLE in one form is highly inefficient, while the NPMLE in the other form is inconsistent. We provide simulation studies to illustrate the theoretical results and demonstrate the proposed methods through two real data examples.
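To make the data structure concrete, here is a minimal Python sketch of mixed samples with unobserved sub-population identifiers. It assumes two hypothetical sub-populations with N(0, 1) and N(2, 1) trait distributions and membership probabilities drawn from {0.25, 0.5, 0.75}; these choices, and all function names, are illustrative assumptions rather than anything from the paper. Each label is drawn from $q_i$ and then discarded, so only $(q_i, S_i)$ is kept, and the check compares the empirical $P(S_i \le t \mid q_i)$ with the mixture $\sum_k q_{ik} F_k(t)$.

```python
import random
from statistics import NormalDist

# Hypothetical sub-population trait distributions (e.g. two genotypes):
# N(0, 1) and N(2, 1). The normal choice is for illustration only.
MEANS = [0.0, 2.0]


def simulate_mixed_sample(n, seed=0):
    """Return (q, s): membership-probability vectors and observed traits.

    The sub-population label z_i is drawn from q_i and then discarded,
    mimicking an unobserved genotype; only (q_i, S_i) is kept.
    """
    rng = random.Random(seed)
    q, s = [], []
    for _ in range(n):
        p1 = rng.choice([0.25, 0.5, 0.75])  # hypothetical probability levels
        z = 0 if rng.random() < p1 else 1  # hidden label, never returned
        q.append([p1, 1.0 - p1])
        s.append(rng.gauss(MEANS[z], 1.0))
    return q, s


def mixture_cdf(qi, t):
    """P(S <= t | q_i) = sum_k q_ik F_k(t): the mixture the data follow."""
    return sum(w * NormalDist(mu, 1.0).cdf(t) for w, mu in zip(qi, MEANS))


def empirical_cdf_given_q(q, s, p1, t):
    """Fraction of observations with q_i1 == p1 whose trait is <= t."""
    hits = [si <= t for qi, si in zip(q, s) if qi[0] == p1]
    return sum(hits) / len(hits)


q, s = simulate_mixed_sample(30000, seed=1)
for p1 in (0.25, 0.5, 0.75):
    emp = empirical_cdf_given_q(q, s, p1, 1.0)
    print(p1, round(emp, 3), round(mixture_cdf([p1, 1.0 - p1], 1.0), 3))
```

The two printed columns should agree up to Monte Carlo error, which is the sense in which each observed trait follows a mixture of the sub-population distributions weighted by its known $q_i$.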

Highlights

  • AMS 2000 subject classifications: Primary 62G05, 62G20; secondary 62G99

  • We provide nonparametric estimation in the sense that we do not make any distributional assumption on the conditional distributions

  • Comparing $\phi_{\mathrm{eff}}$ with $\phi_{\mathrm{OWLS}}$ obtained in Appendix A.2, we find that although the optimal WLS (OWLS) estimator is optimal within the WLS family, it does not reach the semiparametric efficiency bound

Summary

A class of weighted least squares estimators

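The traditional approach to estimating $F(t)$ is the maximum likelihood estimator under a parametric model or the NPMLE under a nonparametric model. However, a very simple weighted estimator can be used if we formulate the same problem from a different angle. Viewing the $q_i$'s as covariates and the $I(S_i \le t)$'s as responses, the covariates and the responses are linked by $F(t)$ through a familiar linear regression model, $I(S_i \le t) = q_i^T F(t) + \epsilon_i$, $i = 1, \ldots, n$. Denote by $M$ an arbitrary $n \times n$ diagonal matrix, and write $Q = (q_1, \ldots, q_n)^T$, $Y(t) = \{I(S_1 \le t), \ldots, I(S_n \le t)\}^T$ and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T \in \mathbb{R}^n$, so that $Y(t) = Q F(t) + \epsilon$. Minimizing $\{Y(t) - Q F(t)\}^T M \{Y(t) - Q F(t)\}$ over $F(t)$, we obtain the general WLS estimator $\hat F(t) = (Q^T M Q)^{-1} Q^T M Y(t)$. The simplest estimator is the OLS estimator, obtained by setting $M = I_n$ and derived in Fine et al. (2004) using a different formulation, while the most efficient WLS estimator is obtained by taking $M$ to be the diagonal matrix whose $i$th diagonal entry equals $v_i^{-1}$, where $v_i = \mathrm{var}(\epsilon_i \mid q_i)$. Because $v_i$ depends on the unknown $F(t)$, a standard iteratively re-weighted estimation procedure can be used to obtain this optimal WLS (OWLS) estimator. The presence of the matrix $M$ also allows the flexibility to derive other WLS estimators with desired properties such as robustness.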
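The WLS estimator described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. `wls_cdf` solves the normal equations $(Q^T M Q) F(t) = Q^T M Y(t)$ for a diagonal $M$ supplied as `weights`, where $Q$ stacks the $q_i^T$ as rows and $Y(t)$ stacks the indicators $I(S_i \le t)$. `owls_cdf` starts from OLS ($M = I_n$) and iterates weights $1/v_i$, assuming $v_i = \mu_i(1 - \mu_i)$ with $\mu_i = q_i^T F(t)$, the Bernoulli variance of $I(S_i \le t)$ given $q_i$. The two-component normal generator `simulate` and all function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix (a copy)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wls_cdf(q, s, t, weights):
    """WLS estimate of F(t) = (F_1(t), ..., F_p(t)) with M = diag(weights).

    Regresses Y_i = I(S_i <= t) on the covariate vector q_i by solving
    (Q^T M Q) F = Q^T M Y.
    """
    p = len(q[0])
    y = [1.0 if si <= t else 0.0 for si in s]
    a = [[sum(w * qi[j] * qi[k] for w, qi in zip(weights, q)) for k in range(p)]
         for j in range(p)]
    b = [sum(w * qi[j] * yi for w, qi, yi in zip(weights, q, y)) for j in range(p)]
    return solve(a, b)


def owls_cdf(q, s, t, n_iter=5, eps=1e-3):
    """Iteratively re-weighted WLS with weights 1 / v_i (assumed Bernoulli).

    mu_i = q_i^T F(t) is clipped to [eps, 1 - eps] so the weights stay finite.
    """
    f = wls_cdf(q, s, t, [1.0] * len(s))  # OLS start: M = I_n
    for _ in range(n_iter):
        mu = [min(max(sum(a * b for a, b in zip(qi, f)), eps), 1 - eps) for qi in q]
        f = wls_cdf(q, s, t, [1.0 / (m * (1 - m)) for m in mu])
    return f


def simulate(n, seed=0):
    """Hypothetical data: traits N(0, 1) or N(2, 1), q_i1 ~ Uniform(0, 1).

    The sub-population label is drawn from q_i and then discarded.
    """
    rng = random.Random(seed)
    q, s = [], []
    for _ in range(n):
        p1 = rng.random()
        z = 0 if rng.random() < p1 else 1
        q.append([p1, 1.0 - p1])
        s.append(rng.gauss(2.0 * z, 1.0))
    return q, s


q, s = simulate(20000, seed=3)
print("OLS: ", wls_cdf(q, s, 1.0, [1.0] * len(q)))
print("OWLS:", owls_cdf(q, s, 1.0))
```

Under this simulated design both fits should be close to $(\Phi(1), \Phi(-1)) \approx (0.841, 0.159)$. Note that neither estimate is constrained to lie in $[0, 1]$ or to be monotone in $t$; the clipping in `owls_cdf` only keeps the weights finite.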

The complete class of consistent estimators
The semiparametric efficient estimator
Analytic comparison between OWLS and the efficient estimator
Efficient estimator and its asymptotic properties
Algorithm for implementing the efficient estimator
Asymptotics and inferences
Understanding the NPMLEs
Three simulated examples
Estimation from QTL mapping data
Estimation from the LDL data
Discussion
Derivation of the complete influence function family
Influence function of the WLS
Proof of Theorem 1
Proof of Theorem 2
Findings
Inconsistency of the type II NPMLE
