Efficient distribution estimation for data with unobserved sub-population identifiers

Yanyuan Ma, Yuanjia Wang

doi:10.1214/12-ejs690

Journal: Electronic Journal of Statistics
Publication Date: Jan 1, 2012
Citations: 11
License type: cc-by

Abstract

We study efficient nonparametric estimation of distribution functions of several scientifically meaningful sub-populations from data consisting of mixed samples where the sub-population identifiers are missing. Only probabilities of each observation belonging to a sub-population are available. The problem arises in several biomedical studies, such as quantitative trait locus (QTL) analysis and genetic studies with ungenotyped relatives, where the scientific interest lies in estimating the cumulative distribution function of a trait given a specific genotype. However, in these studies subjects' genotypes may not be directly observed. The distribution of the trait outcome is therefore a mixture of several genotype-specific distributions. We characterize the complete class of consistent estimators, which includes members such as one type of nonparametric maximum likelihood estimator (NPMLE) and least squares or weighted least squares estimators. We identify the efficient estimator in the class that reaches the semiparametric efficiency bound, and we implement it using a simple procedure that remains consistent even if several components of the estimator are mis-specified. In addition, our close inspection of two commonly used NPMLEs in these problems shows the surprising result that the NPMLE in one form is highly inefficient, while in the other form it is inconsistent. We provide simulation procedures to illustrate the theoretical results and demonstrate the proposed methods through two real data examples.

Highlights

AMS 2000 subject classifications: Primary 62G05, 62G20; secondary 62G99
We provide nonparametric estimation in the sense that we do not make any distributional assumption on the conditional distributions.
Comparing φ_eff with φ_OWLS obtained in Appendix A.2, we find that although the optimal WLS (OWLS) estimator is optimal among the WLS family, it does not reach the semiparametric efficiency bound.

Summary

A class of weighted least squares estimators

Although the traditional approach to estimating F(t) is the maximum likelihood estimator for a parametric model or the NPMLE for a nonparametric model, a very simple weighted estimator can be used if we formulate the same problem from a different angle. Viewing the q_i's as covariates and I(S_i ≤ t) as response variables, the covariates and the responses are linked by F(t) via a familiar linear regression model. Denote by M an arbitrary n × n diagonal matrix. Stacking the n observations into vectors in R^n and weighting by M, we obtain the general WLS estimator. The simplest estimator is the OLS, where we set M = I_n, derived in Fine et al. (2004) using a different formulation, while the most efficient WLS estimator is obtained when M is a diagonal matrix whose ith diagonal entry equals v_i^{-1}. A standard iteratively re-weighted estimation procedure can be used to obtain this optimal WLS (OWLS) estimator. The presence of the matrix M allows the flexibility to derive other WLS estimators that achieve desired properties such as robustness.
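The regression view above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. It assumes the linear model E[I(S_i ≤ t) | q_i] = q_i^T F(t), where F(t) stacks the sub-population distribution functions, and it assumes v_i is the conditional variance of the binary response, q_i^T F(t) (1 - q_i^T F(t)); the summary does not define v_i, so that choice is a guess. The function names `wls_cdf` and `owls_cdf`, the iteration count, and the clipping constant `eps` are hypothetical.

```python
import numpy as np


def wls_cdf(s, q, t, weights=None):
    """WLS estimate of F(t) = (F_1(t), ..., F_p(t)) at a single point t.

    s: length-n observed outcomes; q: n x p membership probabilities;
    weights: diagonal of M (defaults to ones, i.e. OLS with M = I_n).
    """
    s = np.asarray(s, dtype=float)
    q = np.asarray(q, dtype=float)
    y = (s <= t).astype(float)                  # responses I(S_i <= t)
    w = np.ones(len(s)) if weights is None else np.asarray(weights, dtype=float)
    qw = q * w[:, None]                         # M Q with M = diag(w)
    # Solve (Q^T M Q) F = Q^T M y.
    return np.linalg.solve(q.T @ qw, qw.T @ y)


def owls_cdf(s, q, t, n_iter=20, eps=1e-3):
    """Iteratively re-weighted WLS with weights 1 / v_i.

    Assumption: v_i = mu_i * (1 - mu_i) with mu_i = q_i^T F(t), the
    variance of a Bernoulli response. mu_i is clipped away from 0 and 1
    so the weights stay finite.
    """
    q = np.asarray(q, dtype=float)
    f = wls_cdf(s, q, t)                        # start from OLS
    for _ in range(n_iter):
        mu = np.clip(q @ f, eps, 1.0 - eps)
        f = wls_cdf(s, q, t, weights=1.0 / (mu * (1.0 - mu)))
    return f
```

When every row of q is a unit vector (memberships fully known), both functions reduce to the per-group empirical distribution function at t. Note that neither estimate is constrained to lie in [0, 1] or to be monotone in t; any such post-processing is left out of this sketch.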

The complete class of consistent estimators

The semiparametric efficient estimator

Analytic comparison between OWLS and the efficient estimator

Efficient estimator and its asymptotic properties

Algorithm for implementing the efficient estimator

Asymptotics and inferences

Understanding the NPMLEs

Three simulated examples

Estimation from QTL mapping data

Estimation from the LDL data

Discussion

Derivation of the complete influence function family

Influence function of the WLS

Proof of Theorem 1

Proof of Theorem 2

Findings

Inconsistency of the type II NPMLE
