Low-bias discrimination of circular data with measurement errors
Abstract: We study nonparametric discrimination among circular density populations when the sample data are affected by measurement errors. Relatively little research has been devoted to this topic. In such problems, a nonparametric method must account for an additional source of bias introduced by the measurement errors, on top of the usual bias of local methods. In this context of compounded bias, we propose a deconvolution approach involving lower-bias kernel estimators. Some asymptotic properties are discussed, and numerical results are provided along with a real data case study.
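To make the deconvolution idea concrete, here is a minimal sketch of a trigonometric-series deconvolution density estimator on the circle, assuming additive wrapped-normal measurement error with known standard deviation; the function name, the wrapped-normal assumption, and the fixed truncation point are illustrative choices, not the lower-bias estimators proposed in the paper.

```python
import numpy as np

def circular_deconvolution_density(theta_obs, error_sd, n_terms=10, grid_size=360):
    """Sketch of a trigonometric-series deconvolution density estimator on the circle.

    Assumes the observed angles theta_obs equal the true angles plus additive
    wrapped-normal error with known standard deviation error_sd, whose j-th
    Fourier coefficient is exp(-j**2 * error_sd**2 / 2).
    """
    grid = np.linspace(0.0, 2.0 * np.pi, grid_size, endpoint=False)
    density = np.full(grid_size, 1.0 / (2.0 * np.pi))   # constant Fourier term
    for j in range(1, n_terms + 1):
        # empirical trigonometric moments of the contaminated sample
        a_j = np.mean(np.cos(j * theta_obs))
        b_j = np.mean(np.sin(j * theta_obs))
        # deconvolution step: divide by the error Fourier coefficient;
        # the truncation point n_terms acts as the smoothing parameter
        attenuation = np.exp(-0.5 * j**2 * error_sd**2)
        density += (a_j * np.cos(j * grid) + b_j * np.sin(j * grid)) / (np.pi * attenuation)
    return grid, density
```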
- Discussion
19
- 10.1086/687806
- Jul 1, 2016
- American Journal of Sociology
Still Searching for a True Race? Reply to Kramer et al. and Alba et al.
- Research Article
195
- 10.1093/jnci/88.23.1738
- Dec 4, 1996
- JNCI Journal of the National Cancer Institute
International correlational analyses have suggested a strong positive association between fat consumption and breast cancer incidence, especially among postmenopausal women. However, case-control studies have been taken to indicate a weaker association, and a recent pooled cohort analysis reported little evidence of an association. Differences among study results could be due to differences in the populations studied, differences in the control for total energy intake, recall bias in the case-control studies, and dietary measurement error biases. Existing measurement error models assume either that the sample data used to validate dietary self-report instruments are without measurement error or that any such error is independent of both the true dietary exposure and other study subject characteristics. However, growing evidence indicates that total energy and, presumably, both total fat and percent energy from fat are increasingly underreported as percent body fat increases. A relaxed dietary measurement model is introduced that allows all measurement error parameters to depend on body mass index (weight in kilograms divided by the square of height in meters) and incorporates a random underreporting quantity that applies to each dietary self-report instrument. The model was applied to results from international correlational analyses to determine whether the differing associations between dietary fat and postmenopausal breast cancer can be explained by measurement errors in dietary assessment. The relaxed measurement model was developed by use of data on total fat intake and percent energy from fat from 4-day food records (4DFRs) and food-frequency questionnaires (FFQs) from the original Women's Health Trial. This trial was a randomized, controlled feasibility study of a low-fat dietary intervention carried out from 1985 through 1988 in Cincinnati (OH), Houston (TX), and Seattle (WA) among 303 women (184 intervention and 119 control) who were 45-69 years of age. The relaxed model was used to project results from the international correlational analyses onto 4DFR and FFQ fat-intake categories. If measurement errors in dietary assessment are overlooked entirely, the projected relative risks (RRs) for breast cancer based on the international data vary substantially across percentiles of total fat intake. The projected RR for the 90th versus the 10th percentile of fat intake is 3.08 with the 4DFR and 4.00 with the FFQ. If random (i.e., noise) aspects of measurement error are acknowledged, the projected RR for the same comparison is reduced to 1.54 with the 4DFR and 1.42 with the FFQ. If both systematic and noise aspects of measurement error are acknowledged, the projected RR is reduced to about 1.10 with either instrument. Acknowledgment of measurement error also leads to a projected RR of about 1.10 for the 90th versus the 10th percentile of percent energy from fat with either dietary instrument. Dietary self-report instruments may be inadequate for analytic epidemiologic studies of dietary fat and disease risk because of measurement error biases.
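A much simpler illustration of the attenuation at work in these projections, assuming classical additive measurement error rather than the relaxed model of the paper: the log relative risk is scaled by the reliability ratio of the dietary instrument, so a large true RR shrinks toward the null as reliability falls. The function name and the numeric values below are illustrative only.

```python
import numpy as np

def attenuated_relative_risk(rr_true, reliability):
    """RR detectable with an error-prone exposure measure under classical error.

    The log relative risk is attenuated by the reliability ratio
    lambda = var(true exposure) / var(observed exposure).
    """
    return float(np.exp(reliability * np.log(rr_true)))

# Illustrative numbers only: a true RR of 3.0 across exposure percentiles
# shrinks toward roughly 1.1 as the instrument's reliability falls to 0.1.
for lam in (1.0, 0.5, 0.1):
    print(lam, round(attenuated_relative_risk(3.0, lam), 2))
```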
- Research Article
8
- 10.1080/09720510.2020.1759209
- Aug 5, 2020
- Journal of Statistics and Management Systems
In this paper, we propose three classes of almost unbiased estimators for the population mean under the simultaneous presence of measurement and non-response errors. Asymptotic properties, such as the bias and mean squared error (MSE), of the proposed classes of estimators are derived. Numerical illustrations supporting the theoretical results are given for two real data sets and a simulated data set using R. The results indicate the superiority of the proposed classes of estimators over existing estimators.
- Research Article
2
- 10.1002/bimj.201000180
- Jun 30, 2011
- Biometrical Journal
We consider the problem of jointly modeling survival time and longitudinal data subject to measurement error. The survival times are modeled through the proportional hazards model and a random effects model is assumed for the longitudinal covariate process. Under this framework, we propose an approximate nonparametric corrected-score estimator for the parameter, which describes the association between the time-to-event and the longitudinal covariate. The term nonparametric refers to the fact that assumptions regarding the distribution of the random effects and that of the measurement error are unnecessary. The finite sample size performance of the approximate nonparametric corrected-score estimator is examined through simulation studies and its asymptotic properties are also developed. Furthermore, the proposed estimator and some existing estimators are applied to real data from an AIDS clinical trial.
- Book Chapter
6
- 10.1007/978-94-017-1675-8_25
- Jan 1, 1997
Measurement error may be a large component of the total variation encountered in environmental variables. Therefore, for environmental analyses measurement error in sample data should be acknowledged and quantified. Further, it is important to understand how the variogram is affected by different kinds of measurement error. In this paper, a survey of field-based reflectance was undertaken in such a way as to allow the estimation of both the sample variogram γ_z(h) and the component of the variogram due to measurement error γ_e(h). The underlying variogram γ_u(h) of true values u(x) was represented with a simple function. Then sequential Gaussian simulation was undertaken to introduce, via a locational error (δy), a measurement error e_s(x) into the underlying true values u_s(x) to give a set of ‘observed’ values z_s(x). The results demonstrated the effects of locational error on the sample variogram γ_zs(h), and in particular a negative cross-correlation between the underlying values and measurement error that results from a locational error. In addition, it is shown that the component of the sample variogram of reflectance due to measurement error γ_e(h) could not be explained solely by a locational error δy.
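As a simpler companion to the simulation described above, the sketch below computes an empirical semivariogram and shows the effect of additive, spatially independent measurement error (a nugget shift of γ_z(h) relative to γ_u(h)), rather than the locational error studied in the chapter; the transect, bin edges, and noise level are illustrative.

```python
import numpy as np

def empirical_variogram(coords, values, lag_edges):
    """Empirical semivariogram: gamma(h) = 0.5 * E[(z(x) - z(x + h))^2]."""
    d = np.abs(coords[:, None] - coords[None, :])         # pairwise distances
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2   # pairwise semivariances
    gammas = []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        mask = (d > lo) & (d <= hi)
        gammas.append(sq[mask].mean() if mask.any() else np.nan)
    return np.array(gammas)

# Illustration: with additive, spatially independent error e of variance s2_e,
# the variogram of observed values z = u + e is shifted up by s2_e (a nugget),
# i.e. gamma_z(h) is roughly gamma_u(h) + s2_e for h > 0.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 100.0, 200)
u = np.sin(x / 10.0) + 0.3 * rng.standard_normal(200).cumsum() / 10.0
z = u + rng.normal(scale=0.2, size=200)                   # measurement error, s2_e = 0.04
edges = np.arange(0.0, 20.0, 2.0)
print(empirical_variogram(x, u, edges))
print(empirical_variogram(x, z, edges))                   # roughly 0.04 higher
```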
- Research Article
1
- 10.1115/1.4030598
- Jul 16, 2015
- Journal of Medical Devices
Uncertainty Management in Computational Simulations of Medical Devices
- Research Article
5
- 10.1002/sim.7858
- Jul 12, 2018
- Statistics in Medicine
It is important to properly correct for measurement error when estimating density functions associated with biomedical variables. Estimators that adjust for measurement error are broadly referred to as density deconvolution estimators. While most methods in the literature assume the distribution of the measurement error to be fully known, a recently proposed method based on the empirical phase function (EPF) can deal with the situation where the measurement error distribution is unknown. The EPF density estimator has only been considered in the context of additive and homoscedastic measurement error; however, the measurement error of many biomedical variables is heteroscedastic in nature. In this paper, we develop a phase function approach for density deconvolution when the measurement error has an unknown distribution and is heteroscedastic. A weighted EPF (WEPF) is proposed, where the weights are used to adjust for the heteroscedasticity of the measurement error. The asymptotic properties of the WEPF estimator are evaluated. Simulation results show that the weighting can result in large decreases in mean integrated squared error when estimating the phase function. The estimation of the weights from replicate observations is also discussed. Finally, the construction of a deconvolution density estimator using the WEPF is compared with an existing deconvolution estimator that adjusts for heteroscedasticity but assumes the measurement error distribution to be fully known. The WEPF estimator proves to be competitive, especially considering that it relies on minimal assumptions about the distribution of the measurement error.
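A minimal sketch of the weighted empirical phase function idea: the weighted empirical characteristic function of the contaminated observations is normalised to unit modulus, so for symmetric measurement error its phase matches that of the true variable. The inverse-variance weighting suggested in the comments is only a stand-in for the weights derived in the paper.

```python
import numpy as np

def weighted_empirical_phase(w_obs, weights, t_grid):
    """Weighted empirical phase function of contaminated observations w_obs."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # weighted empirical characteristic function evaluated on t_grid
    ecf = (weights[None, :] * np.exp(1j * np.outer(t_grid, w_obs))).sum(axis=1)
    # normalise to unit modulus to obtain the phase function
    return ecf / np.abs(ecf)

# Illustrative heteroscedastic weighting: down-weight noisier observations,
# e.g. weights proportional to 1 / sigma_j**2 (not the paper's weights).
```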
- Dissertation
- 10.21953/lse.1cadcpy4wuqm
- Apr 1, 2017
This thesis consists of three chapters that represent my journey as a researcher during this PhD. The uniting theme is nonparametric estimation and inference in the presence of data problems. The first chapter begins with nonparametric estimation in the presence of a censored dependent variable and endogenous regressors. In Chapters 2 and 3, my attention moves to problems of inference in the presence of mismeasured data. In Chapter 1, we develop a nonparametric estimator for the local average response of a censored dependent variable to endogenous regressors in a nonseparable model where the unobservable error term is not restricted to be scalar and where the nonseparable function need not be monotone in the unobservables. We formalise the identification argument put forward in Altonji, Ichimura and Otsu (2012), construct a nonparametric estimator, characterise its asymptotic properties, and conduct a Monte Carlo investigation to study its small sample properties. We show that the estimator is consistent and asymptotically normally distributed. Chapter 2 considers specification testing for regression models with errors-in-variables. In contrast to the method proposed by Hall and Ma (2007), our test allows general nonlinear regression models. Since our test employs the smoothing approach, it complements the nonsmoothing one by Hall and Ma in terms of local power properties. We establish the asymptotic properties of our test statistic for the ordinary and supersmooth measurement error densities and develop a bootstrap method to approximate the critical value. We apply the test to the specification of Engel curves in the US. Finally, some simulation results endorse our theoretical findings: our test has advantages in detecting high-frequency alternatives and dominates the existing tests under certain specifications. Chapter 3 develops a nonparametric significance test for regression models with measurement error in the regressors. To the best of our knowledge, this is the first test of its kind. We use a ‘semi-smoothing’ approach with nonparametric deconvolution estimators and show that our test is able to overcome the slow rates of convergence associated with such estimators. In particular, our test is able to detect local alternatives at the √n rate. We derive the asymptotic distribution under i.i.d. and weakly dependent data, and provide bootstrap procedures for both data types. We also highlight the finite sample performance of the test through a Monte Carlo study. Finally, we discuss two empirical applications. The first considers the effect of cognitive ability on a range of socio-economic variables. The second uses time series data, together with a novel approach to estimating the measurement error without repeated measurements, to investigate whether future inflation expectations are able to stimulate current consumption.
- Research Article
73
- 10.1080/01621459.2012.751872
- Dec 3, 2012
- Journal of the American Statistical Association
Virtually all methods aimed at correcting for covariate measurement error in regressions rely on some form of additional information (e.g., validation data, known error distributions, repeated measurements, or instruments). In contrast, we establish that the fully nonparametric classical errors-in-variables model is identifiable from data on the regressor and the dependent variable alone, unless the model takes a very specific parametric form. This parametric family includes (but is not limited to) the linear specification with normally distributed variables as a well-known special case. This result relies on standard primitive regularity conditions taking the form of smoothness constraints and nonvanishing characteristic functions’ assumptions. Our approach can handle both monotone and nonmonotone specifications, provided the latter oscillate a finite number of times. Given that the very specific unidentified parametric functional form is arguably the exception rather than the rule, this identification result should have a wide applicability. It leads to a new perspective on handling measurement error in nonlinear and nonparametric models, opening the way to a novel and practical approach to correct for measurement error in datasets where it was previously considered impossible (due to the lack of additional information regarding the measurement error). We suggest an estimator based on non/semiparametric maximum likelihood, derive its asymptotic properties, and illustrate the effectiveness of the method with a simulation study and an application to the relationship between firm investment behavior and market value, the latter being notoriously mismeasured. Supplementary materials for this article are available online.
- Single Report
- 10.1920/wp.cem.2012.4012
- Dec 3, 2012
Virtually all methods aimed at correcting for covariate measurement error in regressions rely on some form of additional information (e.g. validation data, known error distributions, repeated measurements or instruments). In contrast, we establish that the fully nonparametric classical errors-in-variables model is identifiable from data on the regressor and the dependent variable alone, unless the model takes a very specific parametric form. This parametric family includes (but is not limited to) the linear specification with normally distributed variables as a well-known special case. This result relies on standard primitive regularity conditions taking the form of smoothness constraints and nonvanishing characteristic functions' assumptions. Our approach can handle both monotone and nonmonotone specifications, provided the latter oscillate a finite number of times. Given that the very specific unidentified parametric functional form is arguably the exception rather than the rule, this identification result should have a wide applicability. It leads to a new perspective on handling measurement error in nonlinear and nonparametric models, opening the way to a novel and practical approach to correct for measurement error in data sets where it was previously considered impossible (due to the lack of additional information regarding the measurement error). We suggest an estimator based on non/semi-parametric maximum likelihood, derive its asymptotic properties and illustrate the effectiveness of the method with a simulation study and an application to the relationship between firm investment behaviour and market value, the latter being notoriously mismeasured.
- Research Article
16
- 10.1080/01621459.1990.10474978
- Dec 1, 1990
- Journal of the American Statistical Association
Increasing attention is being given to measurement error models in which the dimension of the proxy or surrogate values is different from that of the missing true values. Even if they are of the same dimension, the error model may not be the simple additive one of observed = true + error, where the error has mean 0. The use of broader models relating true and observed values requires the use of external or internal data containing some true values to calibrate/validate the measurement error model. This article considers a multivariate normal framework in which measurement error is allowed in any subset of the variables with a broad class of multivariate regression models relating true and observed values. The classical additive model is a special case. Multiple regression with random regressors (the structural case) is treated within this framework. Correcting for measurement error is made possible through double sampling in which true values are obtained for a randomly chosen subset of the main study units. Maximum likelihood estimators and their asymptotic properties are developed for both unrestricted and restricted models, where the latter arise through specific assumptions about the nature of the measurement error. Detailed results are given for simple linear regression in which case optimal double sampling rates are determined for estimating the slope with minimum variance subject to cost considerations. An example is presented based on the use of infrared measurements as surrogates for characteristics of wheat.
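A regression-calibration-style sketch of the double-sampling idea (a simpler moment-based stand-in for the maximum likelihood estimators developed in the article): true values observed on the validation subsample calibrate the surrogate, and the imputed expected true values then enter the main regression. Variable names are illustrative.

```python
import numpy as np

def double_sampling_calibration(w_main, y_main, w_val, x_val, y_val):
    """Regression-calibration sketch for double sampling.

    w_*: surrogate measurements, x_val: true values available only on the
    validation subsample, y_*: outcomes.  The calibration step estimates
    E[X | W] from the validation data and substitutes it in the main
    regression of Y on X.
    """
    # calibration model X = a + b*W, fitted on the validation subsample
    b, a = np.polyfit(w_val, x_val, 1)
    x_imputed = a + b * np.concatenate([w_main, w_val])
    y_all = np.concatenate([y_main, y_val])
    # outcome model Y = alpha + beta*X, using imputed true values
    beta, alpha = np.polyfit(x_imputed, y_all, 1)
    return alpha, beta
```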
- Research Article
15
- 10.3233/jem-2007-0283
- May 1, 2007
- Journal of Economic and Social Measurement
We analyze measurement and classification errors in several key variables in a matched sample of survey and administrative longitudinal data on employees spanning 1994–2001. The data are much more comprehensive than usually seen in validation studies. Measurement errors in earnings are found to be much larger than reported in previous studies limited to a single firm. A key finding is that employees who attrite from the panel report their earnings significantly less accurately than individuals who are observed throughout the entire sampling period.
- Research Article
3
- 10.2139/ssrn.937346
- Jan 1, 2006
- SSRN Electronic Journal
In this paper, we analyze measurement and classification errors in several key variables, including earnings and educational attainment, in a matched sample of survey and administrative longitudinal data. The data, spanning 1994-2001 and covering all sectors in the Danish economy, are much more comprehensive than usually seen in validation studies. Measurement errors in earnings are found to be much larger than reported in previous studies limited to a single firm. Individuals who attrite from the panel report their earnings significantly less accurately than individuals who are observed throughout the entire sampling period. Furthermore, females are found to report their earnings significantly more precisely than males, part-time workers report significantly less accurately than full-time workers, and low-income workers report significantly less accurately than workers with relatively higher incomes. Classification errors in categorical variables are found to be of about the same magnitude as previously reported in the literature. We analyze whether response error in one variable makes it more likely that the same respondent will report other variables with error, but we do not find support for this hypothesis.
- Research Article
- 10.2139/ssrn.500502
- Feb 14, 2004
- SSRN Electronic Journal
In recent years, the application of kernel smoothing methods in the nonparametric regression framework to financial time-series analysis has become widespread. Kernel smoothing methods have not been applied, however, to a wide range of problems arising in time-series simulation and forecasting. Forgetting factors can be either fixed or variable. Gijbels et al (1999) propose an interpretation of fixed forgetting factors via kernel smoothing; however, the variable forgetting factor approach is not addressed there. This paper describes both the variable and the fixed forgetting factor and establishes, for the first time, the link between the variable forgetting factor approach and kernel smoothing. The forgetting factor method uses a sample of data and estimates the value of the forgetting factor from the sample. This fits the data better than a parametric approach that relies on assumed parameter values. Since the forgetting factor method is equivalent to kernel estimation, which is a nonparametric method, it is likely to give more accurate estimates and better forecasting performance for financial time series than a parametric one. The main question addressed in this application is whether kernel estimation, using Cho's approach [see Cho et al (1991) and Brailsford et al (2002)] for kernel bandwidth selection, can improve forecasting performance for the Euro within the framework of subset AR modelling. The forecasting performance is compared with that of AR modelling without the forgetting factor. If improved forecasting performance is achieved, this can increase the potential use of kernel smoothing methods in time-series forecasting. The findings show that the kernel bandwidth so determined can improve the forecasting performance.
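A minimal sketch of the fixed forgetting factor recursion and its reading as one-sided exponential kernel smoothing, which is the link the paper builds on; the forgetting factor value is illustrative, and this is not the bandwidth-selection procedure of Cho et al.

```python
import numpy as np

def forgetting_factor_mean(x, lam):
    """Recursive level estimate with fixed forgetting factor lam."""
    m = x[0]
    for xt in x[1:]:
        m = lam * m + (1.0 - lam) * xt
    return m

def exponential_kernel_mean(x, lam):
    """Equivalent one-sided exponential kernel smoother at the last point."""
    n = len(x)
    w = (1.0 - lam) * lam ** np.arange(n - 1, -1, -1)  # weights (1-lam)*lam^(n-1-t)
    w[0] = lam ** (n - 1)  # the first observation keeps the full residual weight
    return float(w @ np.asarray(x))

# Both calls return the same value: a fixed forgetting factor is exactly
# an exponential kernel weighting of past observations.
x = np.random.default_rng(1).standard_normal(500).cumsum()
print(forgetting_factor_mean(x, 0.95), exponential_kernel_mean(x, 0.95))
```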
- Research Article
4
- 10.34172/jrhs152053
- Jun 24, 2015
- Journal of Research in Health Sciences
Use of a single measurement of risk factors can distort their estimated effects due to random measurement error. The aim of this study was to examine the extent of underestimation in the estimated effects of common physical examination variables, i.e., systolic and diastolic blood pressure (SBP, DBP) and body mass index (BMI), on cardiovascular diseases in the Tehran Lipid and Glucose Study (TLGS). A subsample (1167 men and 1786 women) of the original cohort, who had replicate measures of the variables at three-year intervals, was used to calculate the regression dilution ratios (RDRs) in men and women. RDRs were determined by parametric and nonparametric methods. Hazard ratios (HRs) of risk factors, per one standard deviation change, were corrected for regression dilution bias. The RDRs estimated by the parametric method in men and women were 45% and 35% for SBP and 54% and 64% for DBP, respectively. The HR of SBP was underestimated by 26% and 25%, and the HR of DBP by 23% and 33%, in men and women, respectively. The corresponding underestimation for BMI was about 8%. RDRs for men and women and across age groups were fairly similar with both methods. They were relatively constant during the 10-year follow-up for SBP and BMI. Using baseline measurements of blood pressure underestimates its real association with CVD events and the estimated HRs. The underestimations are independent of age and sex and can be fairly constant over short to moderate time intervals.
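A minimal sketch of the regression dilution correction described above: the RDR is estimated as the slope of a replicate-on-baseline regression, and the observed log hazard ratio is divided by it. The simple linear-regression RDR and the example numbers are illustrative, not the exact parametric and nonparametric procedures used in the study.

```python
import numpy as np

def regression_dilution_ratio(baseline, replicate):
    """RDR estimated as the slope of regressing the replicate on the baseline measure."""
    slope, _intercept = np.polyfit(baseline, replicate, 1)
    return slope

def corrected_hazard_ratio(hr_observed, rdr):
    """Correct an observed HR (per one SD change) for regression dilution bias."""
    return float(np.exp(np.log(hr_observed) / rdr))

# Illustrative numbers: an observed HR of 1.30 for SBP with an RDR of 0.45
# corresponds to a corrected HR of about 1.79.
print(corrected_hazard_ratio(1.30, 0.45))
```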