Small area estimation of poverty indicators under bivariate Fay–Herriot model with correlated time effects
Abstract This paper presents an area-level temporal bivariate linear mixed model, incorporating correlated time effects for estimating socioeconomic indicators in small areas. The model is applied through the residual maximum likelihood method, leading to the derivation of empirical best linear unbiased predictors for these indicators. Additionally, an approximation of the mean square error matrix (MSE) is provided and four MSE estimators are proposed. The first estimator involves a plug-in approach to the MSE approximation, while the remaining estimators are based on parametric bootstrap procedures. To assess the performance of the fitting algorithm, predictors, and MSE estimators, three simulation experiments are carried out. An application to real data from the 2016 to 2022 Spanish Living Conditions Survey is conducted. The focus is on estimating poverty proportions and gaps for the year 2022, categorized by provinces and sex.
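As a rough illustration of the kind of predictor the abstract refers to, the sketch below computes the best linear unbiased predictor under the basic univariate Fay–Herriot model. This is a deliberate simplification and not the paper's method: the paper's model is bivariate and temporal, with correlated time effects fitted by residual maximum likelihood. Here the model variance `sigma2_u` is assumed known, and the names `fay_herriot_blup` and `solve_linear` are hypothetical helpers introduced only for this example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fay_herriot_blup(y, X, psi, sigma2_u):
    """BLUP of area means under the basic (univariate) Fay-Herriot model.

    y        : direct estimates, one per area
    X        : rows of area-level covariates (include a 1 for the intercept)
    psi      : known sampling variances of the direct estimates
    sigma2_u : variance of the area random effects (assumed known here;
               in practice it would be estimated, e.g. by REML)
    """
    D, p = len(y), len(X[0])
    # Inverse marginal variances: Var(y_d) = sigma2_u + psi_d.
    w = [1.0 / (sigma2_u + psi[d]) for d in range(D)]
    XtWX = [[sum(w[d] * X[d][j] * X[d][k] for d in range(D))
             for k in range(p)] for j in range(p)]
    XtWy = [sum(w[d] * X[d][j] * y[d] for d in range(D)) for j in range(p)]
    beta = solve_linear(XtWX, XtWy)  # weighted least squares estimate
    preds = []
    for d in range(D):
        synthetic = sum(X[d][j] * beta[j] for j in range(p))
        gamma = sigma2_u / (sigma2_u + psi[d])  # shrinkage weight in [0, 1)
        preds.append(gamma * y[d] + (1.0 - gamma) * synthetic)
    return preds, beta
```

The predictor is a weighted average of the direct estimate and the regression-synthetic estimate: areas with large sampling variance `psi` are shrunk more strongly towards the regression fit. Replacing `sigma2_u` by an estimate turns the BLUP into an empirical BLUP.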
- Research Article
17
- 10.1007/s10260-020-00521-x
- Mar 31, 2020
- Statistical Methods & Applications
This paper introduces a temporal bivariate area-level linear mixed model with independent time effects for estimating small area socioeconomic indicators. The model is fitted by using the residual maximum likelihood method. Empirical best linear unbiased predictors of these indicators are derived. An approximation to the matrix of mean squared errors (MSE) is given and four MSE estimators are proposed. The first MSE estimator is a plug-in version of the MSE approximation. The remaining MSE estimators rely on parametric bootstrap procedures. Three simulation experiments designed to analyze the behavior of the fitting algorithm, the predictors and the MSE estimators are carried out. An application to real data from the 2005 and 2006 Spanish living conditions survey illustrates the introduced statistical methodology. The target is the estimation of 2006 poverty proportions and gaps by provinces and sex.
- Research Article
2
- 10.1007/s00122-024-04639-4
- May 16, 2024
- Theoretical and Applied Genetics
The standard approach to variance component estimation in linear mixed models for alpha designs is the residual maximum likelihood (REML) method. One drawback of the REML method in the context of incomplete block designs is that the block variance may be estimated as zero, which can compromise the recovery of inter-block information and hence reduce the accuracy of treatment effects estimation. Due to the development of statistical and computational methods, there is an increasing interest in adopting hierarchical approaches to analysis. In order to increase the precision of the analysis of individual trials laid out as alpha designs, we here make a proposal to create an objectively informed prior distribution for variance components for replicates, blocks and plots, based on the results of previous (historical) trials. We propose different modelling approaches for the prior distributions and evaluate the effectiveness of the hierarchical approach compared to the REML method, which is classically used for analysing individual trials in two-stage approaches for multi-environment trials.
- Research Article
4
- 10.1080/03610918.2013.809102
- Oct 23, 2014
- Communications in Statistics - Simulation and Computation
The empirical best linear unbiased predictor has recently become widely used as a practical approach to small area inference. It is also of interest to construct empirical prediction intervals. However, it is not known which of the several existing prediction intervals should be used. In this article, we first obtain an empirical prediction interval by using the residual maximum likelihood method for estimating unknown model variance parameters. Then we compare the latter with other intervals with the residual maximum likelihood method. Additionally, some different parametric bootstrap methods for constructing empirical prediction intervals are also compared in a simulation study.
- Research Article
15
- 10.1111/j.1463-5224.2005.00339.x
- May 1, 2005
- Veterinary Ophthalmology
We analyzed the prevalence of the presumed inherited eye diseases (PIED) noncongenital cataract and progressive retinal atrophy in the Entlebucher Mountain Dog for systematic environmental influences and the additive genetic variation. Multivariate linear animal models using residual maximum likelihood methods and multivariate threshold animal models using Gibbs sampling in Bayesian analyses were used to estimate variance and covariance components. Data were obtained from the kennel club for Swiss Mountain Dog breeds in Germany. PIED were recorded using the standardized protocols of the Dortmunder Kreis, the German panel of the European Eye Scheme for Diagnosis of Inherited Eye Diseases in Animals (DOK). The material included 515 Entlebucher Mountain Dogs from 344 litters at 77 different kennels. Veterinary diagnoses for PIED were from the years 1981-2001. Pedigree information was available for up to nine generations. The multivariate animal model regarded the fixed effects of sex, birth year, experience of the veterinary ophthalmologist, litter size, percentage of examined dogs per litter, inbreeding coefficient and age at examination. The common environment of the litter and the additive genetic effect of the animal were taken into account as randomly distributed effects. The heritability estimates for PIED in the Entlebucher Mountain Dog were h² = 0.15 ± 0.06 (noncongenital cataract) and h² = 0.34 ± 0.08 (progressive retinal atrophy) in the linear model, and h² = 0.32 ± 0.05 (noncongenital cataract) and h² = 0.59 ± 0.03 (progressive retinal atrophy) in the threshold model. The additive genetic correlation between noncongenital cataract and progressive retinal atrophy was moderately positive (r(g) = 0.54 ± 0.08) in the threshold model. The number of examinations performed by the veterinary ophthalmologists was associated with slightly higher heritabilities for noncongenital cataract and considerably higher heritabilities for progressive retinal atrophy.
The investigated PIED in the Entlebucher Mountain Dog are genetically influenced and the size of the genetic parameters estimated may be sensitive to the accuracy of the diagnosis and how the data were collected.
- Research Article
13
- 10.1002/env.2136
- Feb 21, 2012
- Environmetrics
We consider approaches for calculating and mapping statistical predictions of soil organic carbon (SOC), and attendant uncertainty, from data across a region of France. The data were collected from farms across the region. To protect the anonymity of farms that contributed, the locations and values of individual observations were unavailable, and we were only able to use the average value, sample variance, and number of observations from each commune. Communes varied in size up to a maximum of 130 km², with a mean of 10 km². The uncertainty due to data being commune‐wide averages (with sample error varying between communes as a result of variations in their size and the number of samples drawn from within them) raises an important methodological issue. We show how a residual maximum likelihood method can be used to estimate covariance parameters on the basis of this form of data and use the empirical best linear unbiased predictor to calculate predictions. Cross‐validation shows that by properly representing the commune‐wide averaged data, the predictions and attendant uncertainty assessments are more reliable than those from a naïve approach based on the summary means only. We compare maps produced using the approaches showing the SOC predictions and the attendant uncertainty. Copyright © 2012 John Wiley & Sons, Ltd.
- Research Article
8
- 10.1080/00949650902766860
- Jul 1, 2010
- Journal of Statistical Computation and Simulation
The empirical best linear unbiased prediction approach is a popular method for the estimation of small area parameters. However, the estimation of reliable mean squared prediction error (MSPE) of the estimated best linear unbiased predictors (EBLUP) is a complicated process. In this paper we study the use of resampling methods for MSPE estimation of the EBLUP. A cross-sectional and time-series stationary small area model is used to provide estimates in small areas. Under this model, a parametric bootstrap procedure and a weighted jackknife method are introduced. A Monte Carlo simulation study is conducted in order to compare the performance of different resampling-based measures of uncertainty of the EBLUP with the analytical approximation. Our empirical results show that the proposed resampling-based approaches performed better than the analytical approximation in several situations, although in some cases they tend to underestimate the true MSPE of the EBLUP in a higher number of small areas.
- Research Article
121
- 10.1016/j.csda.2012.09.002
- Sep 10, 2012
- Computational Statistics & Data Analysis
Small area estimation with spatio-temporal Fay–Herriot models
- Research Article
1
- 10.1093/jas/skae374
- Dec 9, 2024
- Journal of animal science
Proteolytic fermentation induces negative effects on gut health and function, which may affect pig performance. The objective was to conduct a meta-analysis to develop an index of indigestible dietary protein (IDP) to investigate growth performance outcomes of mixed-sex weanling pigs (average body weight of 7.59 kg). Eighty-nine articles reporting growth performance variables [average daily gain (ADG), average daily feed intake (ADFI), gain:feed ratio (GF), initial (IBW), and final body weight] in pigs fed different dietary protein (DP) content (from 12% to 33.6%) and protein sources (plant and animal) were included. DP and the IDP index were calculated in all experiments using a common database, with the IDP index defined as the difference between total DP and standardized ileal digestible DP. A DP- and an IDP-based model were developed to predict the ADG, GF, and ADFI (by their relationship) of weaning pigs using a multivariable linear mixed model regression approach with estimates of variable effects obtained using the residual maximum likelihood method. Based on a stepwise manual forward selection, significant predictor variables with improvement of at least 2 points in the Bayesian information criterion were included in the final regression model. Statistical significance was set at P ≤ 0.05 and a trend at P < 0.10. Initial exploratory analysis of the database showed a quadratic increase (P < 0.01) in the IDP index with increasing inclusion of plant protein sources in diet formulation and a linear decrease (P < 0.01) in the IDP index with increasing synthetic amino acid inclusion. Regarding the models, the DP-based model could not account for the inclusion of protein sources compared to the IDP-based model. There was a tendency for DP to positively affect (P < 0.10) ADG and GF. Increasing the IDP index tended to negatively impact (P < 0.10) ADG while reducing (P < 0.05) ADFI.
Using a practical and hypothetical feed formulation simulation, the final regression models predicted the expected negative impact of a high IDP index on newly weaned pig performance when compared to a low IDP diet. The IDP-based model predicted a stronger negative effect of high IDP when compared to the DP-based model. Results indicate that IDP may be an improved and more reliable index to investigate the impact of DP on pig performance in the postweaning phase.
- Research Article
7
- 10.2134/agronj2007.0112
- May 1, 2008
- Agronomy Journal
This is a discussion paper that presents no new material but challenges the way that field trials with changing treatment variances have been traditionally analyzed. We argue that one should always expect the variance of yield to change when the yields are obtained from plots with different plant densities. To illustrate, a turnip (Brassica rapa L.) sowing density by sowing date experiment is analyzed using analysis of variance and residual maximum likelihood methods. Deviance is used to compare the statistical models and demonstrate that residual maximum likelihood provides a better analysis when a linear mixed model is fitted to account for a changing variance due to sowing density. The analysis is further improved when sowing date, which also has a changing variance, is incorporated into the model. Plant density trials should always be assumed to have changing variance. Linear mixed models (with a residual maximum likelihood algorithm for estimating variance parameters) can be used to obtain superior analyses and make better research decisions.
- Research Article
8
- 10.1080/00949655.2019.1590578
- Mar 14, 2019
- Journal of Statistical Computation and Simulation
Data from past time periods and temporal correlation are rich sources of information for estimating small area parameters at the current period. This paper investigates the use of unit-level temporal linear mixed models for estimating linear parameters. Two models are considered, with domain and domain-time random effects. The first model assumes time independency and the second one AR(1)-type time correlation. They are fitted by a Fisher-scoring algorithm that calculates the residual maximum likelihood estimators of the model parameters. Based on the introduced models, empirical best linear unbiased predictors of small area linear parameters are studied, and analytic estimators for evaluating the performance of their mean squared errors are proposed. Three simulation experiments are carried out to study the behaviour of the fitting algorithm, the small area predictors and the estimators of the mean squared error. By using data of the Spanish surveys of income and living conditions of 2004–2008, an application to the estimation of 2008 average normalized net annual incomes in Spanish provinces by sex is given.
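To make the contrast between time independence and AR(1)-type time correlation concrete, the following sketch builds the covariance matrix of a stationary AR(1) process over T periods. The abstract does not state the paper's parametrization, so the usual stationary form is assumed here: effects follow u_t = rho * u_{t-1} + e_t with innovation variance `sigma2`, giving Cov(u_s, u_t) = sigma2 / (1 - rho^2) * rho^|s - t|. The function name `ar1_covariance` is hypothetical.

```python
def ar1_covariance(T, rho, sigma2):
    """Covariance matrix of a stationary AR(1) process over T time points.

    Assumes u_t = rho * u_{t-1} + e_t with Var(e_t) = sigma2 and |rho| < 1,
    so that Cov(u_s, u_t) = sigma2 / (1 - rho**2) * rho**|s - t|.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    scale = sigma2 / (1.0 - rho ** 2)  # stationary variance of u_t
    return [[scale * rho ** abs(s - t) for t in range(T)] for s in range(T)]
```

Setting `rho = 0` yields a diagonal matrix, which corresponds to the time-independent case; nonzero `rho` lets information from earlier periods contribute to the prediction for the current period.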
- Research Article
25
- 10.1093/biomet/ast030
- Jul 30, 2013
- Biometrika
Sinha & Rao (2009) proposed estimation procedures designed for small-area means, based on robustified maximum likelihood estimators and robust empirical best linear unbiased predictors. Their methods are of the plug-in type and may be biased. Bias-corrected estimators have been proposed by Chambers et al. (2013). Here, we investigate two new approaches: one relying on the work of Chambers (1986), and the second using the concept of conditional bias to measure the influence of units in the population. These two classes of estimators also include correction terms for the bias but are both fully bias-corrected, in the sense that the corrections account for the potential impact of the other domains on the small area of interest. Monte Carlo simulations suggest that the Sinha–Rao method and the bias-adjusted estimator of Chambers et al. (2013) may exhibit a large bias, while the new procedures often offer lower bias and mean squared error. A parametric bootstrap procedure is considered for constructing confidence intervals. Copyright 2013, Oxford University Press.
- Research Article
- 10.51387/24-nejsds69
- Oct 28, 2024
- The New England Journal of Statistics in Data Science
Growth curve analysis (GCA) has a wide range of applications in various fields where growth trajectories need to be modeled. Heteroscedasticity is often present in the error term, which cannot be handled with sufficient flexibility by standard linear fixed or mixed-effects models. One situation that has been addressed is where the error variance is characterized by a linear predictor with certain covariates. A frequently encountered scenario in GCA, however, is one in which the variance is a smooth function of the mean with known shape restrictions. A naive application of standard linear mixed-effects models would underestimate the variance of the fixed effects estimators and, consequently, the uncertainty of the estimated growth curve. We propose to model the variance of the response variable as a shape-restricted (increasing/decreasing; convex/concave) function of the marginal or conditional mean using shape-restricted splines. A simple iteratively reweighted fitting algorithm that takes advantage of existing software for linear mixed-effects models is developed. For inference, a parametric bootstrap procedure is recommended. Our simulation study shows that the proposed method gives satisfactory inference with moderate sample sizes. The utility of the method is demonstrated using two real-world applications.
- Research Article
4
- 10.1002/cjs.11622
- Jun 3, 2021
- Canadian Journal of Statistics
In this article, we propose a conditional model estimator (cmmse) for the design‐based mean squared error (dMSE) of a small area mean estimator under the basic unit level model. The mean squared error dMSE refers to the variability of a small area estimator over all possible sample selections. It is different from the model mean squared error (mMSE), traditionally used to measure the efficiency in small area estimation problems. For known model parameters, Rao, Rubin‐Bleuer & Estevao [Rao et al., Survey Methodology 2018; 44, 151–166] showed that dMSE depends on two quadratic finite population parameters. A design estimator of dMSE, denoted as dmse, is obtained by substituting the quadratic parameters with their corresponding design unbiased estimators. Rao, Rubin‐Bleuer & Estevao [Rao et al., Survey Methodology 2018; 44, 151–166] proposed a composite MSE estimator (cmse) based on both the design and the model. This estimator is defined as a weighted average between the design‐based dmse and a model‐based estimator (mmse). Given known variance components, we obtain a new formula for dMSE that accounts for the estimation of the fixed model coefficients. Our conditional model MSE estimator cmmse is obtained by replacing the quadratic finite population parameters by their best predictions under the model, in the new formula of dMSE. Properties of the proposed estimator are studied in terms of design bias, relative root mean squared error, coverage rate and a score function of the confidence intervals.
- Research Article
- 10.9734/ajpas/2019/v4i130106
- Jun 19, 2019
- Asian Journal of Probability and Statistics
In recent years, the demand for small area statistics has greatly increased worldwide. A recent application of small area estimation (SAE) techniques is the estimation of local-level poverty measures in Third World countries, which is necessary to achieve the Millennium Development Goals. The aim of this research is to study SAE procedures for estimating the mean income and poverty indicators for the Egyptian provinces. To this end, direct estimators of mean income and Foster-Greer-Thorbecke (FGT) poverty indicators are presented for all the Egyptian provinces. The study also applies the empirical best/Bayes (EB) and the pseudo empirical best/Bayes (PEB) methods, based on the unit-level nested error model, to estimate mean income and FGT poverty indicators for the Egyptian border provinces using data from the 2012-2013 income, expenditure and consumption survey (IECS). Mean squared errors (MSEs) and coefficients of variation (CVs) are calculated for comparative purposes. Finally, conclusions are presented. The results show that EB estimators for poverty incidence and poverty gap are smaller than PEB estimators for all selected provinces. The EB figures indicate that the largest poverty incidence and gap are for the selected municipality on the south-western border of Egypt (New Valley). The PEB figures indicate that the largest poverty incidence and gap are for the selected municipality on the north-eastern border of Egypt (North Sinai). As expected, estimated CVs for EB estimators of poverty incidence and poverty gap are noticeably larger than those of PEB estimators in all selected provinces.
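For readers unfamiliar with the FGT family, the sketch below computes the plain, unweighted sample version of the indicator: the average over all units of ((z - y) / z)^alpha for units with income y below the poverty line z, and zero otherwise. With alpha = 0 this is the poverty incidence (proportion of poor units) and with alpha = 1 the poverty gap. It does not implement the direct, EB or PEB estimators discussed in the abstract, it ignores survey weights, and the function name `fgt_indicator` is hypothetical.

```python
def fgt_indicator(incomes, poverty_line, alpha):
    """Unweighted Foster-Greer-Thorbecke poverty indicator.

    alpha = 0 gives the poverty incidence (share of units below the line);
    alpha = 1 gives the poverty gap (mean relative shortfall, counting
    non-poor units as zero).
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    n = len(incomes)
    if n == 0:
        raise ValueError("incomes must not be empty")
    total = 0.0
    for y in incomes:
        if y < poverty_line:  # only units strictly below the line contribute
            total += ((poverty_line - y) / poverty_line) ** alpha
    return total / n
```

For example, with incomes 50, 100, 150 and 200 and a poverty line of 120, two of the four units are poor, so the incidence is 0.5, and the gap is (70/120 + 20/120) / 4 = 0.1875.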
- Research Article
- 10.1093/jssam/smaf007
- Jun 27, 2025
- Journal of Survey Statistics and Methodology
In small area estimation, it is a smart strategy to rely on data measured over time. However, linear mixed models struggle to properly capture time dependencies when the number of lags is large. Given the lack of published studies addressing robust prediction in small areas using time-dependent data, this research seeks to extend M-quantile models to this field. Indeed, our methodology successfully addresses this challenge and offers flexibility to the widely imposed assumption of unit-level independence. Under the new model, robust bias-corrected predictors for small area linear indicators are derived. Additionally, the optimal selection of the robustness parameter for bias correction is explored, contributing theoretically to the field and enhancing outlier detection. For the estimation of the mean squared error (MSE), a first-order approximation and analytical estimators are obtained under general conditions. Several simulation experiments are conducted to assess the performance of the new predictors and MSE estimators, as well as the optimal selection of the robustness parameter. Finally, an application to the Spanish Living Conditions Survey data illustrates the usefulness of the proposed predictors.