Abstract

Recent advances in social science surveys include collection of biological samples. Although biomarkers offer a large potential for social science and economic research, they impose a number of statistical challenges, often being distributed asymmetrically with heavy tails. Using data from the UK Household Panel Survey, we illustrate the comparative performance of a set of flexible parametric distributions, which allow for a wide range of skewness and kurtosis: the four‐parameter generalized beta of the second kind (GB2), the three‐parameter generalized gamma, and their three‐, two‐, or one‐parameter nested and limiting cases. Commonly used blood‐based biomarkers for inflammation, diabetes, cholesterol, and stress‐related hormones are modelled. Although some of the three‐parameter distributions nested within the GB2 outperform the latter for most of the biomarkers considered, the GB2 can be used as a guide for choosing among competing parametric distributions for biomarkers. Going “beyond the mean” to estimate tail probabilities, we find that GB2 performs fairly well with some disparities at the very high levels of glycated hemoglobin and fibrinogen. Commonly used linear models are shown to perform worse than almost all the flexible distributions.

Highlights

  • Recent developments in social surveys include the integration of biomarkers and self-reported health measures

  • The cholesterol ratio and dehydroepiandrosterone sulfate (DHEAS) exhibit long right-hand tails and high kurtosis

  • We find no evidence in support of any of the special cases within the generalized gamma (GG) distribution

Introduction

Recent developments in social surveys include the integration of biomarkers and self-reported health measures. Biomarkers can contribute to our understanding of the underlying biological factors through which socioeconomic conditions get “under the skin” (for example, through stress-related physiological responses), as well as the role of socioeconomic exposures at earlier, pre-symptomatic health states (Davillas et al, 2016; Gruenewald et al, 2009; Jürges et al, 2013). Existing studies have applied OLS to raw or log-transformed biomarkers (Gruenewald et al, 2009; Jürges et al, 2013) and alternative inherently nonlinear specifications, such as generalized linear models (GLM) (Davillas et al, 2016). The generalized gamma (GG) and generalized beta of the second kind (GB2) distributions allow for a wide range of skewness and kurtosis to better accommodate the biomarker data generation processes; these models have been proposed for fitting heavily skewed outcomes (for example, health care costs; Jones et al, 2014), with which biomarkers share similar distributional features. Going “beyond the mean”, we explore the ability of these models to predict tail probabilities; prediction bias at the tails is of policy interest because of the elevated health risks and associated health-care costs.
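As a rough illustration of the GB2 family and of what a tail-probability calculation involves, the Python sketch below writes out the GB2 log-density and survival function in the standard four-parameter form (a, b, p, q), which the text above does not spell out and is assumed here. It checks three well-known nested cases against SciPy (Singh-Maddala at p = 1, Dagum at q = 1, beta of the second kind at a = b = 1) and compares a model-based tail probability with the share of simulated draws above a threshold. The function names, parameter values, and threshold are hypothetical, and the data are simulated rather than taken from the survey.

```python
import numpy as np
from scipy import special, stats


def gb2_logpdf(y, a, b, p, q):
    """Log-density of GB2(a, b, p, q) for y > 0 (assumed standard parametrization)."""
    y = np.asarray(y, dtype=float)
    log_ratio = a * (np.log(y) - np.log(b))            # log of (y/b)^a
    return (np.log(a) + (a * p - 1.0) * np.log(y) - a * p * np.log(b)
            - special.betaln(p, q)
            - (p + q) * np.logaddexp(0.0, log_ratio))  # log(1 + (y/b)^a)


def gb2_sf(y, a, b, p, q):
    """Tail probability P(Y > y), using (y/b)^a / (1 + (y/b)^a) ~ Beta(p, q)."""
    y = np.asarray(y, dtype=float)
    log_ratio = a * (np.log(y) - np.log(b))
    one_minus_z = special.expit(-log_ratio)            # 1 / (1 + (y/b)^a)
    return special.betainc(q, p, one_minus_z)          # equals 1 - I_z(p, q)


def gb2_rvs(size, a, b, p, q, rng):
    """Simulate GB2 draws by transforming Beta(p, q) variates."""
    z = rng.beta(p, q, size=size)
    return b * (z / (1.0 - z)) ** (1.0 / a)


grid = np.array([0.5, 1.0, 2.0, 5.0])

# Nested cases: the GB2 density should match SciPy's named distributions.
sm_gap = np.max(np.abs(np.exp(gb2_logpdf(grid, 2.0, 1.5, 1.0, 3.0))
                       - stats.burr12.pdf(grid, c=2.0, d=3.0, scale=1.5)))
dagum_gap = np.max(np.abs(np.exp(gb2_logpdf(grid, 2.0, 1.5, 3.0, 1.0))
                          - stats.burr.pdf(grid, c=2.0, d=3.0, scale=1.5)))
b2_gap = np.max(np.abs(np.exp(gb2_logpdf(grid, 1.0, 1.0, 2.5, 3.5))
                       - stats.betaprime.pdf(grid, 2.5, 3.5)))

# Tail probability: model-based value versus the share of simulated draws
# above an illustrative (hypothetical) threshold.
params = dict(a=3.0, b=5.0, p=1.2, q=0.9)
rng = np.random.default_rng(0)
draws = gb2_rvs(200_000, rng=rng, **params)
threshold = 15.0
model_tail = float(gb2_sf(threshold, **params))
empirical_tail = float(np.mean(draws > threshold))

print(f"max density gaps: SM={sm_gap:.2e}, Dagum={dagum_gap:.2e}, B2={b2_gap:.2e}")
print(f"P(Y > {threshold}): model={model_tail:.4f}, simulated={empirical_tail:.4f}")
```

In an applied setting the four parameters would be estimated by maximum likelihood from the observed biomarker values, and the fitted survival function would then be evaluated at a clinically relevant threshold; the sketch skips estimation and uses fixed parameters only to show the mechanics.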
