Intraclass correlation coefficient for grouped data.

Jelena Kovačić, Veda Marija Varnai

doi:10.1097/ede.0000000000000139

Abstract

To the Editors: In questionnaires applied in epidemiologic surveys, respondents often answer questions about continuous variables in terms of a few predefined categories. Examples of continuous variables commonly recorded as categorized (grouped) data are yearly household income, frequency of food intake during 1 week, and hours of intensive physical activity per week. The reproducibility of grouped data is usually estimated by the intraclass correlation coefficient (ICC) [1], calculated on the midpoints of the predefined categories, or by weighted kappa [2]. Both methods, however, depend on the choice of categories (cut-off points and number of categories) and seem to underestimate the ICC of the underlying continuous data [3–5]. This behavior complicates their interpretation and hinders comparison of questionnaires with differently defined categories. To our knowledge, the maximum likelihood ICC has not been suggested in reproducibility studies with grouped data. We compare it with the midpoint ICC on simulated datasets and on a real-life example, food frequency questionnaire (FFQ) data.

Our simulations mimicked situations in which 1000 respondents answer a question with 5 predefined categories on 2 occasions. We performed 1000 simulations for each of the 99 ICC values (0.01–0.99). To investigate the influence of the number of categories, we further experimented with 3, 5, 10, 25, and 50 categories. These experiments included 1 low (0.2), 1 medium (0.5), and 1 high (0.8) ICC value (1000 simulations for each experiment). We analyzed cases of equal and unequal category widths separately. Data were simulated according to a 1-way random-effects model and grouped into the predefined categories afterward. The detailed methodology of the simulations and the derivation of the log-likelihood for grouped data are in the eAppendix (https://links.lww.com/EDE/A807).
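The simulation design described above can be illustrated with a minimal Python sketch. It is not the authors' code (their methodology is in the eAppendix): the cut-off points, the outer limits of -2.5 and 2.5 used to define midpoints, the seed, the function names, and the use of the one-way ANOVA estimator of the ICC are all assumptions made for illustration. Data are drawn from a 1-way random-effects model with total variance 1, so the between-respondent variance equals the true ICC.

```python
import random

# Hypothetical cut-off points giving 5 equal-width categories.
CUTS = [-1.5, -0.5, 0.5, 1.5]
# Midpoints assume the outer categories end at -2.5 and 2.5 (an assumption).
MIDPOINTS = [-2.0, -1.0, 0.0, 1.0, 2.0]


def simulate_pairs(n, icc, seed=0):
    """Simulate n respondents on 2 occasions: y_ij = b_i + e_ij,
    with Var(b) = icc and Var(e) = 1 - icc (total variance 1)."""
    rng = random.Random(seed)
    sd_b, sd_e = icc ** 0.5, (1 - icc) ** 0.5
    pairs = []
    for _ in range(n):
        b = rng.gauss(0, sd_b)  # respondent-level random effect
        pairs.append([b + rng.gauss(0, sd_e), b + rng.gauss(0, sd_e)])
    return pairs


def to_midpoints(pairs):
    """Group each continuous value into a category and replace it
    with that category's midpoint."""
    return [[MIDPOINTS[sum(y > c for c in CUTS)] for y in pair] for pair in pairs]


def anova_icc(rows):
    """One-way ANOVA estimator of the ICC for k measurements per respondent."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((v - m) ** 2 for r, m in zip(rows, means) for v in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


pairs = simulate_pairs(1000, 0.8)          # true ICC of 0.8
icc_cont = anova_icc(pairs)                # ICC of the ungrouped data
icc_mid = anova_icc(to_midpoints(pairs))   # midpoint ICC after grouping
print(round(icc_cont, 3), round(icc_mid, 3))
```

Running this for a single dataset gives one draw of each estimator; repeating it over many seeds and ICC values would approximate the kind of experiment the letter reports.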
The maximum likelihood estimator showed low bias, with a median value of 0.001 (range 0.000–0.011) when category widths were equal, and 0.002 (0.000–0.017) when category widths were unequal (simulations with 1000 respondents and 5 categories; eTable 1, eFigure 1, https://links.lww.com/EDE/A807). In contrast, the midpoint ICC underestimated the ICC by a median value of 0.067 (0.002–0.111) when category widths were equal and 0.133 (0.002–0.182) otherwise. The Figure shows the results of simulations with different numbers of categories (as described in eTables 2–4, https://links.lww.com/EDE/A807). The maximum likelihood ICC was unaffected by the number of categories and the choice of cut-off points: its mean estimates for data with the same underlying ICC value differed by 0.005 at most. In comparison, midpoint estimates for data with the same ICC value differed by up to 0.24. The midpoint method underestimated the ICC more when the number of categories was lower. Furthermore, its bias was higher when the ICC was higher and the category widths were unequal. Its bias was low, mainly below 0.01, only when the number of categories was large (25 and higher).

FIGURE: Impact of number of categories on estimators. For each of the 3 ICC values and a fixed number of categories, the plot shows means of ICCMID and ICCMLE over 1000 simulations with equal category widths and 1000 simulations with unequal category widths. The true ICC value is shown as a horizontal line at values of 0.2, 0.5, and 0.8. ICC, intraclass correlation coefficient; ICCMID, ICC calculated on categories' midpoints; ICCMLE, maximum likelihood estimator.

The FFQ example showed a similar pattern, with maximum likelihood estimates on average higher by 0.09 than midpoint estimates (eTable 5, https://links.lww.com/EDE/A807).
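To make the idea of a maximum likelihood ICC for grouped data concrete, here is a deliberately simplified Python sketch. It is not the derivation from the eAppendix and not the iRepro implementation. It assumes a 1-way random-effects model whose mean (0) and total variance (1) are known, so that only the ICC is estimated; a full estimator would also estimate the mean and the variance components. The cut-off points, seed, function names, quadrature settings, and the grid search over 0.01–0.99 are illustrative assumptions. Each respondent contributes the probability that both answers fall in the observed pair of categories, obtained by integrating numerically over the respondent's random effect.

```python
import math
import random

CUTS = [-1.5, -0.5, 0.5, 1.5]  # hypothetical cut-off points, 5 categories


def phi_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def simulate_counts(n, icc, seed=1):
    """Simulate grouped answers on 2 occasions and count respondents
    in each (category at occasion 1, category at occasion 2) cell."""
    rng = random.Random(seed)
    sd_b, sd_e = math.sqrt(icc), math.sqrt(1 - icc)
    k = len(CUTS) + 1
    counts = [[0] * k for _ in range(k)]
    for _ in range(n):
        b = rng.gauss(0, sd_b)
        cats = []
        for _ in range(2):
            y = b + rng.gauss(0, sd_e)
            cats.append(sum(y > c for c in CUTS))
        counts[cats[0]][cats[1]] += 1
    return counts


def log_lik(icc, counts, nodes=401, zmax=6.0):
    """Grouped-data log-likelihood for a given ICC (mean 0 and total
    variance 1 assumed known). Cell probabilities integrate over the
    standardized random effect z with a simple Riemann sum."""
    sd_b, sd_e = math.sqrt(icc), math.sqrt(1 - icc)
    k = len(CUTS) + 1
    dz = 2 * zmax / (nodes - 1)
    cell = [[0.0] * k for _ in range(k)]
    for i in range(nodes):
        z = -zmax + i * dz
        w = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) * dz
        # Category probabilities given the random effect b = sd_b * z.
        cum = [0.0] + [phi_cdf((c - sd_b * z) / sd_e) for c in CUTS] + [1.0]
        p = [cum[j + 1] - cum[j] for j in range(k)]
        for a in range(k):
            for c in range(k):
                cell[a][c] += w * p[a] * p[c]  # answers independent given b
    return sum(counts[a][c] * math.log(max(cell[a][c], 1e-300))
               for a in range(k) for c in range(k) if counts[a][c])


def ml_icc(counts):
    """Grid search for the ICC that maximizes the log-likelihood."""
    grid = [g / 100 for g in range(1, 100)]
    return max(grid, key=lambda r: log_lik(r, counts))


counts = simulate_counts(1000, 0.8)  # true ICC of 0.8
icc_ml = ml_icc(counts)
print(icc_ml)
```

Because the likelihood works with category boundaries rather than midpoints, the open-ended outer categories need no assumed limits, which is one reason such an estimator does not depend on how midpoints are chosen.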
Although the bias of the midpoint ICC and its dependence on the number of categories have been noted previously, maximum likelihood estimation has not been used in reproducibility studies with grouped data, probably because of a lack of user-friendly software. We therefore provide the R package iRepro (available from http://www.imi.hr/~jkovacic/irepro.html, including installation and usage guidelines). To conclude, researchers should be aware of the bias related to the commonly used midpoint approach when estimating the ICC from continuous grouped data. The maximum likelihood estimator is a better choice, as it showed almost no bias under all tested conditions. Furthermore, it was unaffected by the choice of categories. This enables comparison of questionnaires with different grouping schemes, including questionnaires with data that are not grouped (ie, continuous data). Unless the number of categories is large, such comparisons based on the midpoint method should be avoided.

ACKNOWLEDGMENT
We thank Jelena Macan for her valuable comments.

Jelena Kovačić
Veda Marija Varnai
Institute for Medical Research and Occupational Health
Zagreb, Croatia

Journal: Epidemiology	Publication Date: Sep 1, 2014
Citations: 7
