Intraclass correlation (ICC) is a reliability metric that gauges similarity when, for example, entities are measured under similar, or even the same, well-controlled conditions, which in MRI applications include runs/sessions, twins, parent/child, scanners, sites, and so on. The popular definitions and interpretations of ICC are usually framed statistically under the conventional ANOVA platform. Here, we provide a comprehensive overview of ICC analysis in its prior usage in neuroimaging, and we show that the standard ANOVA framework is often limited, rigid, and inflexible in modeling capabilities. These intrinsic limitations motivate several improvements. Specifically, we start with the conventional ICC model under the ANOVA platform, and extend it along two dimensions: first, fixing the failure in ICC estimation when negative values occur under degenerative circumstance, and second, incorporating precision information of effect estimates into the ICC model. These endeavors lead to four modeling strategies: linear mixed-effects (LME), regularized mixed-effects (RME), multilevel mixed-effects (MME), and regularized multilevel mixed-effects (RMME). Compared to ANOVA, each of these four models directly provides estimates for fixed effects and their statistical significances, in addition to the ICC estimate. These new modeling approaches can also accommodate missing data and fixed effects for confounding variables. More importantly, we show that the MME and RMME approaches offer more accurate characterization and decomposition among the variance components, leading to more robust ICC computation. Based on these theoretical considerations and model performance comparisons with a real experimental dataset, we offer the following general-purpose recommendations. First, ICC estimation through MME or RMME is preferable when precision information (i.e., weights that more accurately allocate the variances in the data) is available for the effect estimate; when precision information is unavailable, ICC estimation through LME or the RME is the preferred option. Second, even though the absolute agreement version, ICC(2,1), is presently more popular in the field, the consistency version, ICC(3,1), is a practical and informative choice for whole-brain ICC analysis that achieves a well-balanced compromise when all potential fixed effects are accounted for. Third, approaches for clear, meaningful, and useful result reporting in ICC analysis are discussed. All models, ICC formulations, and related statistical testing methods have been implemented in an open source program 3dICC, which is publicly available as part of the AFNI suite. Even though our work here focuses on the whole-brain level, the modeling strategy and recommendations can be equivalently applied to other situations such as voxel, region, and network levels.
Read full abstract