Background
Very large sample sizes are often needed to capture heterogeneity in autism, which requires sharing data across multiple studies that use diverse assessment instruments. In these cases, data harmonization can be a critical tool for deriving a single dataset for analysis. Harmonization can be achieved through computational approaches that convert scores from one instrument to another. To this end, our study examined analytical approaches for mapping scores between two measures of adaptive functioning, namely predicting scores on the Vineland Adaptive Behavior Scales II (VABS) from scores on the Adaptive Behavior Assessment System II (ABAS).

Methods
Data from the Province of Ontario Neurodevelopmental Disorders Network were used. The dataset included scores on the VABS and the ABAS for 720 participants (autism n = 547, 433 male, age: 11.31 ± 3.63 years; neurotypical n = 173, 95 male, age: 12.53 ± 4.05 years). Six regression approaches (ordinary least squares (OLS) linear regression, ridge regression, ElasticNet, LASSO, AdaBoost, random forest) were used to predict VABS total scores from the ABAS scores, demographic variables (age, sex), and phenotypic measures (diagnosis; core and co-occurring features; IQ; internalizing and externalizing symptoms).

Results
The VABS scores were significantly higher than the ABAS scores in the autism group, but not the neurotypical group (median difference: 8, 95% CI = (7, 9)). The difference was negatively associated with age (beta = -1.2 ± 0.12, t = -10.6, p < 0.0001). All estimators demonstrated similar performance, with no statistically significant differences in mean absolute error (MAE) values across estimators (MAE range: 4.96–6.91). The features contributing most to the prediction model were the ABAS composite score, diagnosis, and age.

Limitations
This study has several strengths, including the large sample. However, we did not examine the conversion of domain scores across the two measures of adaptive functioning, and we suggest this as an area for future investigation.

Conclusion
Overall, our results supported the feasibility of harmonization. They suggest that a linear regression model trained on the ABAS composite score, the ABAS raw domain scores, age, sex, and diagnosis would provide an acceptable trade-off between accuracy, parsimony, and data collection and processing complexity.