Test-retest reliability-establishing that measurements remain consistent across multiple testing sessions-is critical to measuring, understanding, and predicting individual differences in infant language development. However, previous attempts to establish measurement reliability in infant speech perception tasks are limited, and reliability of frequently used infant measures is largely unknown. The current study investigated the test-retest reliability of infants' preference for infant-directed speech over adult-directed speech in a large sample (N=158) in the context of the ManyBabies1 collaborative research project. Labs were asked to bring in participating infants for a second appointment retesting infants on their preference for infant-directed speech. This approach allowed us to estimate test-retest reliability across three different methods used to investigate preferential listening in infancy: the head-turn preference procedure, central fixation, and eye-tracking. Overall, we found no consistent evidence of test-retest reliability in measures of infants' speech preference (overall r=0.09, 95% CI [-0.06,0.25]). While increasing the number of trials that infants needed to contribute for inclusion in the analysis revealed a numeric growth in test-retest reliability, it also considerably reduced the study's effective sample size. Therefore, future research on infant development should take into account that not all experimental measures may be appropriate for assessing individual differences between infants. RESEARCH HIGHLIGHTS: We assessed test-retest reliability of infants' preference for infant-directed over adult-directed speech in a large pre-registered sample (N=158). There was no consistent evidence of test-retest reliability in measures of infants' speech preference. Applying stricter criteria for the inclusion of participants may lead to higher test-retest reliability, but at the cost of substantial decreases in sample size. Developmental research relying on stable individual differences should consider the underlying reliability of its measures.
Read full abstract