Research and clinical practice rely heavily on caregiver-report measures, such as the Child Behavior Checklist 1.5-5 (CBCL/1.5-5), to gather information about early childhood behavior problems and to screen for child psychopathology. While studies have shown that demographic variables influence caregiver ratings of behavior problems, the extent to which the CBCL/1.5-5 functions equivalently at the item level across diverse samples is unknown. Item-level CBCL/1.5-5 data from a large sample of young children (N = 9087) were drawn from 26 cohorts in the Environmental influences on Child Health Outcomes program. Factor analyses and the alignment method were applied to examine measurement invariance (MI) and differential item functioning (DIF) across child characteristics (age, sex, bilingual status, and neurodevelopmental disorders) and caregiver characteristics (sex, education level, household income level, depression, and language version administered). Child race was examined in sensitivity analyses. Items with the most impactful DIF across child and caregiver groupings were identified for Internalizing, Externalizing, and Total Problems. The robust item sets, excluding the high-DIF items, showed good reliability and high correlation with the original Internalizing and Total Problems scales, with lower reliability for Externalizing. The language version of the CBCL administered and the education level and sex of the caregiver respondent showed the greatest impact on MI, followed by child age. Sensitivity analyses revealed that child race has a unique impact on DIF over and above socioeconomic status. The CBCL/1.5-5, a caregiver-report measure of early childhood behavior problems, showed bias across demographic groups.
Robust item sets with less DIF can measure Internalizing and Total Problems as well as the full item sets, with slightly lower reliability for Externalizing. They can also be crosswalked to the metric of the full item set, enabling calculation of normed T scores based on the more robust item sets.