Background
Linkage of longitudinal administrative data for mothers and babies allows assessment of maternal factors that affect child health outcomes. Linkage between maternity and children's datasets in England is being developed by the Health and Social Care Information Centre and will add to evidence from similar linked data in smaller populations, including Scotland, UK, and Ontario, Canada, but will include only prospective data. We established the feasibility of creating a retrospective, linked birth cohort using non-disclosive clinical variables in administrative data from English hospitals.

Methods
Records of babies and mothers admitted to English National Health Service hospitals from April 1, 2012, to March 31, 2013, were extracted from Hospital Episode Statistics. Baby and maternal records were linked using delivery information common to both records, and the success of deterministic and probabilistic linkage methods was compared. Erroneous values of birthweight and gestation were identified with gestation-specific reference values. Representativeness of the linked birth cohort was assessed by comparison of key characteristics with national published data (Office for National Statistics). χ² tests were used to compare proportions of records in each group. Linkage was performed with code written in Stata (version 13); computational effort was insubstantial for the number of records involved.

Findings
Of the 672 644 baby records extracted, 280 470 (42%) were linked deterministically to a maternal record when indirect identifiers were used: hospital, general practice, maternal age, birthweight, gestation, birth order, and sex. After probabilistic methods were applied, incorporating additional variables that could differ between mother and baby records (admission dates, ethnicity, three-character or four-character postcode district) or that included missing values (delivery data), the final linked cohort captured about 97% of the 678 712 births in English hospitals during 2012.
After data cleaning, no significant differences were seen between the distributions of sex, gestational age, birthweight, and maternal age in the linked cohort and those in published national data.

Interpretation
Probabilistic linkage of indirect identifiers allowed information on maternal and baby health-care trajectories to be combined on a national level without increasing the risk of disclosure. These linked data have the potential to provide insight into how prenatal and postnatal maternal health affects child health outcomes. The large sample size and representative population offer power to investigate important subgroups such as preterm births.

Funding
This work was supported by funding from the Wellcome Trust (grant number WT103975/Z/14/).
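
The two linkage steps described in the Methods and Findings can be illustrated with a minimal sketch. It is not the authors' Stata code: the field names, the one-row-per-baby layout of the maternal records, and the agreement probabilities passed to `match_weight` are all hypothetical, and the scoring shown is one common formulation of probabilistic linkage (Fellegi-Sunter-style match weights) that the abstract does not itself specify.

```python
import math
from collections import defaultdict

# Indirect identifiers listed in the Findings; the field names are hypothetical.
DETERMINISTIC_KEYS = ("hospital", "gp_practice", "maternal_age", "birthweight",
                      "gestation", "birth_order", "sex")


def deterministic_link(babies, mothers, keys=DETERMINISTIC_KEYS):
    """Link a baby to a mother only if every key agrees and the match is unique.

    Assumes one maternal row per baby delivered (an assumption of this sketch).
    """
    index = defaultdict(list)
    for mother in mothers:
        # Records with a missing key cannot take part in exact matching.
        if all(mother.get(k) is not None for k in keys):
            index[tuple(mother[k] for k in keys)].append(mother["id"])

    links = {}
    for baby in babies:
        if any(baby.get(k) is None for k in keys):
            continue
        candidates = index.get(tuple(baby[k] for k in keys), [])
        if len(candidates) == 1:  # ambiguous matches are left unlinked
            links[baby["id"]] = candidates[0]
    return links


def match_weight(baby, mother, m_u):
    """Sum log2 agreement/disagreement weights over the fields in m_u.

    m_u maps field -> (m, u), where m = P(agree | true match) and
    u = P(agree | non-match). Missing values contribute nothing.
    """
    total = 0.0
    for field, (m, u) in m_u.items():
        a, b = baby.get(field), mother.get(field)
        if a is None or b is None:
            continue
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total
```

In such a scheme, candidate pairs that fail exact matching would be scored with `match_weight` on the additional variables (for example admission date, ethnicity, and postcode district), and pairs scoring above a chosen threshold would be accepted as links. The threshold and the `(m, u)` values would need to be estimated from the data rather than taken from this sketch.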