Genome-sequence-based newborn screening (gNBS) has substantial potential to improve outcomes in hundreds of severe childhood genetic disorders (SCGDs). However, a major impediment to gNBS is imprecision due to variants classified as pathogenic (P) or likely pathogenic (LP) that are not SCGD causal. gNBS with 53,855 P/LP variants, 342 genes, 412 SCGDs, and 1,603 therapies was positive in 74% of UK Biobank (UKB470K) adults, suggesting 97% false positives. We used the phenomenon of purifying hyperselection, which acts to decrease the frequency of SCGD causal diplotypes, to reduce false positives. Training of gene-disease-inheritance mode-diplotype tetrads in 618,290 control and affected subjects identified 293 variants or haplotypes and seven genes with variable inheritance contributing higher positive diplotype counts than consistent with purifying hyperselection and with little or no evidence of SCGD causality. With these changes, 2.0% of UKB470K adults were positive. In contrast, gNBS was positive in 7.2% of 3,118 critically ill children with suspected SCGDs and 7.9% of 705 infant deaths. When compared with rapid diagnostic genome sequencing (RDGS), gNBS had 99.1% recall. In eight true-positive children, gNBS was projected to decrease time to diagnosis by a median of 121 days and avoid life-threatening disease presentations in four children, organ damage in six children, ∼$1.25 million in healthcare cost, and ten (1.4%) infant deaths. Federated training predicated on purifying hyperselection provides a general framework to attain high precision in population screening. Federated training across many biobanks and clinical trials can provide a privacy-preserving mechanism for qualification of gNBS in diverse genetic ancestries.