PurposeWidespread application of next-generation sequencing, combined with data exchange platforms, has provided molecular diagnoses for countless families. To maximize diagnostic yield, we implemented an unbiased semi-automated genematching algorithm based on genotype and phenotype matching. MethodsRare homozygous variants identified in 2 or more affected individuals, but not in healthy individuals, were extracted from our local database of ∼12,000 exomes. Phenotype similarity scores (PSS), based on human phenotype ontology terms, were assigned to each pair of individuals matched at the genotype level using HPOsim. Results33,792 genotype-matched pairs were discovered, representing variants in 7567 unique genes. There was an enrichment of PSS ≥0.1 among pathogenic/likely pathogenic variant-level pairs (94.3% in pathogenic/likely pathogenic variant-level matches vs 34.75% in all matches). We highlighted founder or region-specific variants as an internal positive control and proceeded to identify candidate disease genes. Variant-level matches were particularly helpful in cases involving inframe indels and splice region variants beyond the canonical splice sites, which may otherwise have been disregarded, allowing for detection of candidate disease genes, such as KAT2A, RPAIN, and LAMP3. ConclusionSemi-automated genotype matching combined with PSS is a powerful tool to resolve variants of uncertain significance and to identify candidate disease genes.
Read full abstract