Monogenic cardiovascular diseases present throughout the human lifespan with varying symptoms. For most, our understanding of their penetrance, onset, and expressivity is derived from published case reports. The extent to which these literature-derived cases capture the full allelic and phenotypic spectrum is unknown, particularly since the literature is biased towards severe presentations. To illustrate, we investigated the penetrance (i.e., the presence of at least one symptom), onset, and phenotypic expressivity of a rare genetic disease linked to hemorrhagic stroke and other complications: COL4A1-related vascular disease (COL4A1VD, or Gould Syndrome). We performed a systematic review of COL4A1VD cases reported in the biomedical literature (N = 459), normalizing their genetic variants and symptoms to controlled terminologies. We then investigated the phenotypic expression among COL4A1VD pathogenic variant carriers identified in two population-scale biobanks (the UK Biobank and the All of Us Research Program; combined N = 714,246), finding 178 subjects (prevalence: 0.025%) who should be at high risk for COL4A1VD-related symptoms. Based on the literature, the penetrance of COL4A1VD is approximately 92% (95% CI: 90%-95%), with each case expressing 4.3 symptoms on average. Neurologic complications were the most common finding, observed in 85% of symptomatic patients. In the biobank-derived cohort, by contrast, carriers showed an overall enrichment for disease-related symptoms (COL4A1 pathogenic variant carriers had a greater symptom burden than non-carrier controls; P < 0.005), but the penetrance of the variants was substantially reduced (34%; 95% CI: 26%-41%). In addition, this cohort expressed fewer symptoms on average (0.8 per carrier), with renal complications being far more common (78% of symptomatic biobank patients vs 22% of literature-derived cases; P < 0.001). Finally, the apparent age of onset was much later for the biobank-derived cohort (57 years vs 5 years for the literature cases). Most of these differences reflect the distinct ascertainment of these two COL4A1VD cohorts, but they also highlight our incomplete understanding of the true penetrance, expressivity, and onset for this rare disease. These results have important implications for COL4A1VD-related genetic counseling and surveillance, and they highlight a need for new methods that formally integrate the rare disease knowledge captured from these distinct data sources.
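
The carrier prevalence quoted above follows directly from the reported counts, and a penetrance estimate with a confidence interval can be computed in a similar way. The Python sketch below is illustrative only. The abstract does not state which interval method was used, nor the exact number of symptomatic carriers, so the Wilson score interval and the count of 60 symptomatic carriers (roughly 34% of 178) are assumptions made for this example, and the function names are hypothetical.

```python
import math
from typing import Tuple


def prevalence_percent(carriers: int, cohort_size: int) -> float:
    """Carrier prevalence as a percentage of the screened cohort."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100.0 * carriers / cohort_size


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%).

    Assumption: the abstract does not say how its intervals were derived;
    Wilson is used here only as a common, well-behaved choice.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts reported in the abstract.
carriers = 178
cohort_size = 714_246
print(f"prevalence: {prevalence_percent(carriers, cohort_size):.3f}%")

# Hypothetical count: about 34% of 178 carriers with at least one symptom.
symptomatic = 60
low, high = wilson_interval(symptomatic, carriers)
print(f"penetrance: {symptomatic / carriers:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Because both the interval method and the symptomatic count are assumed, the resulting bounds should only be expected to land near the reported 26%-41%, not to match them exactly.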