Population allele frequency is crucial for the accurate interpretation of known and novel variants in medical genetics. Recently, several large allele frequency databases, such as the Genome Aggregation Database (gnomAD), have been created to serve as a global reference for such studies. However, the frequencies of many rare alleles vary dramatically between populations, and population-specific allele frequency is often more informative than the global one. Many countries and regions, including Russia, remain poorly studied from a genetic perspective. Here, we report the first successful attempt to integrate genetic information across major medical genetic laboratories in Russia. We construct RUSeq, an open, large-scale reference set of genetic variants, by analyzing 7452 exome samples collected in two major Russian cities: Moscow and St. Petersburg. An ∼10-fold increase in sample size compared to previous studies allowed us to characterize extensive genetic diversity within the admixed Russian population, which has contributions from several major ancestral groups. We highlight 51 known pathogenic variants that are overrepresented in Russia compared to other European countries. We also identify several dozen high-impact variants that are present in healthy donors despite being annotated as pathogenic in ClinVar and falling within genes associated with autosomal dominant disorders. The resulting database of genetic variant frequencies in Russia is available to the medical genetics community through a variant browser at http://ruseq.ru.
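A population allele frequency is conventionally the alternate allele count divided by the total number of called alleles (AF = AC / AN). The abstract does not say how overrepresentation relative to other European populations was assessed. The sketch below assumes a one-sided Fisher's exact test on a 2x2 table of allele counts, which is a common choice but not necessarily the authors' method. The function names are hypothetical, and the code assumes diploid genotype calls.

```python
from math import comb


def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternate-allele frequency AF = AC / AN from diploid genotype counts."""
    ac = n_het + 2 * n_hom_alt                 # alternate allele count
    an = 2 * (n_hom_ref + n_het + n_hom_alt)   # total called alleles
    return ac / an if an else 0.0


def fisher_greater(ac1: int, an1: int, ac2: int, an2: int) -> float:
    """One-sided Fisher's exact p-value that population 1 carries the
    alternate allele more often than population 2.

    Table of allele counts (assumed layout):
                    alt       ref
        pop 1       a         b
        pop 2       c         d
    """
    a, b = ac1, an1 - ac1
    c, d = ac2, an2 - ac2
    row1 = a + b          # alleles in population 1
    col1 = a + c          # alternate alleles in both populations
    n = a + b + c + d     # all alleles
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme (x >= a).
    numer = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numer / denom  # exact big-integer division, then float


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    af = allele_frequency(n_hom_ref=7400, n_het=50, n_hom_alt=2)
    p = fisher_greater(ac1=54, an1=14904, ac2=20, an2=15000)
    print(f"AF = {af:.5f}, one-sided p = {p:.3g}")
```

Computing the test on allele counts rather than carrier counts treats each chromosome as an independent draw. That is a simplifying assumption, and it can be inaccurate for variants in strong linkage or with excess homozygosity.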