Abstract
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS.
Highlights
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce
76 million single nucleotide variants (SNVs) and indels were identified with their predicted consequences, including over 22 thousand potential loss-of-function variants annotated by LOFTEE[3] (Supplementary Table 3)
Since WGS will become the standard genomic tool for research purposes and the future of precision medicine, providing a reference for admixed populations is critical. Genomic datasets such as the Genome Aggregation Database (gnomAD) and TOPMed have recently included Latin American samples, but this is the first study to include more than 1,000 high-coverage WGS samples from any Latin American census-based cohort
Summary
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. We present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS. Even studies of late-onset diseases can gain power from a control group whose unaffected status is verified at ages older than the average age at onset. This rationale was previously explored by us using whole-exome sequencing of elderly Brazilians[18], and by others using a European-descent whole-genome dataset of Australian elderly[19]. We provide variants and respective allelic frequencies in a public resource, ABraOM (http://abraom.ib.usp.br).
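To illustrate how published allelic frequencies can feed an incidence estimate for a recessive disorder, here is a minimal Python sketch. It assumes Hardy-Weinberg equilibrium and sums the frequencies of all pathogenic alleles in one gene. That is a common textbook approximation, not necessarily the method used in this study. The function names and the carrier counts are hypothetical and are not taken from the dataset.

```python
from typing import Iterable


def allele_frequency(het_carriers: int, hom_carriers: int, n_individuals: int) -> float:
    """Alternate-allele frequency from genotype counts in diploid individuals."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    # Each heterozygote carries one copy, each homozygote two; 2N chromosomes in total.
    return (het_carriers + 2 * hom_carriers) / (2 * n_individuals)


def recessive_incidence(pathogenic_afs: Iterable[float]) -> float:
    """Expected incidence of a recessive disorder under Hardy-Weinberg equilibrium.

    Assumption: all pathogenic alleles of the gene are pooled into a single
    frequency q, so affected individuals (homozygotes and compound
    heterozygotes) occur at frequency q**2.
    """
    q = sum(pathogenic_afs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return q * q


if __name__ == "__main__":
    # Hypothetical example: 12 heterozygous carriers among 1,171 individuals.
    q = allele_frequency(het_carriers=12, hom_carriers=0, n_individuals=1171)
    incidence = recessive_incidence([q])
    print(f"allele frequency: {q:.5f}")
    print(f"expected incidence: about 1 in {round(1 / incidence):,}")
```

With cohorts of this size, very rare alleles may be seen only once or not at all, so estimates like this carry wide uncertainty and are best treated as rough orders of magnitude.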