Abstract

The emergence of large-scale public genomic databases has facilitated genome-wide association studies (GWAS). The unprecedented number of samples now available enables valid genetic inference and contributes to finding novel associations for various phenotypes, including cancer. However, most genomic data come from individuals of European ancestry, which leads to a disparity in research on less-represented groups, including Asian populations. Among public genomic databases, the TOPMed consortium comprises ~180k participants, over 60% of whom are of predominantly non-European ancestry, with Asian ancestry accounting for 8% of the total. In addition, the 1000 Genomes Project (1KG) contains 2,504 individuals from 26 populations, and an additional 698 samples have since been sequenced. These projects serve as reference panels for imputing missing variants, which in turn improves the power of subsequent GWAS. We compared the imputation results obtained with these reference panels. The Korean Genome Project (Korea1K), a public dataset of 1,094 whole genomes, consists solely of individuals of Korean (East Asian) origin. On GRCh38, Korea1K comprises 59,463,566 variants in total, of which 99.6% are rare (MAF < 1%); only 35.7% of those rare variants passed the GATK Best Practices VQSR filter. We downsampled the Korea1K data to the exonic level, imputed it with each reference panel, and assessed which panel performed best. With 1KG as the reference, imputation identified novel rare variants amounting to only 0.1-fold and 0.38-fold of the filtered rare variants for 1KG phase 3 and 1KG phase 3 30X, respectively. In contrast, TOPMed imputation yielded a more than 1.65-fold increase in rare variants. When restricted to well-imputed variants (R2 > 0.8), all three reference panels showed poor conservation: only 0.009% of rare variants were rescued.
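The rare-variant (MAF < 1%) and imputation-quality (R2 > 0.8) criteria above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record type `ImputedVariant`, the function `summarize_imputation`, the toy data, and the reading of "fold" as the number of novel rare variants divided by the number of baseline filtered rare variants are all assumptions made for illustration.

```python
from dataclasses import dataclass

RARE_MAF = 0.01  # rare variant: MAF < 1%, as stated in the abstract
MIN_R2 = 0.8     # well-imputed: R2 > 0.8, as stated in the abstract


@dataclass(frozen=True)
class ImputedVariant:
    """Hypothetical record for one imputed variant (not a real tool's format)."""
    variant_id: str  # illustrative identifier
    maf: float       # minor allele frequency
    r2: float        # estimated imputation quality


def summarize_imputation(variants, baseline_rare_ids):
    """Count novel rare variants and express them as a fold of the baseline.

    Assumption: "fold" = novel rare variants / baseline filtered rare variants.
    """
    if not baseline_rare_ids:
        raise ValueError("baseline set of filtered rare variants is empty")

    # Rare: polymorphic (MAF > 0) and below the 1% threshold.
    rare = [v for v in variants if 0.0 < v.maf < RARE_MAF]
    # Novel: rare variants absent from the baseline filtered call set.
    novel = [v for v in rare if v.variant_id not in baseline_rare_ids]
    # Well-imputed: novel rare variants passing the R2 cutoff.
    novel_well = [v for v in novel if v.r2 > MIN_R2]

    n_base = len(baseline_rare_ids)
    return {
        "rare": len(rare),
        "novel_rare": len(novel),
        "novel_rare_well_imputed": len(novel_well),
        "fold_novel": len(novel) / n_base,
        "fold_novel_well_imputed": len(novel_well) / n_base,
    }


# Toy data, invented purely to exercise the function.
toy_variants = [
    ImputedVariant("v1", maf=0.005, r2=0.95),  # rare, novel, well-imputed
    ImputedVariant("v2", maf=0.004, r2=0.50),  # rare, novel, poorly imputed
    ImputedVariant("v3", maf=0.200, r2=0.99),  # common, ignored
    ImputedVariant("v4", maf=0.002, r2=0.85),  # rare but already in baseline
]
toy_baseline = {"v4", "v5"}

summary = summarize_imputation(toy_variants, toy_baseline)
print(summary)
```

Applying the R2 cutoff after the rarity filter makes it easy to see how much of an apparent gain in rare variants survives quality control, which is the contrast the abstract draws between the fold increases and the small fraction of well-imputed rare variants.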
The reference panels in common use are large but need greater diversity with respect to Asian rare variants. Despite the growing Asian fraction in public genomic databases, our results suggest that the number of Asian reference samples is still too low to support association studies. In conclusion, our findings highlight both the utility and the limitations of imputation reference panels for Asian ancestry. Novel variants filled in by imputation are promising for adding power to various association studies.

Citation Format: GangPyo Ryu, Youngil Koh, Sung-soo Yoon. Method to increase performance of genome-wide association study using imputation for Asian ancestry [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 6587.