Related Topics

  • Imputation Accuracy
  • High-density Genotyping
  • Genotype Data
  • Genotyping Array

Articles published on Genotype imputation

976 Search results
Sorted by recency
  • New
  • Research Article
  • 10.3390/ani16010117
Genome-Wide Association Study Identifies Candidate Genes for Body Size Traits in Wanyue Black Pigs
  • Dec 31, 2025
  • Animals : an Open Access Journal from MDPI
  • Haibo Ye + 9 more

This study aimed to elucidate the genetic basis of body-size-related traits in Wanyue Black pigs, including body length, chest circumference, forearm circumference, and hip circumference. Phenotypic data were collected from 139 four-month-old female pigs, and genotyping was performed using a 50K SNP array. After stringent quality control and genotype imputation, approximately 4,697,453 high-quality autosomal variants were retained for subsequent genome-wide association study (GWAS), transcriptome-wide association study (TWAS), and phenome-wide association study (PheWAS). The GWAS identified four genome-wide significant loci, including rs343276492 and rs321308815. TWAS results revealed that the expression level of PTH2R was significantly associated with pituitary-related traits. In addition, selection signal analysis identified multiple genomic regions related to growth, development, and environmental adaptability, which were significantly enriched in pathways such as circadian rhythm regulation and the MAPK signaling pathway. These pathways play critical roles in growth regulation and adaptive evolution in Wanyue Black pigs. Collectively, this study provides valuable candidate genes and potential molecular markers for the genetic improvement and breeding of Wanyue Black pigs.

  • New
  • Research Article
  • 10.1186/s13059-025-03912-z
Comparative assessment of SNP genotyping assays for challenging forensic samples utilizing ancient DNA methods.
  • Dec 23, 2025
  • Genome biology
  • Adam Staadig + 7 more

The fields of ancient DNA research and forensic genetics share both methodological similarities and common challenges, particularly in the analysis of degraded DNA. Leveraging these overlaps, this study evaluates three single nucleotide polymorphism (SNP)-based genotyping assays for analyzing challenging forensic samples: the FORCE-QIAseq SNP panel, the Twist ancient DNA hybridization capture panel, and whole-genome sequencing (WGS). We analyze twenty skeletal bone and tooth samples from authentic missing person cases, where almost all samples are severely degraded and contain exceptionally low amounts of endogenous DNA, reflected by both reduced quantifiable DNA concentrations and lower proportions of human DNA reads than typically obtained from high-quality forensic samples. Despite these challenging sample characteristics, both the FORCE and Twist assays successfully generate a substantial number of genotypes across many samples, while whole-genome sequencing yields fewer SNP calls. However, techniques such as probabilistic genotyping, increased sequencing depth, or genotype imputation can further enhance the utility of WGS for forensic use. This study highlights the effectiveness of incorporating ancient DNA methods into forensic genetics for the analysis of degraded samples. The findings are broadly applicable to both forensic and ancient DNA research disciplines, offering valuable insights into assay selection based on sample condition and investigative goals.

  • Research Article
  • 10.1002/cpt.70171
Improving Genotype Imputation in High-Dimensional Pharmacogenomics Using Multiple Imputation: Evaluation with Machine Learning Approaches.
  • Dec 17, 2025
  • Clinical pharmacology and therapeutics
  • Innocent G Asiimwe + 6 more

Multiple imputation is well-established for handling missing data, yet its use in high-dimensional genetic datasets remains limited. Using pharmacokinetic tuberculosis simulations and SNP data (1000 Genomes Project), we compared machine learning (ML) and traditional approaches (e.g., mean imputation and complete-case analysis) for imputation and covariate selection. We developed a multiple imputation framework incorporating genotype probabilities, imputation uncertainty (INFO score), and missingness percentages. Dimensionality reduction enabled scalable random forest and penalized regression for covariate selection. In simulations, only multiple imputation achieved adequate coverage (percentage of 95% confidence intervals containing the true value) exceeding a 90% nominal threshold. For example, on the imputation server, coverage improved from 0% with single imputation to up to 94% under 10% missingness. Applied to clinical warfarin datasets (War-PATH, n = 548; IWPC, n = 316) and the UK Biobank (n = 500, 1000), multiple imputation recovered known pharmacogenomic associations (CYP2C9*8/*9/*11; VKORC1 -1639G>A), reduced false-positives, and detected signals missed by single imputation (e.g., genome-wide significant rs4697699, SLC2A9 locus). Computational costs were modest, adding only ~1.25 minutes for 10 imputations to the 22.7 minutes required by single imputation on the Michigan Imputation Server. For SNP selection, penalized regression performed best in the high-effect scenario (F1 = 0.897 ± 0.091), while GWAS followed by random forest performed best in the low-effect scenario (F1 = 0.657 ± 0.110). These findings show that multiple imputation improves reliability and discovery in high-dimensional pharmacogenomics, with ML offering promising but inconsistent benefits during SNP selection. 
However, generalizability beyond the studied datasets and computational scalability to larger biobank-scale analyses remain important limitations that warrant further investigation.
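
The coverage metric defined in this abstract (the percentage of 95% confidence intervals that contain the true value) can be illustrated with a minimal sketch. The function name and the interval values below are hypothetical and are not taken from the study.

```python
def coverage_percent(intervals, true_value):
    """Percentage of (lower, upper) intervals that contain true_value."""
    if not intervals:
        raise ValueError("need at least one interval")
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return 100.0 * hits / len(intervals)


# Hypothetical 95% confidence intervals from four simulation replicates.
example_intervals = [(0.0, 1.0), (0.5, 1.5), (2.0, 3.0), (0.9, 1.1)]
example_coverage = coverage_percent(example_intervals, true_value=1.0)
```

In a simulation study, one interval would be collected per replicate and the result compared against the nominal threshold.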

  • Research Article
  • 10.21203/rs.3.rs-8264218/v1
An Advanced Entropy Approach for Minimizing False Discoveries in Imputation-Based Association Analyses
  • Dec 17, 2025
  • Research Square
  • Zhihui Zhang + 3 more

Genotype imputation is a cornerstone of modern genetic studies, enhancing the resolution of genome-wide association studies (GWAS), fine mapping, and polygenic risk score estimation by inferring untyped variants using reference panels. The output of imputation is a set of probabilistic genotypes, each associated with an inherent degree of uncertainty. However, conventional downstream analyses often overlook this uncertainty, relying instead on allelic dosages—expected allele counts computed from probabilistic genotypes—as proxies. This practice can be misleading, as distinct genotype probability distributions may produce identical dosages despite vastly different confidence levels, potentially introducing bias and inflating false discoveries. To address this limitation, we introduce an entropy-weighted association method that explicitly quantifies imputation uncertainty using Shannon entropy. These entropy values are integrated as observation-level weights within the association model, allowing the method to dynamically account for the reliability of each imputed genotype. Through simulation studies, we demonstrate that this approach substantially reduces false positives, especially when genotypic uncertainty is pronounced. Our findings highlight the importance of modeling imputation uncertainty and offer a framework that improves the robustness of GWAS and other genotype imputation-dependent analyses.
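
The point that distinct genotype probability distributions can share a dosage while differing in confidence can be shown with a short sketch. The dosage and Shannon entropy formulas are standard; the `entropy_weight` mapping from entropy to an observation weight is an assumption made for illustration, since the abstract does not specify how the weights are constructed.

```python
import math


def dosage(probs):
    """Expected alternate-allele count from (P(g=0), P(g=1), P(g=2))."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_weight(probs):
    """Hypothetical weight: 1 minus entropy scaled by its maximum, log2(3)."""
    return 1.0 - shannon_entropy(probs) / math.log2(3)


confident = (0.0, 1.0, 0.0)  # certain heterozygote
uncertain = (0.5, 0.0, 0.5)  # even split between the two homozygotes
```

Both genotypes have a dosage of 1.0, but only the second carries any entropy, so a dosage-only analysis would treat them identically.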

  • Research Article
  • 10.3791/68879
Comprehensive Evaluation of Genotype Imputation Tools for Ultra-low-depth Whole-Genome Sequencing Data.
  • Dec 12, 2025
  • Journal of visualized experiments : JoVE
  • Ying Lin + 5 more

Ultra-low-depth sequencing (ULDS) is a cost-effective strategy for large-scale genomic studies, but its utility hinges on accurate genotype imputation. This study evaluates three imputation tools -- STITCH, QUILT2, and GLIMPSE2 -- across varying sequencing depths and sample sizes, using the China Kadoorie Biobank (CKB) and The 1000 Genomes Project (1KGP) East Asian (EAS) reference panels. Critical performance divergences are demonstrated: Sample size sensitivity: STITCH's accuracy improved markedly with larger samples, whereas QUILT2 and GLIMPSE2 showed minimal dependence on sample size. Reference panel optimization: Population-specific CKB significantly enhanced accuracy for QUILT2 and GLIMPSE2 but had a negligible impact on STITCH, which relies on internal haplotype inference. Depth thresholds: All tools achieved robust accuracy at moderate sequencing depths (≥ 0.5x), but STITCH underperformed drastically at ultra-low depths (≤ 0.1x). GLIMPSE2 with CKB delivered the highest overall accuracy, while QUILT2 balanced precision and computational efficiency. For non-invasive prenatal testing (NIPT) data, GLIMPSE2+CKB maintained sufficient accuracy for downstream analyses. A decision framework is proposed, prioritizing population-matched panels and depth-adapted tools, offering actionable guidelines for optimizing ULDS-WGS in diverse research settings. These insights bridge methodological advancements with practical implementation, enabling cost-effective scaling of genomic studies without compromising data quality.

  • Research Article
  • 10.3389/fgene.2025.1692544
Development and application of an updated haplotype reference panel for association analysis of spontaneous sex reversal in XX rainbow trout
  • Dec 10, 2025
  • Frontiers in Genetics
  • Sixin Liu + 5 more

With the rapid cost reduction of next-generation sequencing, low-coverage whole-genome sequencing (lcWGS) followed by genotype imputation is becoming a cost-effective alternative to SNP (single nucleotide polymorphism) array genotyping. Previously, we constructed a reference panel consisting of 410 samples representing five breeding populations of rainbow trout (Oncorhynchus mykiss). However, the reference panel had a limited representation of the major commercial populations in the U.S. The objectives of this study were two-fold: 1) to update the haplotype reference panel of rainbow trout by adding more reference populations and more samples from the previous reference populations; and 2) to identify SNPs associated with spontaneous sex reversal to males in XX rainbow trout (sXX sex reversal). To update the reference panel, high-coverage whole-genome sequences were obtained from 129 additional fish from several populations. To identify SNPs associated with sXX sex reversal, samples from two families were genotyped with both the Axiom 57K SNP array and lcWGS. The updated reference panel outperformed the previous panel with an increase in accuracy of genotype imputation and a reduction in low-confidence genotypes. Based on the array genotypes, 55 significant SNPs associated with sXX sex reversal were identified and 53 out of the 55 SNPs were located on chromosome OmyA26. Based on the imputed genotypes, 743 SNPs on chromosome OmyA26 and 7 SNPs on chromosome OmyA19 were associated with sXX sex reversal. Two of those OmyA26 significant SNPs were identified by both genotyping methods. In conclusion, the updated haplotype reference panel improved the accuracy of genotype imputation from lcWGS, and enabled identification of additional SNPs associated with sXX sex reversal in rainbow trout.

  • Research Article
  • 10.1002/gepi.70021
Adjustment for Genotype Imputation Uncertainty Corrects for Inflated Type I Error in Family-Based Association Testing.
  • Dec 1, 2025
  • Genetic epidemiology
  • Tyler R C Day + 9 more

Genotype imputation is a widely used data augmentation approach that is applied to samples of related and/or unrelated individuals. Association testing may then be carried out on the complete data with commonly used methods. This approach has typically not accounted for the mix of observed and imputed data, although recent work has noted the potential for introduction of confounding in case-control studies. In the Alzheimer's Disease Sequencing Project family sample we found severe inflation of the test statistics in logistic regression analysis following genotype imputation, even after standard covariate adjustments. Here we dissect sources of this inflation, which is driven by three factors: frequency-dependent bias in imputation-induced allele frequencies, differential measurement error, and differential genotyping rates in cases versus controls that introduce confounding. To address the problem, we propose a statistic, imputation deviance, which can be easily computed from the observed and imputed genotype probabilities. We show that imputation deviance, as an additional fixed-effect covariate, controls the genome-wide inflation in analysis of this family-based sample, and we speculate that use of imputation deviance may also provide a practical approach to correct for genotype imputation effects in other settings, particularly when a data set is unbalanced and includes related individuals.

  • Abstract
  • 10.1002/alz70855_106790
Improving Genotype Imputation of African‐derived Genetic Variants in Studies of Alzheimer's Disease
  • Dec 1, 2025
  • Alzheimer's & Dementia
  • Nicholas R Wheeler + 42 more

Background: The DAWN Alzheimer's Research Study is a multi‐site international project to recruit African‐American, Hispanic/Latino, and African participants for genomic studies of Alzheimer's Disease (AD). In addition to clinical evaluations, cognitive assessments and biomarker data collection, array‐based representative genotyping is being performed for all participants. To increase the value of these genotypic data, a vastly richer dataset can be created using imputation, a process that requires a whole genome sequenced reference dataset. High quality imputation depends on having large reference datasets representative of the ancestries of the target dataset. Using inadequate reference datasets results in low imputation quality, fewer usable imputed variants and hinders downstream analysis. Given the inclusion of African‐ancestry participants (whose reference datasets are small) in the DAWN study, we examined the impact of using different strategies on the accuracy of genotype imputation. Method: Using DAWN study data generated by the Illumina Global Screening Array, we performed genotype imputation using the TopMED R3 dataset and compared these results to a meta‐imputation workflow using TopMED R3 supplemented by the Africa 6K dataset. This comparison explicitly tests the impact of increasing African ancestry in the imputation reference panel. Imputation results were assessed for chromosomes 1, 10, and 20 for total count of imputed variants, and by comparing variant counts across a range of imputation quality (R2) and variant rarity (MAF) filter criteria to identify apparent trends. Result: An additional 190,784 (0.3%) variants are captured from the meta‐imputed (64,370,296) vs the single‐imputed (64,179,512) dataset. 
Variant quality also improves, with an increase of ∼80,000 (5%) filter‐passing variants (R2 > 0.8) in the meta‐imputation compared to the TopMED‐only imputation results (Figure 1). Conclusion: The use of meta‐imputation to better match the genetic background of the DAWN dataset through the use of multiple imputation references significantly increases the density and quality of the resulting genotypic dataset, enabling more powerful studies of AD genetics. This demonstrates the utility of meta‐imputation for better matching the genetic background of samples when performing imputation.

  • Abstract
  • 10.1002/alz70861_108684
Genome‐wide association study of ADAS scores identifies novel loci linked to cognitive function in Alzheimer’s disease
  • Dec 1, 2025
  • Alzheimer's & Dementia
  • Mengyuan Kan + 6 more

Background: Alzheimer’s disease (AD) is characterized by progressive cognitive decline, and genetic influences on cognition are well‐established. Investigating single nucleotide polymorphisms (SNPs) associated with cognitive function across dementia progression stages may identify early genetic markers of cognitive decline. The Alzheimer's Disease Neuroimaging Initiative (ADNI), which categorized participants as cognitively normal, mild cognitive impairment, or dementia, provides a valuable resource for such studies. We therefore conducted a genome‐wide association study (GWAS) on ADAS scores in ADNI participants to identify SNPs associated with cognitive performance, serving as early markers of dementia progression. Method: Genotyped data from three ADNI phases were downloaded and merged. Quality control retained common SNPs with a variant call rate >0.98, sample call rate >0.95, Hardy‐Weinberg equilibrium p‐value >10⁻⁶, and minor allele frequency (MAF) ≥0.01. Individuals were included if their genetic sex matched reported sex and they had no up‐to‐third‐degree relationships with other participants. Genotype imputation was performed using the Michigan Imputation Server. Functional annotation was performed using ANNOVAR. Separate GWAS of ADAS11 and ADAS13 were performed using linear regression in PLINK2, adjusting for age, sex, years of education, and the first ten principal components. Genetic loci were determined based on index SNPs (p‐value <10⁻⁶) with more than one nominally associated SNP (p‐value <10⁻⁴) in linkage disequilibrium (r² ≥0.02) within 250 kb using the PLINK clump function. Result: A total of 1,236 participants and 8,416,387 common autosomal SNPs were included in the analyses. Thirteen genetic loci were identified as associated with ADAS11, and six with ADAS13, five of which were shared (Figure 1A and 1B). 
Four index SNPs reached the genome‐wide significance threshold (p‐value <5×10⁻⁸), including the strongest signal at the APOE locus, primarily driven by dementia, and three novel but less frequent SNPs (MAF <0.05) located on chromosomes 3, 8 and 13. The chromosome 3 locus is near the SUCLG2 gene, which was previously reported to harbor an SNP linked to cerebrospinal fluid Aβ1–42 levels in AD patients. Conclusion: Novel genetic loci identified in ADNI provide insights into the genetic basis of cognitive performance, warranting further research on whether well‐established A/T/N imaging or fluid biomarkers mediate their effects on cognitive and diagnostic outcomes.
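
The variant-level quality control thresholds stated in this abstract (variant call rate >0.98, Hardy‐Weinberg p‐value >10⁻⁶, MAF ≥0.01) can be written as a simple predicate. This is a sketch only: the `VariantStats` record and its field names are hypothetical, and the study itself does not describe its filtering code.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant summary statistics."""
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    maf: float        # minor allele frequency


def passes_variant_qc(v, min_call_rate=0.98, min_hwe_p=1e-6, min_maf=0.01):
    """Apply the three thresholds; call rate and HWE are strict inequalities."""
    return (
        v.call_rate > min_call_rate
        and v.hwe_p > min_hwe_p
        and v.maf >= min_maf
    )


kept = passes_variant_qc(VariantStats(call_rate=0.995, hwe_p=0.2, maf=0.01))
dropped = passes_variant_qc(VariantStats(call_rate=0.995, hwe_p=1e-8, maf=0.3))
```

The sample call rate filter (>0.95) is applied per individual rather than per variant, so it is left out of this predicate.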

  • Research Article
  • 10.1002/alz70855_107394
Basic Science and Pathogenesis.
  • Dec 1, 2025
  • Alzheimer's & dementia : the journal of the Alzheimer's Association
  • Sarwan Ali + 2 more

Imputation is still a crucial technique in genomic studies to infer untyped variants, enhancing the coverage and power of genome-wide association studies. However, its accuracy can vary, especially for rare variants and across populations. Using genotype data from ADSP (Alzheimer's Disease Sequencing Project), we conducted I) traditional imputation with a single round of genotype imputation ("SI") and II) two (or more) rounds of imputation ("DI") on data progressively passed through quality control and again imputed. We tested the performance of either approach by estimating the amount of imputation errors using whole genome sequencing data (WGS) from ADSP as gold standard. For 196 Caribbean Hispanics, we estimated the error rates only in SNPs within chromosome 1 imputed at optimal quality (R² ≥ 80%) and across minor allele frequency (MAF) brackets: common (MAF ≥ 0.05), uncommon (0.01 ≤ MAF < 0.05), rare (0.001 ≤ MAF < 0.01), ultra-rare (MAF < 0.001) and overall. We tested the entire sample and then separately for individuals with ≥50% African genetic ancestry (AFR). Overall, DI showed significantly lower error rates compared to SI (2.65% vs. 4.23%, Wilcoxon p-value < 0.001). This result was consistent across MAF brackets (Table 1), in rare (2.99% vs. 3.37%), uncommon (2.93% vs. 3.50%), and ultra-rare variants (3.08% vs. 3.43%). In individuals with predominant AFR, imputation errors were more frequently observed compared to those with low AFR, especially for rare and ultra-rare variants; nevertheless, DI maintained significantly lower error rates across all MAF categories (Table 2). Our findings demonstrate that multiple rounds of imputation generally outperform the traditional single one in terms of accuracy, particularly for rare and ultra-rare variants. This improvement is confirmed in groups that have shown higher error rates after traditional imputation, such as individuals with predominant African ancestry (Sariya et al. 2019). 
These results highlight a valuable and easy approach for enhancing the quality of imputed data across populations, which could lead to more robust genetic association studies.
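
The MAF brackets listed in this abstract translate directly into a small classifier. The function name is hypothetical; the cut-offs are the ones stated above.

```python
def maf_bracket(maf):
    """Assign a minor allele frequency to the bracket used in the abstract."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if maf >= 0.05:
        return "common"
    if maf >= 0.01:
        return "uncommon"
    if maf >= 0.001:
        return "rare"
    return "ultra-rare"


brackets = [maf_bracket(m) for m in (0.2, 0.03, 0.004, 0.0002)]
```

Error rates can then be tabulated per bracket by grouping imputed SNPs on this label.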

  • Research Article
  • 10.3168/jds.2025-26715
Genotype imputation accuracy of X chromosome variants in Holstein cattle based on different software and imputation strategies.
  • Dec 1, 2025
  • Journal of dairy science
  • Tatiana C De Souza + 10 more


  • Research Article
  • 10.1016/j.aqrep.2025.103088
Optimizing genotype imputation pipeline for low-coverage whole genome sequencing data in spotted sea bass and its application in genomic prediction
  • Dec 1, 2025
  • Aquaculture Reports
  • Chong Zhang + 11 more


  • Research Article
  • 10.1016/j.xgen.2025.101072
Leveraging ancestral recombination graphs for scalable mixed-model analysis of complex traits.
  • Dec 1, 2025
  • Cell genomics
  • Jiazheng Zhu + 7 more


  • Abstract
  • 10.1002/alz70855_107089
Genome‐Wide Analyses for cognitive performances in Multi‐Ethnic Cohorts identifies novel rare locus PCAT5
  • Dec 1, 2025
  • Alzheimer's & Dementia
  • Neetesh Pandey + 5 more

Background: Employing a joint test of objectively‐measured endophenotypes vs. traditional binary outcomes (i.e., affected vs. non‐affected) can enhance power to discover novel genetic loci for Alzheimer's disease and related dementia (ADRD). We tested the association between rare variants and scores across multiple cognitive domains using data from the Mexican Health and Aging Study (MHAS) and conducted replication analyses in two multi‐ethnic cohorts: Washington Heights Inwood Aging Project (WHICAP) and Multi‐Ethnic Study of Atherosclerosis (MESA). Method: A genome‐wide gene‐based test employing rare variants from whole‐genome sequencing (WGS) was carried out using the MultiSKAT package in 1,930 Mexicans. Replication used imputed genotype data from 909 WHICAP participants and 4,276 MESA participants. Variants were filtered on allele frequency (<1%) and, if imputed, on imputation quality (>=80%). We adjusted for sex, age, education, and APOE. Result: In MHAS, we identified a genome‐wide significant locus, PCAT5, before and after APOE adjustment (p = 9.89E‐07; p = 9.74E‐07, respectively). We replicated the signal in both WHICAP (p = 0.018, p = 0.019, before/after APOE adjustment, respectively) and MESA (p = 4.07E‐10; p = 7.12E‐10 before/after APOE adjustment, respectively). Within the latter, when stratified by ethnicity, African Americans (n = 1,018) showed a nominally significant association before/after APOE adjustment (p = 0.0254; p = 0.0248, respectively). Conclusion: This study provides strong evidence for the association of rare variants within PCAT5 and cognitive performances across different populations. Importantly, a rare variant in PCAT5 was previously reported in a large meta‐analysis for ADRD of 65,602 Non‐Hispanic Whites (Naj, 2021). These findings suggest that PCAT5 may play a role in ADRD pathology independently of APOE, warranting further biological investigations to understand the mechanisms underlying this gene's role.

  • Research Article
  • 10.1186/s12864-025-12256-8
Comparison between SNP array and imputed data to estimate population structure and ROH hotspots in horse breeds
  • Nov 29, 2025
  • BMC Genomics
  • Giorgio Chessari + 8 more

Background: Single nucleotide polymorphism (SNP) arrays are commonly used for studying the genomic structure and diversity of livestock breeds, but whole-genome sequencing (WGS) provides higher-resolution genomic data. Genotype imputation has become a standard practice for increasing the genomic resolution of association studies. This work aimed to extend imputation to biodiversity analyses, comparing SNP array data before and after imputation. A 40 k SNP dataset of 281 horses from 12 breeds (DSSNP) was imputed to sequence-level using a reference panel of 327 sequenced individuals, generating approximately 9 million markers after filtering (DSIMP). Both datasets were used to study genetic variability, population structure and runs of homozygosity (ROH). Results: Genetic indices and relationships showed similar trends for both datasets, with high Pearson correlations and Mantel test values (> 0.8) indicating that the imputed data are a reliable alternative to SNP array data for genetic studies. Multidimensional scaling and admixture analyses highlighted how the genetic proximity between breeds observed for the DSSNP was amplified by the imputation process in cases of those breeds with a few sequences included in the WGS reference panel. ROH investigation showed overlapping homozygosity regions between the two datasets, highlighting the benefits of having more markers for gene and QTL annotation. Of the 141 ROH islands identified in the DSSNP, 79 overlapped perfectly with those found in the imputed data. Validation with the reference panel of 327 sequenced horses revealed a single ROH island on ECA11 shared across all three datasets, containing genes associated with morphology and behavioral traits. Conclusions: High correlations between SNP array and imputed data indicate that imputed genotypes provide a reliable alternative for assessing population structure and genetic diversity in horse breeds. 
Specifically, imputation can enhance the detection of ROH and the annotation of genes within ROH islands, with the reliability of these results depending on the quality of the reference panel and its representation of the studied breeds, among other factors.
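
A run of homozygosity (ROH) can be pictured as a stretch of consecutive homozygous genotype calls. The sketch below is a deliberately simplified scan under that assumption, with a hypothetical `min_snps` threshold; it is not the method used in the study, and production tools typically use sliding windows and tolerate a few heterozygous or missing calls.

```python
def find_roh(genotypes, min_snps=3):
    """Return (start, end) index pairs of homozygous runs of at least min_snps.

    genotypes: alternate-allele counts per SNP in map order (0, 1, or 2),
    where 1 marks a heterozygous call that ends the current run.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


example_runs = find_roh([0, 0, 2, 1, 2, 2, 1, 0, 0, 0, 0], min_snps=3)
```

With denser imputed markers the same physical segment spans more SNPs, which is one reason array and imputed data can report different ROH boundaries.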

  • Research Article
  • 10.1038/s42003-025-09215-0
Studying rare variant polygenic risk scores using whole exome sequencing and imputed genotype data
  • Nov 24, 2025
  • Communications Biology
  • Ji-One Kang + 5 more

Rare variant polygenic scores (rvPRS) are developed to improve phenotype prediction, yet a standardized construction protocol remains unavailable. We aim to establish an optimal rvPRS protocol using whole exome sequencing (WES) and imputed genotype (IMP) data from 502,369 UK Biobank participants and to evaluate its predictive performance compared to common variant PRS (cvPRS). rvPRS models are constructed for 13 binary and 5 quantitative traits using gene-burden and single-SNP associations and are assessed via R2, perSD OR/Beta, NRI, and IDI. Single-SNP-based rvPRS outperform gene-burden models, and IMP-derived rvPRS generally surpass WES-derived models. For 6 of 12 validated traits, combined tPRS (cvPRS + rvPRS) improves prediction over cvPRS alone. IMP data also show a stronger correlation between heritability and rvPRS association strength. This study provides a practical rvPRS protocol applicable across traits and underscores the potential of rare variants to enhance phenotype prediction.
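
The combined score described above, tPRS (cvPRS + rvPRS), can be sketched assuming the standard additive form of a polygenic score: a sum of per-variant effect weights multiplied by allele dosages. The weights and dosages below are made up for illustration, and the study's actual protocol may differ in variant selection and scaling.

```python
def polygenic_score(dosages, weights):
    """Additive score: sum of effect weight times allele dosage per variant."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical per-variant effect weights and one individual's dosages.
cv_prs = polygenic_score(dosages=[1.0, 2.0, 0.0], weights=[0.10, 0.05, 0.20])
rv_prs = polygenic_score(dosages=[1.0, 0.0], weights=[0.80, 1.20])
t_prs = cv_prs + rv_prs  # combined score, as in "tPRS (cvPRS + rvPRS)"
```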

  • Research Article
  • 10.1073/pnas.2416980122
Imputation of ancient canid genomes reveals inbreeding history over the past 10,000 years
  • Nov 24, 2025
  • Proceedings of the National Academy of Sciences
  • Katia Bougiouri + 17 more

The multi-millennia-long history between dogs and humans has placed them at the forefront of archaeological and genomic research. Despite ongoing efforts including the analysis of ancient dog and wolf genomes, many questions remain regarding the evolutionary processes that led to the diversity of breeds today. Although ancient genome sequences provide valuable information about these processes, their utility is hindered by low depths of coverage and postmortem damage, which inhibit confident genotype calling. In the present study, we assess how genotype imputation of ancient dog and wolf genomes, using a large reference panel, can increase the amount of information provided by ancient datasets. We evaluated imputation accuracy by down-sampling high-coverage dog and wolf genomes to 0.05× to 2× coverage and compared concordance between imputed and high-coverage genotypes. We measured the impact of imputation on principal component analyses and runs of homozygosity (ROH). Our findings show high (R2 > 0.9) imputation accuracy for dogs with coverage as low as 0.5× and for wolves as low as 1.0×. We then imputed a dataset of 90 ancient dog and wolf genomes to assess changes in inbreeding during the last 10,000 years of dog evolution. Ancient dog and wolf populations generally exhibit lower inbreeding levels than present-day individuals. Regions with low ROH density maintained across ancient and present-day dogs were significantly associated with genes related to immunity and chemosensory receptors. Our study indicates that imputing ancient canine genomes is a viable strategy that allows for the use of analytical methods previously limited to high-quality genetic data.
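
The evaluation described above compares imputed genotypes against high-coverage calls for the same individuals. Two commonly reported summaries are the genotype concordance rate and the squared Pearson correlation (R²) between imputed dosages and true genotypes. The sketch below assumes those two definitions; the study may compute them per site, per frequency bin, or with other refinements.

```python
def concordance_rate(imputed_calls, true_calls):
    """Fraction of genotype calls that match the truth set exactly."""
    if len(imputed_calls) != len(true_calls) or not true_calls:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(imputed_calls, true_calls) if a == b)
    return matches / len(true_calls)


def dosage_r2(dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    if len(dosages) != len(true_genotypes) or not dosages:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(true_genotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, true_genotypes))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either side has no variance
    return (sxy * sxy) / (sxx * syy)


truth = [0, 1, 2, 1]
example_concordance = concordance_rate([0, 1, 2, 2], truth)
example_r2 = dosage_r2([0.0, 1.0, 2.0, 1.0], truth)
```

Concordance uses hard genotype calls, while R² uses dosages and is therefore more sensitive to uncertainty at low-frequency sites.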

  • Research Article
  • 10.1038/s42003-025-09214-1
Imputation disparities driven by recent selection and their impact on disease risk estimation in East and Southeast Asian populations.
  • Nov 21, 2025
  • Communications biology
  • Dingyang Li + 28 more

Accurate genotype imputation is essential for large-scale genetic studies and precision medicine. While East Asian (EAS)-specific reference panels like ChinaMAP and CHN100k have been developed, most studies still rely on multi-ancestry panels like TOPMed due to the large sample size. However, their performance in underrepresented groups like Southeast Asians remains unclear. Using high-coverage whole-genome sequencing and SNP-array data from 8,316 Chinese and Thai individuals, we systematically evaluate six state-of-the-art reference panels for genotype imputation. Our results show that EAS-specific panels outperformed multi-ancestry panels for East and Southeast Asian populations. For example, ChinaMAP achieves a mean heterozygosity concordance rate above 0.90 without R2 filtering, whereas TOPMed requires an R2 threshold of 0.60-0.70 to achieve comparable results. Notably, we find that recent positive selection drives regional disparities in imputation accuracy, as illustrated by the olfactory receptor gene cluster. More importantly, our results indicate that the choice of reference panel and R2 thresholds have a significant impact on polygenic risk score estimation for disease prediction. These findings provide valuable guidelines for improving genotype imputation in East and Southeast Asian populations and underscore the need for ancestrally diverse reference panels to support globally equitable genomic research.

  • Research Article
  • Citations: 1
  • 10.1038/s42003-025-09052-1
An updated Pig Haplotype Reference Panel (PHARP 4.0) comprising 13,298 haplotypes
  • Nov 20, 2025
  • Communications Biology
  • Qingyu Wang + 16 more

High-throughput genome sequencing and genotyping have significantly accelerated genetic research. However, the high cost of whole-genome sequencing (WGS) remains a barrier to large-scale studies like genome-wide association studies (GWAS) and genomic prediction. Genotype imputation offers a cost-effective alternative by inferring unobserved variants from lower-density data using haplotype reference panels. In this study, we present the updated Pig Haplotype Reference Panel (PHARP) 4.0, comprising 6449 pig genomes from 154 breeds. PHARP 4.0 encompasses 50.3 million SNPs and 5.8 million indels, making it the largest and most diverse pig reference panel to date. PHARP 4.0 demonstrated superior imputation accuracy compared to existing panels (SWIM, AHC, AGIDB, and PGRP), achieving concordance rates (CR > 0.99) and correlation coefficients (R² > 0.98) in European breeds and improved accuracy in Chinese Jinhua pigs (CR = 0.936, R² = 0.924) when imputing from 80 K SNP chip data to whole-genome sequencing (WGS). We further optimized an RNA-seq-based imputation pipeline by incorporating multiple breeds and applying a 6× sequencing depth filter, achieving CR > 0.95 and R² > 0.90 in European breeds, and a CR of 0.93 with an R² = 0.92 in Chinese Jinhua pigs. Additionally, increasing the specific reference panel size to approximately 400 samples improved the imputation of rare variants. Utilizing PHARP 4.0, we successfully imputed low-density SNP chip data for two GWAS, identifying significant SNPs likely representing causal variants. Overall, PHARP 4.0 serves as a valuable resource for advancing pig genetic research and supporting breeding programs.

  • Research Article
  • 10.1186/s12864-025-12270-w
Genotype imputation from low-coverage WGS using haplotype reference panels in cultivated strawberry
  • Nov 19, 2025
  • BMC Genomics
  • Tim Koorevaar + 4 more

Background: To implement high-throughput sequencing-based genotyping in a strawberry (Fragaria × ananassa) breeding program, we aimed to construct a haplotype reference panel and explore its utility through genotype dosage imputation of low-coverage (1×) sequencing data. Although genotyping by whole genome sequencing (WGS) provides high SNP density, its cost remains a limitation for large-scale application. Imputation from low coverage data using a reference panel offers a cost effective alternative, but this approach has not yet been optimized for allo-octoploid strawberry. Results: To reduce genotyping errors that limit phasing accuracy, we combined high sequencing depth (> 15×) with variant filtering based on average allele balance (AAB), linkage disequilibrium (LD), and Mendelian error rates (MER). Statistical phasing using SHAPEIT5 resulted in a mean switch error rate of 0.9%, with 50% of the genome covered by haplotype blocks of at least 654 kb (QHN50) without phase switches. To evaluate downstream imputation, samples from three genetically distinct populations (California, Florida, and HCFF) were downsampled to 1× and imputed using reference panels of varying size and composition (via GLIMPSE2). Both panel size and genetic diversity influenced imputation accuracy, with concordance rates ranging from 0.87 to 0.97 for the smallest panel and 0.94 to 0.98 for the largest, excluding three outliers. Conclusions: These findings demonstrate that constructing a large, genetically diverse haplotype reference panel improves genotype dosage imputation from low-coverage sequencing data. However, high accuracy is still achievable with limited resources, making this a cost-efficient alternative to SNP arrays when adopting WGS-based genotyping in breeding programs. The strategy is broadly applicable to other crops where dense genotyping is needed but resources are limited. In such cases, sequencing approximately 70 genetically representative samples at ≥ 25× depth was sufficient in our study to build a reference panel suitable for imputation. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-025-12270-w.
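
The switch error rate reported for phasing can be sketched as follows. This uses the usual textbook definition, which is an assumption about how the study computed it: compare the inferred phase with a trusted phase at consecutive heterozygous sites and count how often the orientation flips. The function name and data are hypothetical and are not taken from SHAPEIT5 or the study.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Switch error rate over heterozygous sites listed in genomic order.

    Each argument holds the allele (0 or 1) carried by the first haplotype at
    every heterozygous site; the second haplotype is implied as the complement.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0
    # True where the inferred orientation agrees with the truth at that site.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch happens whenever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(agrees) - 1)


# Hypothetical toy example: the phase flips once, between the 2nd and 3rd site.
true_hap = [0, 1, 0, 1, 1, 0]
inferred_hap = [0, 1, 1, 0, 0, 1]
print(switch_error_rate(true_hap, inferred_hap))
```

Because only flips between neighbours are counted, a haplotype that is inverted everywhere scores zero, which matches the fact that the labelling of the two haplotypes is arbitrary.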

