Abstract
Despite the availability of large-scale sequencing data, long-range linkage disequilibrium (LRLD) has not been extensively studied. The theoretical aspects of LRLD estimates were studied to determine the best estimation method for the sequencing data of three populations of African (AFR), European (EUR), and East Asian (EAS) descent from the 1000 Genomes Project. Genome-wide LRLDs, excluding centromeric regions, revealed clear population specificity, with substantially more population-specific LRLDs than LRLDs shared between populations. Clear relationships between the functionalities of the regions in LRLD denoted long-range interactions in the genome. The proportion of gene regions was increased among LRLD variants, and the coding sequence (CDS)-CDS LRLDs showed obvious functional similarities between the genes involved. Application to theoretical case-control associations confirmed that LRLDs in genome-wide association studies (GWASs) could contribute to false signals, although the impacts might not be severe in most cases. LRLDs between variants with functional similarity exist in the human genome, indicating possible gene-gene interactions, and they differ among populations. Based on the current study, LRLDs should be examined in GWASs to identify true signals. More importantly, population specificity in LRLDs should be examined in relevant studies.
Highlights
Population-specific long-range linkage disequilibrium in the human genome and its influence on identifying common disease variants
Using a traditional yet robust method, the current study identified actual long-range linkage disequilibrium (LRLD) in three human populations of the 1000 Genomes Project and examined the possible impacts of LRLD on genome-wide association studies (GWASs).
The current study revealed clear population differences in LRLDs, indicating the existence of population-specific long-range interactions.
Summary
Despite the availability of large-scale sequencing data, long-range linkage disequilibrium (LRLD) has not been extensively studied. Recurrent severe bottlenecks can extend LD between distant genomic regions[4]. Such impacts would appear as long chromosomal regions harboring pairs of both close and distant variants, and these effects will always be more substantial in close regions than in distant regions, as shown in a previous study on a bottleneck of the European population using LD over a 2-Mb region in chromosome 17[5]. When the minor allele frequency of one variant is very small relative to the sample size, the haplotype frequency at equilibrium is biased to one side and generates LD by chance.
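The chance LD described for rare variants can be illustrated with a small calculation. The sketch below uses the standard textbook definitions of D, D' and r^2 computed from the four haplotype counts of two biallelic variants; it is not necessarily the estimator chosen in the study, and the function name `ld_stats` and the example counts are assumptions made only for illustration.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) from the four haplotype counts.

    Standard definitions, used here as an illustration only:
      D  = p_AB - p_A * p_B
      D' = D / D_max  (D_max depends on the sign of D)
      r2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at variant 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at variant 2
    D = n_AB / n - p_A * p_B

    # Largest |D| attainable given the two allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Hypothetical example: 1000 haplotypes, allele A seen only twice,
# allele B at frequency 0.5. Both copies of A happen to sit on B haplotypes.
D, d_prime, r2 = ld_stats(n_AB=2, n_Ab=0, n_aB=498, n_ab=500)
print(f"D={D:.4f}  D'={d_prime:.3f}  r2={r2:.4f}")
```

In this hypothetical case the two copies of the rare allele land on the same background with a probability of roughly one in four even when the variants are independent, yet D' reaches its maximum while r^2 stays close to zero. This is one reason the choice of LD measure and a minor allele frequency threshold matter when screening distant variant pairs.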