Abstract

Genome-wide association studies (GWAS) have emerged as one of the most powerful tools to define DNA sequence variants associated with common diseases or phenotypes. Massive phenotyping efforts coupled with inexpensive genotyping arrays have resulted in GWAS data on millions of individuals and an exponential growth in the number of publications reporting GWAS findings. At present, the largest GWAS that analyzed kidney function included information on 300,000 individuals and identified >100 distinct associations.1 However, translating GWAS discoveries into biological mechanisms has been difficult and slow. Because the vast majority (85%–95%) of common disease-associated variants are noncoding,2 it is not easy to link genetic variations to biological mechanisms. Only a few loci have undergone rigorous analysis to define biological pathways that connect sequence variations to disease development. These mechanistic studies suggest a general model in which the causal variant is localized to a cell type–specific regulatory region (enhancer) that loops to a gene promoter; the variant alters transcription factor binding, resulting in a quantitative difference in gene expression (Figure 1).
Figure 1. Chromatin conformation capture (Hi-C) maps can identify long-range interactions in the genome and help to define genome-wide association study (GWAS) target genes. Variants identified in GWAS are localized to noncoding regions of the genome. Disease-causing variants are localized to cell type–specific regulatory regions. Such regions can be identified by DNase I hypersensitive sites sequencing and chromatin immunoprecipitation sequencing. Integration of epigenome profiles allows for genome-wide identification of variant-related regulatory elements, including active promoters (marked by H3K4me3) and enhancers (marked by H3K27ac). Furthermore, application of Hi-C enables direct capture of long-range interactions between enhancer and promoter regions. DNase-seq, DNase I hypersensitive sites sequencing; RNA-seq, RNA sequencing.
Several factors limit progress in translating sequence variations to biological mechanisms. First, hundreds or sometimes thousands of nucleotide polymorphisms are inherited together in large haplotype blocks.3 Because these variants are strongly linked, it is hard to identify the specific single causal nucleotide variant by association analysis alone. Second, identifying causal variants is also hindered by a lack of information on cell type–specific or tissue-specific epigenome and chromatin states that can highlight gene regulatory regions. Without knowing the causal variants, it is difficult to identify transcription factors and upstream regulators. To identify target genes for kidney-related traits and diseases, information on gene expression and genotype that is specific to human kidney tissue is essential. Cell type–specific gene expression data can also highlight specific disease-causing cell types. To narrow this large knowledge gap, the National Institutes of Health and European funding organizations have been supporting large-scale efforts to generate organ- and tissue-specific genotype, gene expression, and epigenome maps through consortia such as Genotype-Tissue Expression, the Encyclopedia of DNA Elements, and the International Human Epigenome Consortium, among others. However, coverage of the human kidney in these large databases has been limited. To remedy this issue, the renal community has been active in collecting human kidney tissue samples for gene expression and epigenome analyses.4,5 In the last 2 years, researchers have made key advances in generating human kidney-specific expression quantitative trait loci (eQTL) information.
By obtaining both genotype and gene expression information from human kidney tissue samples and applying computational adjustments and statistical analysis, it is possible to isolate the gene expression variability that is largely attributable to genotypic variation. Ko et al.6 were the first to analyze nearly 100 human kidney tissue samples obtained from individuals of European descent with no evidence of kidney disease. Using RNA sequencing to quantify gene expression in kidney cortex homogenates, they identified nearly 1800 genes with expression levels that are altered on the basis of nucleotide polymorphisms. Gillies et al.7 performed eQTL studies on microdissected human kidney glomeruli and tubulointerstitial tissue (n=180) obtained from patients of mixed ethnicities with nephrotic syndrome; using gene expression quantified by microarrays, they identified 1767 eGenes in kidney tubules and 894 eGenes in glomeruli, that is, genes whose expression levels are influenced by nucleotide variations. Qiu et al.8 also performed eQTL analysis on microdissected glomeruli and tubules using gene expression profiles generated by RNA sequencing. Their study used only kidney samples from individuals of European descent with no evidence of disease as defined by kidney function (eGFR) and structure analysis (fibrosis), or by cell composition studies. They identified about 4000–5000 genes with expression levels that varied on the basis of genetic polymorphisms observed in kidney samples. Integrating kidney eQTL studies with GWAS variants enables identification of novel disease-target genes. For example, >30 new likely kidney disease–target genes have been identified by combining the results from the CKDGen consortium and eQTL studies from Ko et al.6 and Qiu et al.8 These genetic variations were associated not only with kidney function but also with variations in expression of target genes in the kidney. Our group has observed an enrichment of these genes for proximal tubule expression and endolysosomal function.
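As a rough illustration of the core idea behind an eQTL test, the sketch below regresses a gene's expression on allele dosage (0, 1, or 2 copies of the alternate allele) for a single variant. It is not the pipeline used in any of the studies cited above: the function name `eqtl_effect` and the toy data are hypothetical, and a real analysis would also adjust for covariates such as ancestry and technical factors and would correct for testing many variant–gene pairs.

```python
from statistics import mean


def eqtl_effect(dosages, expression):
    """Least-squares slope and r^2 of expression regressed on allele dosage.

    dosages:    alternate-allele counts per sample (0, 1, or 2)
    expression: normalized expression of one gene in the same samples
    """
    if len(dosages) != len(expression):
        raise ValueError("one dosage per expression value is required")
    g_bar, e_bar = mean(dosages), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary across samples")
    slope = sxy / sxx                       # expression change per allele copy
    r_squared = (sxy * sxy) / (sxx * syy)   # share of variance explained
    return slope, r_squared


# Hypothetical toy data: six samples, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, r2 = eqtl_effect(dosages, expression)
print(f"effect per allele: {slope:.2f}, variance explained: {r2:.2f}")
```

A gene would be called an eGene only if an association like this one remains significant after covariate adjustment and multiple-testing correction, steps that this sketch deliberately leaves out.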
Using animal models, we showed that MANBA (a lysosomal enzyme) and DAB2 (an adaptor protein) play important roles in kidney disease development (Figure 1).
A lack of cell type–specific epigenomic maps or maps of open chromatin for whole kidney, kidney compartments, and kidney cell types has been a key limitation in understanding kidney-specific genetic regulation in both health and disease. In this issue of the Journal of the American Society of Nephrology, a study by Sieber et al.9 takes the first step to fill this critically important gap using DNase I hypersensitive sites sequencing to generate open chromatin information specific to kidney. Although DNA is normally wrapped around nucleosomes, regions of DNA in the genome that are transcriptionally active are not covered by nucleosomes, and they are sensitive to low-concentration DNase I, allowing the generation of a “fingerprint” to identify promoters and enhancers. Using cells from human kidney glomeruli and tubuli after minimal culturing, Sieber et al.9 performed DNase I hypersensitive sites sequencing and identified regulatory regions in human glomeruli and tubuli. To further confirm the relationship between genetic variants and gene regulation, they generated chromatin conformation capture maps for those freshly isolated human glomeruli. Chromatin conformation capture can reveal long-range physical interactions between enhancer and promoter regions, enhancing our understanding of gene regulation and aiding the identification of target genes.10 These technically challenging experiments provide critical information to illuminate gene regulation in general, and they are key to understanding how GWAS variants lead to disease development. Indeed, the authors have been able to integrate the information with published GWAS and identify multiple putative kidney function–associated genes and loci. Importantly, the kidney regulatory annotation maps and eQTL maps seem to be converging.
For example, Sieber et al.9 identified an open chromatin area at the DAB2 locus in the human kidney. They also identified important long-range interactions in this region, where the variant was associated with DAB2 expression. This region of the genome had previously been identified in GWAS and eQTL studies (Figure 1), indicating that the nucleotide variants are associated with kidney disease and regulate the expression of DAB2 in the kidney. Furthermore, the role of DAB2 as a kidney disease gene was confirmed in mouse genetic models.8
In summary, the study by Sieber et al.9 is a major step forward in generating open chromatin and chromatin interaction maps for samples of human kidney glomeruli and tubules. This type of information will be important for understanding gene regulation in the kidney and for annotating human genetic studies. Future research will focus on defining the cell type–specific epigenome, chromatin accessibility, and chromatin interactions. Furthermore, integration of data from different sources (such as larger GWAS datasets, studies with better phenotyping information, larger and improved eQTL studies, and improved regulatory annotation via epigenome and chromatin mapping) will be essential to identify high-confidence causal genes. Finally, experimental validation via animal models will be needed to confirm that these “high-confidence” causal genes are “true” disease-causing genes.
