Abstract

Whole-genome and whole-exome sequencing technologies play a very important role in studies of the genetic aspects of disease pathogenesis. The widespread use of genome-wide and exome-wide association studies (GWAS and EWAS) has made it possible to identify a large number of genetic variants associated with diseases. This information is accumulated in databases such as GWAS Central, the GWAS Catalog, OMIM, ClinVar, etc. Most of the variants identified by GWAS are located in noncoding regions of the human genome. According to the ENCODE project, the fraction of the human genome potentially involved in transcriptional control is many times greater than the fraction of coding regions. Thus, genetic variation in noncoding regions can increase susceptibility to diseases by disrupting various regulatory elements (promoters, enhancers, silencers, insulator regions, etc.). However, identifying the mechanisms by which pathogenic genetic variants influence disease risk is difficult because of the wide variety of regulatory elements. The present review focuses on the molecular genetic mechanisms by which pathogenic genetic variants affect gene expression. Attention is concentrated on the transcriptional level of regulation as the initial step in the expression of any gene. A triggering event mediating the effect of a pathogenic genetic variant on the level of gene expression can be, for example, a change in the functional activity of transcription factor binding sites (TFBSs) or a change in DNA methylation, which, in turn, affects the functional activity of promoters or enhancers. Dissecting the regulatory roles of polymorphic loci would be impossible without close integration of modern experimental approaches with computer analysis of the growing wealth of genetic and biological data obtained using omics technologies.
The review briefly describes several of the best-known public genomic information resources containing data obtained using omics technologies, including (1) resources that accumulate data on chromatin states and on regions of transcription factor binding derived from ChIP-seq experiments; (2) resources containing data on genomic loci for which allele-specific transcription factor binding was revealed by ChIP-seq; (3) resources containing in silico predictions of the potential impact of genetic variants on transcription factor binding sites.
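As an illustration of the kind of in silico prediction mentioned in item (3), the sketch below scores a reference and an alternative allele against a position frequency matrix and reports the change in log-odds score. It is a minimal example under stated assumptions, not the method used by any resource named in the review: the 4-bp matrix `PFM`, the uniform background and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (illustrative values only,
# not taken from any real transcription factor model).
PFM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.20, "C": 0.10, "G": 0.10, "T": 0.60},
]
BACKGROUND = 0.25  # assumed uniform nucleotide background


def log_odds_score(site, pfm=PFM, background=BACKGROUND):
    """Sum of per-position log2(frequency / background) for a candidate site."""
    if len(site) != len(pfm):
        raise ValueError("site length must match motif length")
    return sum(math.log2(column[base] / background) for base, column in zip(site, pfm))


def variant_effect(ref_site, alt_site):
    """Score difference (alt minus ref); negative values suggest a weakened site."""
    return log_odds_score(alt_site) - log_odds_score(ref_site)


# Example: a C>T substitution at the second motif position.
delta = variant_effect("ACGT", "ATGT")
print(f"score change: {delta:.3f}")
```

A negative score change indicates that the alternative allele matches the motif less well than the reference allele. Real tools typically add pseudocounts, scan both strands and all windows overlapping the variant, and calibrate scores against a background distribution.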

Highlights

  • At present, largely owing to the widespread use of genome-wide and exome-wide association studies (GWAS and EWAS), a large number of disease-associated polymorphisms have been identified

  • These resources present (1) the human genome annotation (GENCODE); (2) genome variability in human populations (HapMap, 1000 Genomes Project, International Genome Sample Resource (IGSR), dbSNP); (3) genetic variants (GVs) associated with diseases (GWAS Central, GWAS Catalog, ClinVar, HGMD, OMIM, etc.); (4) modifications of the chromatin (ENCODE, NIH Roadmap Epigenomics Mapping Consortium); (5) expression quantitative trait loci (GTEx project, eQTL databases, exSNP, etc.); (6) profiling of transcription factor binding events by ChIP-seq (Cistrome Data Browser, GTRD, ReMap); (7) allele-specific binding of TFs, identified using ChIP-seq data in combination with the data on the genotypes of the studied cells

  • (1) transcripts of the TLR4 gene, displayed according to the GENCODE v32 release; (2) DNase I hypersensitivity peak clusters derived from assays in 95 cell types; (3) transcription factor binding derived from a large collection of ChIP-seq experiments performed by the ENCODE project; (4) levels of enrichment of the H3K27Ac histone mark across the genome as determined by a ChIP-seq assay on 7 cell lines from ENCODE (H3K27Ac is the acetylation of lysine 27 of the H3 histone protein, and it is often found near regulatory elements); (5) short genetic variants from dbSNP release 153


Summary

Introduction

Largely owing to the widespread use of genome-wide and exome-wide association studies (GWAS and EWAS), a large number of disease-associated polymorphisms have been identified. To identify a functionally significant regulatory variant, the authors developed an integrated experimental and computational method based on a combined analysis of heterogeneous large-scale biological and genetic data, including: (1) data on allele-specific expression obtained from RNA-seq in combination with data on haplotypes; (2) expression quantitative trait loci (eQTL); (3) the genomic distribution of DNase I hypersensitive sites; (4) the localization of ChIP-seq peaks from the ENCODE and GEO databases; (5) the localization of regulatory motifs predicted by computer programs.
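The idea of combining several evidence layers for one variant can be sketched as a simple interval lookup. This is only an illustrative sketch and not the authors' pipeline: the `Interval` type, the function names, the layer labels and all coordinates are hypothetical, and intervals are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(chrom, pos, intervals):
    """True if the position falls inside any interval on the same chromosome."""
    return any(iv.chrom == chrom and iv.start <= pos < iv.end for iv in intervals)


def evidence_layers(chrom, pos, annotations, eqtl_positions):
    """List the names of the evidence layers that support a variant position."""
    layers = [name for name, ivs in annotations.items() if overlaps(chrom, pos, ivs)]
    if (chrom, pos) in eqtl_positions:
        layers.append("eQTL")
    return layers


# Toy data: every coordinate below is made up for illustration.
annotations = {
    "DNase": [Interval("chr9", 100, 200)],
    "ChIP-seq": [Interval("chr9", 140, 160)],
    "motif": [Interval("chr9", 300, 310)],
}
eqtls = {("chr9", 150)}

print(evidence_layers("chr9", 150, annotations, eqtls))
print(evidence_layers("chr9", 305, annotations, eqtls))
```

Variants supported by several independent layers would be natural candidates for follow-up. A linear scan is adequate for toy data; genome-scale annotations would normally call for an interval tree or a sorted-interval index.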

