Bayesian Inference of Allele-Specific Gene Expression Indicates Abundant Cis-Regulatory Variation in Natural Flycatcher Populations
Polymorphism in cis-regulatory sequences can lead to different levels of expression for the two alleles of a gene, providing a starting point for the evolution of gene expression. Little is known about the genome-wide abundance of genetic variation in gene regulation in natural populations, but analysis of allele-specific expression (ASE) provides a means for investigating such variation. We performed RNA-seq of multiple tissues from population samples of two closely related flycatcher species and developed a Bayesian algorithm that maximizes data usage by borrowing information from the whole data set and combines several SNPs per transcript to detect ASE. Of 2,576 transcripts analyzed in collared flycatcher, ASE was detected in 185 (7.2%) and a similar frequency was seen in the pied flycatcher. Transcripts with statistically significant ASE commonly showed the major allele in >90% of the reads, reflecting that power was highest when expression was heavily biased toward one of the alleles. This suggests that the observed frequencies of ASE are likely underestimates. The proportion of ASE transcripts varied among tissues, being lowest in testis and highest in muscle. Individuals often showed ASE of particular transcripts in more than one tissue (73.4%), consistent with a genetic basis for regulation of gene expression. The results suggest that genetic variation in regulatory sequences commonly affects gene expression in natural populations and that it provides a seedbed for phenotypic evolution via divergence in gene expression.
- Research Article
1
- 10.1101/2024.08.13.607784
- Jan 15, 2025
- bioRxiv
Single-cell RNA-seq (scRNA-seq) is emerging as a powerful tool for understanding gene function across diverse cells. Recently, this has included the use of allele-specific expression (ASE) analysis to better understand how variation in the human genome affects RNA expression at the single-cell level. We reasoned that because intronic reads are more prevalent in single-nucleus RNA-seq (snRNA-seq), and introns are under weaker purifying selection and thus enriched for genetic variants, snRNA-seq should facilitate single-cell analysis of ASE. Here we demonstrate how experimental and computational choices can improve the results of allelic imbalance analysis. We explore how experimental choices, such as RNA source, read length, sequencing depth, and genotyping, impact the power of ASE-based methods. We developed a new suite of computational tools to process and analyze scRNA-seq and snRNA-seq for ASE. As hypothesized, we extracted more ASE information from reads in intronic regions than those in exonic regions and show how read length can be set to increase power. Additionally, hybrid selection improved our power to detect allelic imbalance in genes of interest. We also explored methods to recover allele-specific isoform expression levels from both long- and short-read snRNA-seq. To further investigate ASE in the context of human disease, we applied our methods to a Parkinson’s disease cohort of 94 individuals and show that ASE analysis had more power than eQTL analysis to identify significant SNP/gene pairs in our direct comparison of the two methods. Overall, we provide an end-to-end experimental and computational approach for future studies.
- Research Article
1
- 10.1038/s41598-024-73743-8
- Oct 5, 2024
- Scientific Reports
Somatic copy number variations (CNVs), including abnormal chromosome numbers and structural changes leading to gain or loss of genetic material, play a crucial role in initiation and progression of cancer. CNVs are believed to cause gene dosage imbalances and modify cis-regulatory elements, leading to allelic expression imbalances in genes that influence cell division and thereby contribute to cancer development. However, the impact of CNVs on allelic gene expression in cancer remains unclear. Allele-specific expression (ASE) analysis, a potent method for investigating genome-wide allelic imbalance profiles in tumors, assesses the relative expression of two alleles using high-throughput sequencing data. However, many existing methods for gene-level ASE detection rely on only RNA sequencing data, which present challenges in interpreting the genetic mechanisms underlying ASE in cancer. To address this issue, we developed a robust framework that integrates allele-specific copy number calls into ASE calling algorithms by leveraging paired genome and transcriptome data from the same sample. This integration enhances the interpretability of the genetic mechanisms driving ASE, thereby facilitating the identification of driver events triggered by CNVs in cancer. In this study, we utilized BASE to conduct a comprehensive analysis of ASE in high hyperdiploid acute lymphoblastic leukemia (HeH ALL), a prevalent childhood malignancy characterized by gains of chromosomes X, 4, 6, 10, 14, 17, 18, and 21. Our analysis unveiled the comprehensive ASE landscape in HeH ALL. Through a multi-perspective examination of HeH ASEs, we offer a systematic understanding of how CNVs impact ASE in HeH, providing valuable insights to guide ASE studies in cancer.
- Research Article
57
- 10.1093/nar/gkw1076
- Nov 29, 2016
- Nucleic Acids Research
Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold-standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only.
- Research Article
39
- 10.1186/s12864-017-4354-6
- Dec 1, 2017
- BMC Genomics
Background: Efforts to improve sustainability in livestock production systems have focused on two objectives: investigating the genetic control of immune function as it pertains to robustness and disease resistance, and finding predictive markers for use in breeding programs. In this context, the peripheral blood transcriptome represents an important source of biological information about an individual’s health and immunological status, and has been proposed for use as an intermediate phenotype to measure immune capacity. The objective of this work was to study the genetic architecture of variation in gene expression in the blood of healthy young pigs using two approaches: an expression genome-wide association study (eGWAS) and allele-specific expression (ASE) analysis. Results: The blood transcriptomes of 60-day-old Large White pigs were analyzed by expression microarrays for eGWAS (242 animals) and by RNA-Seq for ASE analysis (38 animals). Using eGWAS, the expression levels of 1901 genes were found to be associated with expression quantitative trait loci (eQTLs). We recovered 2839 local and 1752 distant associations (single nucleotide polymorphisms, or SNPs, located less than or more than 1 Mb from the expression probe, respectively). ASE analyses confirmed the extensive cis-regulation of gene transcription in blood, and revealed allelic imbalance in 2286 SNPs, which affected 763 genes. eQTLs and ASE-genes were widely distributed on all chromosomes. By analyzing mutually overlapping eGWAS results, we were able to describe putative regulatory networks, which were further refined using ASE data. At the functional level, genes with genetically controlled expression that were detected by eGWAS and/or ASE analyses were significantly enriched in biological processes related to RNA processing and immune function. Indeed, numerous distant and local regulatory relationships were detected within the major histocompatibility complex region on chromosome 7, revealing ASE for most class I and II genes. Conclusions: This study represents, to the best of our knowledge, the first genome-wide map of the genetic control of gene expression in porcine peripheral blood. These results represent an interesting resource for the identification of genetic markers and blood biomarkers associated with variations in immunity traits in pigs, as well as any other complex traits for which blood is an appropriate surrogate tissue.
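The local/distant split used in the abstract above can be illustrated with a minimal Python sketch. The 1 Mb threshold comes from the abstract; the function name, the treatment of SNPs on a different chromosome as distant, and the handling of a distance of exactly 1 Mb are assumptions made for illustration, not the authors' stated procedure.

```python
LOCAL_WINDOW_BP = 1_000_000  # 1 Mb threshold quoted in the abstract


def classify_association(snp_chrom: str, snp_pos: int,
                         probe_chrom: str, probe_pos: int,
                         window_bp: int = LOCAL_WINDOW_BP) -> str:
    """Label a SNP/probe association as 'local' or 'distant'.

    Assumption: a SNP on another chromosome counts as distant, and a
    distance of exactly window_bp is also treated as distant.
    """
    if snp_chrom != probe_chrom:
        return "distant"
    if abs(snp_pos - probe_pos) < window_bp:
        return "local"
    return "distant"


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(classify_association("7", 24_500_000, "7", 24_900_000))
    print(classify_association("7", 24_500_000, "7", 30_000_000))
    print(classify_association("2", 24_500_000, "7", 24_900_000))
```

Keeping the window as a parameter makes it easy to check how sensitive the local/distant counts are to the chosen threshold.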
- Research Article
- 10.1186/s12864-025-12137-0
- Oct 30, 2025
- BMC Genomics
Background: Genetic and epigenetic perturbation of cis-regulatory sequences can shift patterns of gene expression and result in novel phenotypes. Phased genome assemblies now enable the local dissection of linkages between cis-regulatory sequences, including their epigenetic state, and allele-specific gene expression to further characterize gene regulation and resulting phenotypes in heterozygous genomes. Results: We assembled a locally phased genome for a mandarin hybrid named ‘Fairchild’ to explore the molecular signatures of allele-specific gene expression. With local genome phasing, genes with allele-specific expression were paired with haplotype-specific chromatin states, including levels of chromatin accessibility, histone modifications, and DNA methylation. We found that 30% of variation in allele-specific expression could be attributed to haplotype-associated factors, with allelic levels of chromatin accessibility and three histone modifications in gene bodies having the most influence. Structural variants in promoter regions were also associated with allele-specific expression, including specific enrichments of hAT and MULE-MuDR DNA transposon sequences. Integration of haplotype-resolved genetic and epigenetic landscapes with high-throughput phenotypic analysis of fruit traits in a panel of 154 accessions with mandarin and pummelo ancestry revealed that trait-associated variants were enriched in regions of open chromatin. Mining of trait-associated variants uncovered a Gypsy retrotransposon insertion in a gene that regulates potassium transport and may contribute to the reduction in fruit size that is observed in mandarins. Conclusions: Using a locally phased assembly of a heterozygous cultivar of citrus, we dissected the interplay between genetic variants and molecular phenotypes to reveal cis-regulatory sequences with potential functional effects on phenotypes relevant for genetic improvement.
- Research Article
21
- 10.1016/j.cels.2020.01.002
- Feb 1, 2020
- Cell Systems
Differential Allele-Specific Expression Uncovers Breast Cancer Genes Dysregulated by Cis Noncoding Mutations.
- Research Article
9
- 10.1038/s41598-021-83459-8
- Feb 17, 2021
- Scientific Reports
Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele-specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assists in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large subset of the samples (n = 68). ASE analysis was performed using custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average, ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
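As a rough illustration of a binomial test for ASE at a single biallelic SNP, the sketch below computes an exact two-sided p-value from reference and alternate read counts using only the Python standard library. It is a generic example, not VADT's implementation: the function names, the null allelic ratio of 0.5, and the "sum all outcomes at least as unlikely as the observed one" definition of two-sidedness are assumptions.

```python
from math import exp, lgamma, log


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Log-probability of k successes in n trials (log-space avoids overflow)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binomial_two_sided_p(ref_reads: int, alt_reads: int,
                         p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one SNP."""
    if not 0.0 < p_null < 1.0:
        raise ValueError("p_null must lie strictly between 0 and 1")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    pmf = [exp(binomial_log_pmf(k, n, p_null)) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum every outcome that is at least as unlikely as the observed count;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, p_value)


def major_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the more highly expressed allele."""
    n = ref_reads + alt_reads
    return max(ref_reads, alt_reads) / n if n else float("nan")


if __name__ == "__main__":
    # Hypothetical read counts for one heterozygous SNP.
    print(binomial_two_sided_p(10, 0), major_allele_fraction(10, 0))
```

With 10 reference reads and no alternate reads, only the two extreme outcomes are as unlikely as the observation, so the p-value is 2/1024. In a genome-wide scan, per-SNP p-values like this would still need multiple-testing correction.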
- Research Article
- 10.1158/1538-7445.am2019-1584
- Jul 1, 2019
- Cancer Research
Background: Genome-wide association studies (GWAS) have identified over 45 susceptibility loci for lung cancer; many studies, including those from our own group, have focused on low-frequency and rare coding variants using fine mapping and exome sequencing. This strategy, however, has met with limited success, as about 90% of GWAS hits are noncoding and act primarily through altering transcriptional regulation in an allele-specific manner. RNA-Seq based allele-specific expression (ASE) analysis affords an innovative approach to study preferential expression of an allele in direct relationship to its genotype, providing information on cis-regulatory effects for the expression of putative genes. Currently, however, there are no lung cancer studies that have rigorously evaluated ASE variation in lung tumor and adjacent tissues. Methods: Leveraging The Cancer Genome Atlas (TCGA) resource, we performed transcriptome-wide ASE analysis using existing RNA-Seq datasets of paired tumor and adjacent tissues from 54 lung adenocarcinoma patients. We first quantified the RNA read counts of Referent and Alternate alleles of heterozygous variants, then evaluated the allelic imbalance on a per-sample basis using a beta-binomial test, and explored the differential ASE between tumor and adjacent tissues using a paired Wilcoxon test. Functional regulatory consequences were generated from Ensembl Variant Effect Predictor. Results: We identified a total of 208 significant ASEs, including 35 tissue-specific (only in tumor or only in adjacent), 28 shared, and 145 differential variants. Of the 208 candidates, 41 were from the human leukocyte antigen (HLA) locus (primarily DQA2, DQB1, DRB1, H and J), and 26 were from the immunoglobulin (IG) superfamily (primarily IGH, IGL, IGK and F11R). About 80% of candidates were noncoding (mostly in 5’ and 3’ untranslated regions) and had regulatory features (21 promoter, seven enhancer, seven open chromatin region, two inducing nonsense-mediated mRNA decay, one CTCF-binding site, and one transcription factor binding site). Other top genes included MDM2, APOL1, and CTSB. Pathway analyses revealed 27 genes involved in the immune response pathway, and 12 genes involved in the HLA antigen processing and presentation pathway. Conclusion: This study is the first transcriptomics ASE analysis in lung adenocarcinoma. The key somatic cis-regulatory ASE variants identified from this study, especially immunogenic allelic variations from HLA and IG genes, could be used for identifying high-risk individuals for targeted lung cancer checkpoint blockade and related immunotherapies. Citation Format: Yanhong Liu, Spiridon Tsavachidis, Farrah Kheradmand, Margaret R. Spitz, Chris Amos. Transcriptome analysis links immune genes allelic expression imbalances to lung cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1584.
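The per-sample beta-binomial test mentioned in the Methods can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' pipeline: the function names, the symmetric null (equal expected expression of both alleles), and the fixed overdispersion parameter `rho` are assumptions. In practice the overdispersion would typically be estimated from the data.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """Beta-binomial probability of k reference reads out of n total."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def betabinom_two_sided_p(ref_reads: int, alt_reads: int,
                          rho: float = 0.05) -> float:
    """Two-sided p-value for allelic imbalance under a symmetric null.

    rho (assumed, 0 < rho < 1) is the overdispersion: rho -> 0 approaches
    the plain binomial, and larger rho gives heavier tails.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # Symmetric Beta(a, a) prior on the allelic ratio, with rho = 1 / (2a + 1).
    a = b = (1.0 / rho - 1.0) / 2.0
    pmf = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes at least as unlikely as the observed count.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical read counts at one heterozygous variant in one sample.
    print(betabinom_two_sided_p(30, 10))
```

Compared with a plain binomial test, the extra variance term makes the test more conservative for RNA-Seq counts, which often vary more than binomial sampling alone predicts. The paired Wilcoxon comparison between tumor and adjacent tissue is a separate step and is not shown here.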
- Research Article
48
- 10.1186/s12711-020-00579-x
- Oct 9, 2020
- Genetics Selection Evolution
Background: Genetic analysis of gene expression level is a promising approach for characterizing candidate genes that are involved in complex economic traits such as meat quality. In the present study, we conducted expression quantitative trait loci (eQTL) and allele-specific expression (ASE) analyses based on RNA-sequencing (RNAseq) data from the longissimus muscle of 189 Duroc × Luchuan crossed pigs in order to identify some candidate genes for meat quality traits. Results: Using a genome-wide association study based on a mixed linear model, we identified 7192 cis-eQTL corresponding to 2098 cis-genes (p ≤ 1.33e-3, FDR ≤ 0.05) and 6400 trans-eQTL corresponding to 863 trans-genes (p ≤ 1.13e-6, FDR ≤ 0.05). ASE analysis using RNAseq SNPs identified 9815 significant ASE-SNPs in 2253 unique genes. Integrative analysis between the cis-eQTL and ASE target genes identified 540 common genes, including 33 genes with expression levels that were correlated with at least one meat quality trait. Among these 540 common genes, 63 have been reported previously as candidate genes for meat quality traits, such as PHKG1 (q-value = 1.67e-6 for the leading SNP in the cis-eQTL analysis), NUDT7 (q-value = 5.67e-13), FADS2 (q-value = 8.44e-5), and DGAT2 (q-value = 1.24e-3). Conclusions: The present study confirmed several previously published candidate genes and identified some novel candidate genes for meat quality traits via eQTL and ASE analyses, which will be useful to prioritize candidate genes in further studies.
- Research Article
4
- 10.1111/1755-0998.12909
- Jun 20, 2018
- Molecular Ecology Resources
Variation in gene expression is believed to make a significant contribution to phenotypic diversity and divergence. The analysis of allele-specific expression (ASE) can reveal important insights into gene expression regulation. We developed a novel method called RPASE (Read-backed Phasing-based ASE detection) to test for genes that show ASE. With mapped RNA-seq data from a single individual and a list of SNPs from the same individual as the only input, RPASE is capable of aggregating information across multiple dependent SNPs and producing individual-based gene-level tests for ASE. RPASE performs well in simulations and comparisons. We applied RPASE to multiple bird species and found a potentially rich landscape of ASE.
- Research Article
11
- 10.3390/ijms21062117
- Mar 19, 2020
- International Journal of Molecular Sciences
Cytokinins play important roles in the growth and development of plants. Physiological and photosynthetic characteristics are common indicators to measure the growth and development in plants. However, few reports have described the molecular mechanisms of physiological and photosynthetic changes in response to cytokinin, particularly in woody plants. DNA methylation is an essential epigenetic modification that dynamically regulates gene expression in response to the external environment. In this study, we examined genome-wide DNA methylation variation and transcriptional variation in poplar (Populus tomentosa) after short-term treatment with the synthetic cytokinin 6-benzylaminopurine (6-BA). We identified 460 significantly differentially methylated regions (DMRs) in response to 6-BA treatment. Transcriptome analysis showed that 339 protein-coding genes, 262 long non-coding RNAs (lncRNAs), and 15,793 24-nt small interfering RNAs (siRNAs) were differentially expressed under 6-BA treatment. Among these, 79% were differentially expressed between alleles in P. tomentosa, and 102,819 allele-specific expression (ASE) loci in 19,200 genes were detected showing differences in ASE levels after 6-BA treatment. Combined DNA methylation and gene expression analysis demonstrated that DNA methylation plays an important role in regulating allele-specific gene expression. To further investigate the relationship between these 6-BA-responsive genes and phenotypic variation, we performed SNP analysis of 460 6-BA-responsive DMRs via re-sequencing using a natural population of P. tomentosa, and we identified 206 SNPs that were significantly associated with growth and wood properties. Association analysis indicated that 53% of loci with allele-specific expression had primarily dominant effects on poplar traits. Our comprehensive analyses of P. tomentosa DNA methylation and the regulation of allele-specific gene expression suggest that DNA methylation is an important regulator of imbalanced expression between allelic loci.
- Research Article
92
- 10.1101/gr.083931.108
- Nov 7, 2008
- Genome Research
To identify genes that are regulated by cis-acting functional elements in acute lymphoblastic leukemia (ALL), we determined the allele-specific expression (ASE) levels of 2,529 genes by genotyping a genome-wide panel of single nucleotide polymorphisms in RNA and DNA from bone marrow and blood samples of 197 children with ALL. Using a reproducible, quantitative genotyping method and stringent criteria for scoring ASE, we found that 16% of the analyzed genes display ASE in multiple ALL cell samples. For most of the genes, the level of ASE varied widely between the samples, from 1.4-fold overexpression of one allele to apparent monoallelic expression. For genes exhibiting ASE, 55% displayed bidirectional ASE in which overexpression of either of the two SNP alleles occurred. For bidirectional ASE we also observed overall higher levels of ASE and correlation with the methylation level of these sites. Our results demonstrate that CpG site methylation is one of the factors that regulates gene expression in ALL cells.
- Research Article
15
- 10.1093/hmg/ddy027
- Jan 15, 2018
- Human Molecular Genetics
Transcriptomic diversity across human populations reflects differential regulatory mechanisms. Allelic-imbalanced gene expression is a genetic regulatory mechanism that contributes to human phenotypic variation. To systematically investigate genome-wide allele-specific expression (ASE), we analyzed RNA-Seq data from European and African populations provided by the Geuvadis project. We identified 11 sites in 8 genes showing ASE in both Europeans and Africans, and 9 sites in 9 genes showing population-specific ASE, including both novel and known ASE signals. Notably, the top signal of differentiated ASE between inter-continental populations was observed in DNAJC15, of which the derived allele of rs12015, a single nucleotide polymorphism (SNP), showed significantly higher expression than did the ancestral allele specifically in European individuals. We identified a unique haplotype of DNAJC15, where a few SNPs highly differentiated between European and African populations were strongly linked to sites with high ASE. Among these, SNP rs17553284 affected the binding of several transcription factors as well as the genotype-dependent expression of DNAJC15. Therefore, we speculated that rs17553284 could be a regulatory causal variant that mediates the ASE of rs12015. We found several variations in ASE between intercontinental populations. The highly differentiated ASE genes identified here may be implicated in phenotypic variations among populations that are both evolutionarily and medically important.
- Research Article
52
- 10.1093/gbe/evx072
- May 1, 2017
- Genome Biology and Evolution
Gene regulation is a ubiquitous mechanism by which organisms respond to their environment. While organisms are often found to be adapted to the environments they experience, the role of gene regulation in environmental adaptation is not often known. In this study, we examine divergence in cis-regulatory effects between two Saccharomyces species, S. cerevisiae and S. uvarum, that have substantially diverged in their thermal growth profile. We measured allele-specific expression (ASE) in the species’ hybrid at three temperatures, the highest of which is lethal to S. uvarum but not the hybrid or S. cerevisiae. We find that S. uvarum alleles can be expressed at the same level as S. cerevisiae alleles at high temperature and most cis-acting differences in gene expression are not dependent on temperature. While a small set of 136 genes show temperature-dependent ASE, we find no indication that signatures of directional cis-regulatory evolution are associated with temperature. Within promoter regions we find binding sites enriched upstream of temperature-responsive genes, but only weak correlations between binding site and expression divergence. Our results indicate that temperature divergence between S. cerevisiae and S. uvarum has not caused widespread divergence in cis-regulatory activity, but point to a small subset of genes where the species’ alleles show differences in magnitude or opposite responses to temperature. The difficulty of explaining divergence in cis-regulatory sequences with models of transcription factor binding sites and nucleosome positioning highlights the importance of identifying mutations that underlie cis-regulatory divergence between species.
- Abstract
- 10.1182/blood-2021-144808
- Nov 5, 2021
- Blood
Genome-Wide Analysis of Allele-Specific Expression Genes in Pediatric B-Cell Precursor Acute Lymphoblastic Leukemia