Allele-specific expression analysis for complex genetic phenotypes applied to a unique dilated cardiomyopathy cohort
Allele-specific expression (ASE) analysis detects the relative abundance of alleles at heterozygous loci as a proxy for cis-regulatory variation, which affects the personal transcriptome and proteome. This study describes the development and application of an ASE analysis pipeline to a unique cohort of 87 well-phenotyped, RNA-sequenced patients from the Maastricht Cardiomyopathy Registry with dilated cardiomyopathy (DCM), a complex genetic disorder with a remaining gap in explained heritability. Regulatory processes for which ASE is a proxy might explain this gap. We found an overrepresentation of known DCM-associated genes among the significant results across the cohort. In addition, we were able to find genes of interest that have not been associated with DCM through conventional methods such as genome-wide association or differential gene expression studies. The pipeline offers RNA sequencing data processing, individual- and population-level ASE analyses, group comparisons, and several intuitive visualizations such as Manhattan plots and protein–protein interaction networks. With this pipeline, we found evidence supporting the case that cis-regulatory variation contributes to the phenotypic heterogeneity of DCM. Additionally, our results highlight that ASE analysis offers an additional layer to conventional genomic and transcriptomic analyses for candidate gene identification and biological insight.
- Research Article
- 10.1158/1538-7445.am2019-1584
- Jul 1, 2019
- Cancer Research
Background: Genome-wide association studies (GWAS) have identified over 45 susceptibility loci for lung cancer; many studies, including those from our own group, have focused on low-frequency and rare coding variants using fine mapping and exome sequencing. This strategy, however, has met with limited success, as about 90% of GWAS hits are noncoding and act primarily by altering transcriptional regulation in an allele-specific manner. RNA-Seq-based allele-specific expression (ASE) analysis affords an innovative approach to study preferential expression of an allele in direct relationship to its genotype, providing information on cis-regulatory effects on the expression of putative genes. However, there are currently no lung cancer studies that have rigorously evaluated ASE variation in lung tumor and adjacent tissues. Methods: Leveraging The Cancer Genome Atlas (TCGA) resource, we performed transcriptome-wide ASE analysis using existing RNA-Seq datasets of paired tumor and adjacent tissues from 54 lung adenocarcinoma patients. We first quantified the RNA read counts of the referent and alternate alleles of heterozygous variants, then evaluated allelic imbalance on a per-sample basis using a beta-binomial test, and explored differential ASE between tumor and adjacent tissues using a paired Wilcoxon test. Functional regulatory consequences were generated with the Ensembl Variant Effect Predictor. Results: We identified a total of 208 significant ASE variants, including 35 tissue-specific (only in tumor or only in adjacent tissue), 28 shared, and 145 differential variants. Of the 208 candidates, 41 were from the human leukocyte antigen (HLA) locus (primarily DQA2, DQB1, DRB1, H and J) and 26 were from the immunoglobulin (IG) superfamily (primarily IGH, IGL, IGK and F11R).
About 80% of the candidates were noncoding (mostly in 5′ and 3′ untranslated regions) and had regulatory features (21 promoter, seven enhancer, seven open chromatin region, two inducing nonsense-mediated mRNA decay, one CTCF-binding site, and one transcription factor binding site). Other top genes included MDM2, APOL1, and CTSB. Pathway analyses revealed 27 genes involved in the immune response pathway and 12 genes involved in the HLA antigen processing and presentation pathway. Conclusion: This study is the first transcriptomic ASE analysis in lung adenocarcinoma. The key somatic cis-regulatory ASE variants identified in this study, especially immunogenic allelic variations in HLA and IG genes, could be used to identify high-risk individuals for targeted lung cancer checkpoint blockade and related immunotherapies. Citation Format: Yanhong Liu, Spiridon Tsavachidis, Farrah Kheradmand, Margaret R. Spitz, Chris Amos. Transcriptome analysis links immune genes allelic expression imbalances to lung cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1584.
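The per-sample beta-binomial test named in the Methods of this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed overdispersion value `rho`, and the read counts are all assumptions introduced here, and the abstract does not say how overdispersion was estimated.

```python
from math import lgamma, exp

def betabinom_logpmf(k, n, a, b):
    """Log-probability of k reference reads out of n under Beta-Binomial(n, a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_beta_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_beta_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den

def betabinom_two_sided_p(ref, alt, rho=0.05):
    """Two-sided p-value for allelic imbalance against a balanced null (mean 0.5).

    rho is an ASSUMED overdispersion in (0, 1); with a = b the relation
    rho = 1 / (a + b + 1) gives a = b = (1 - rho) / (2 * rho).
    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed one.
    """
    n = ref + alt
    a = b = (1.0 - rho) / (2.0 * rho)
    pmf = [exp(betabinom_logpmf(k, n, a, b)) for k in range(n + 1)]
    observed = pmf[ref]
    # Small relative tolerance so that ties survive floating-point noise.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical read counts at one heterozygous site.
p_balanced = betabinom_two_sided_p(16, 16)
p_skewed = betabinom_two_sided_p(30, 2)
```

Compared with a plain binomial test, the extra dispersion parameter widens the null distribution, so the same read counts give a more conservative p-value as `rho` grows.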
- Research Article
48
- 10.1186/s12711-020-00579-x
- Oct 9, 2020
- Genetics Selection Evolution
Background: Genetic analysis of gene expression level is a promising approach for characterizing candidate genes that are involved in complex economic traits such as meat quality. In the present study, we conducted expression quantitative trait loci (eQTL) and allele-specific expression (ASE) analyses based on RNA-sequencing (RNAseq) data from the longissimus muscle of 189 Duroc × Luchuan crossed pigs in order to identify some candidate genes for meat quality traits. Results: Using a genome-wide association study based on a mixed linear model, we identified 7192 cis-eQTL corresponding to 2098 cis-genes (p ≤ 1.33e-3, FDR ≤ 0.05) and 6400 trans-eQTL corresponding to 863 trans-genes (p ≤ 1.13e-6, FDR ≤ 0.05). ASE analysis using RNAseq SNPs identified 9815 significant ASE-SNPs in 2253 unique genes. Integrative analysis between the cis-eQTL and ASE target genes identified 540 common genes, including 33 genes with expression levels that were correlated with at least one meat quality trait. Among these 540 common genes, 63 have been reported previously as candidate genes for meat quality traits, such as PHKG1 (q-value = 1.67e-6 for the leading SNP in the cis-eQTL analysis), NUDT7 (q-value = 5.67e-13), FADS2 (q-value = 8.44e-5), and DGAT2 (q-value = 1.24e-3). Conclusions: The present study confirmed several previously published candidate genes and identified some novel candidate genes for meat quality traits via eQTL and ASE analyses, which will be useful to prioritize candidate genes in further studies.
- Research Article
1
- 10.1101/2024.08.13.607784
- Jan 15, 2025
- bioRxiv
Single-cell RNA-seq (scRNA-seq) is emerging as a powerful tool for understanding gene function across diverse cells. Recently, this has included the use of allele-specific expression (ASE) analysis to better understand how variation in the human genome affects RNA expression at the single-cell level. We reasoned that because intronic reads are more prevalent in single-nucleus RNA-seq (snRNA-seq), and introns are under lower purifying selection and thus enriched for genetic variants, snRNA-seq should facilitate single-cell analysis of ASE. Here we demonstrate how experimental and computational choices can improve the results of allelic imbalance analysis. We explore how experimental choices, such as RNA source, read length, sequencing depth, and genotyping, impact the power of ASE-based methods. We developed a new suite of computational tools to process and analyze scRNA-seq and snRNA-seq data for ASE. As hypothesized, we extracted more ASE information from reads in intronic regions than from those in exonic regions, and we show how read length can be set to increase power. Additionally, hybrid selection improved our power to detect allelic imbalance in genes of interest. We also explored methods to recover allele-specific isoform expression levels from both long- and short-read snRNA-seq. To further investigate ASE in the context of human disease, we applied our methods to a Parkinson’s disease cohort of 94 individuals and show that ASE analysis had more power than eQTL analysis to identify significant SNP/gene pairs in our direct comparison of the two methods. Overall, we provide an end-to-end experimental and computational approach for future studies.
- Research Article
7
- 10.1371/journal.pone.0316046
- Dec 27, 2024
- PloS one
Different sheep breeds show distinct phenotypic plasticity in fat deposition in the tail. The genetic background underlying fat deposition in the tail of sheep is complex and multifactorial, and may involve an allele-specific expression (ASE) mechanism that modulates allelic expression. ASE is a common phenomenon in mammals and refers to allelic imbalanced expression modified by cis-regulatory genetic variants that can be observed at heterozygous loci. Therefore, regulatory processes behind fat-tail formation in sheep may to some extent be explained by cis-regulatory variants acting through an ASE mechanism, which was investigated in the present study. RNA-Seq-based variant calling was applied to perform a genome-wide survey of ASE genes using 45 samples from seven independent studies comparing the transcriptome of fat-tail tissue between fat- and thin-tailed sheep breeds. Using a rigorous computational pipeline, 115 differential ASE genes were identified, which were narrowed down to four genes (LPL, SOD3, TCP1 and LRPAP1) that were detected in at least two studies. Functional analysis revealed that the ASE genes were mainly involved in fat metabolism. Of these, LPL was of greater importance, as it 1) was observed in five studies, 2) was reported as an ASE gene in previous studies and 3) has a known role in fat deposition. Our findings imply that complex physiological traits, like fat-tail formation, can be better explained by considering various genetic mechanisms, which can be more finely mapped through ASE analyses. The insights gained in this study indicate that biallelic expression may not be a common mechanism in sheep fat-tail development. Hence, allelic imbalance of fat deposition-related genes can be considered a novel layer of information for future research on genetic improvement and increased efficiency in sheep breeding programs.
- Conference Article
- 10.3920/978-90-8686-940-4_496
- Dec 31, 2022
Allele-specific expression (ASE) analysis improves the understanding of the cis-regulation of transcription. Herein, we used imputed SNPs along with RNA-Seq data from the Longissimus thoracis muscle of 190 Nelore steers to identify functional cis-regulatory variants from ASE analysis. Using a binomial test, we identified 38,177 SNPs in ASE regions (ASE SNPs; FDR ≤ 0.05). We then searched for aseQTLs (SNPs potentially regulating the ASE) by comparing their heterozygosity to the measured allelic ratio under a Wilcoxon rank-sum test. We identified 21,543 aseQTLs potentially regulating a total of 430 ASE SNPs (FDR ≤ 0.05). Based on a linear model, ASE SNPs and aseQTLs were associated with transcript abundance. We identified 3,333 SNPs acting as cis-eQTLs (FDR ≤ 0.05). Results were integrated with previous ASE, functional region, and meat quality-related differentially expressed gene data. This study described novel SNPs potentially regulating the transcription of genes that may affect beef traits.
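A binomial test followed by FDR control, as named in this abstract, can be sketched as below. This is an illustrative sketch under stated assumptions: the function names and the read counts are hypothetical, the null allelic ratio of 0.5 is assumed, and Benjamini-Hochberg is used as one common FDR procedure (the abstract does not say which procedure the authors applied).

```python
from math import lgamma, log, exp

def binom_two_sided_p(ref, alt, p0=0.5):
    """Exact two-sided binomial p-value for ref reads out of ref + alt.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null allelic ratio p0 (assumed 0.5 here).
    """
    n = ref + alt
    def logpmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p0) + (n - k) * log(1.0 - p0))
    pmf = [exp(logpmf(k)) for k in range(n + 1)]
    observed = pmf[ref]
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical (ref, alt) read counts at four heterozygous SNPs.
counts = [(48, 52), (90, 10), (30, 70), (15, 14)]
pvals = [binom_two_sided_p(r, a) for r, a in counts]
qvals = benjamini_hochberg(pvals)
ase_snps = [c for c, q in zip(counts, qvals) if q <= 0.05]
```

In practice a minimum read depth per SNP is usually required before testing, since the exact test has little power at low coverage.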
- Research Article
- 10.1158/1538-8514.synthleth-b07
- Oct 1, 2017
- Molecular Cancer Therapeutics
In recent years, large-scale international studies have provided comprehensive catalogues of genomic alterations in cancers, including esophageal squamous cell cancer (ESCC). They revealed that some genes associated with the cell cycle/apoptosis, NOTCH, and WNT pathways, such as TP53 and NOTCH1, frequently harbor genetic abnormalities. As the next step, clinical sequencing studies are starting to evaluate the efficacy of targeted agents in patients with specific molecular aberrations. We performed exome sequencing and RNA sequencing for 25 Japanese patients with esophageal squamous cell carcinoma (ESCC) to provide a comprehensive catalogue of genomic abnormalities in ESCC and found TP53 and ZNF750 to be significantly mutated genes. Additionally, we performed allele-specific expression analysis of TP53, integrating mRNA sequencing data with the information on genomic abnormalities. This analysis revealed that expression levels changed depending on mutation type and that nearly mono-allelic expression of TP53 was a common signature of ESCC patients with somatic mutations. The pattern of mono-allelic expression also depended on mutation type. We expanded this analysis to all genes with somatic SNV mutations and found that mutant allele-specific expression was observed in other genes, including ZNF750, many of which belonged to cancer pathways in the KEGG database. For TP53, our investigation might provide a better understanding of the involvement of somatic mutations, and fluctuations in the transcriptional regulation of TP53 could be predicted from the type of somatic mutation. In addition, the analysis of allele-specific expression suggested that not only somatic DNA mutations but also mutant allele expression should be considered in order to better understand cancer genetic pathophysiology and to build more effective therapeutic strategies.
Citation Format: Masahiko Takahashi, Hirofumi Nakaoka, Yasunori Akutsu, Naoyuki Hanari, Kentaro Murakami, Masayuki Kano, Yasunori Matsumoto, Ryota Otsuka, Nobufumi Sekino, Masaya Yokoyama, Itsuro Inoue, Hisahiro Matsubara. Analysis of allele specific expression in esophageal squamous cell carcinoma with combination of exome sequencing and mRNA Sequencing [abstract]. In: Proceedings of the AACR Precision Medicine Series: Opportunities and Challenges of Exploiting Synthetic Lethality in Cancer; Jan 4-7, 2017; San Diego, CA. Philadelphia (PA): AACR; Mol Cancer Ther 2017;16(10 Suppl):Abstract nr B07.
- Research Article
- 10.1186/s13059-026-04062-6
- Apr 11, 2026
- Genome biology
Combining allele-specific expression (ASE) analysis with single-cell RNA-seq can elucidate how genomic variation affects RNA expression at the single-cell level. We explore how experimental and computational choices impact the power of ASE-based methods and develop a suite of single-cell ASE computational tools. With single-nucleus RNA-Seq, we extract more ASE information from reads in intronic than exonic regions. We show how read length can increase power and that hybrid selection improves power to detect ASE in targeted genes. We apply our methods to a Parkinson's disease cohort and show that ASE analysis has more power than eQTL analysis.
- Research Article
23
- 10.1186/s12864-021-08141-9
- Nov 8, 2021
- BMC Genomics
Background: Intramuscular fat (IMF) content is a determining factor for meat taste. The Luchuan pig is a fat-type local breed in southern China that is famous for its desirable meat quality due to high IMF. However, the crossbred offspring of Luchuan sows and Duroc boars displayed within-population variation in meat quality, and the reason remains unknown. Results: In the present study, we identified 212 IMF-correlated genes (FDR ≤ 0.01) using correlation analysis between gene expression level and IMF content. The IMF-correlated genes were significantly enriched in the processes of lipid metabolism and mitochondrial energy metabolism, as well as the AMPK/PPAR signaling pathway. From the IMF-correlated genes, we identified 99 genes associated with expression quantitative trait locus (eQTL) or allele-specific expression (ASE) signals, including 21 genes identified by both cis-eQTL and ASE analyses and 12 genes identified by trans-eQTL analysis. A genome-wide association study (GWAS) of IMF identified a significant QTL on SSC14 (p-value = 2.51E−7), and the nearest IMF-correlated gene, SFXN4 (r = 0.28, FDR = 4.00E−4), was proposed as the candidate gene. Furthermore, we highlighted another three novel IMF candidate genes, namely AGT, EMG1, and PCTP, by integrated analysis of GWAS, eQTL, and IMF-gene correlation analysis. Conclusions: The AMPK/PPAR signaling pathway, together with the processes of lipid and mitochondrial energy metabolism, plays a vital role in regulating porcine IMF content. Trait-correlated expression combined with eQTL and ASE analysis highlighted a priority list of genes, which compensated for the shortcomings of GWAS, thereby accelerating the mining of causal genes of IMF.
- Research Article
10
- 10.1590/0001-3765202120191453
- Jan 1, 2021
- Anais da Academia Brasileira de Ciências
In the current study, allele-specific expression analysis was performed in two cattle subspecies (Bos taurus and Bos indicus) at the SNP and gene levels. RNA-Seq data comprising 21,078,477 and 20,940,063 paired-end reads from pooled whole blood (leukocyte) samples of 40 US Holstein (Bos taurus) and 45 Cholistani (Bos indicus) cows were obtained from the SRA database at NCBI. Quality control and trimming of the raw RNA-Seq data were performed with FastQC and Trimmomatic. The transcriptome was assembled with TopHat2 in the two cow populations by aligning and mapping the RNA-Seq reads to the bovine reference genome. SNPs were discovered with Samtools, and ASE analysis was performed with a chi-square test. Results showed that 50,183 and 137,954 SNPs were discovered in the assembled transcriptomes of the Holstein and Cholistani samples, respectively, and 15,308 SNPs were common to both breeds. In total, 10,158 of 50,183 SNPs (20%) in Holstein and 31,523 of 137,954 SNPs (23%) in Cholistani cows were identified as ASE-SNPs. Reference allele and alternative allele counts in Holstein and Cholistani cows were 3,041 and 7,155, respectively. Among 131 SNPs discovered in 41 genes with differential expression between Holstein and Cholistani cows, 31 were ASE-SNPs (5 in Holstein; 26 in Cholistani cows).
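A chi-square test for ASE at a single SNP, as named in this abstract, can be sketched as a goodness-of-fit test of the two allele read counts against a 50:50 expectation. The sketch below is an assumption about the general technique, not the authors' exact procedure; the function name and the read counts are hypothetical.

```python
from math import erfc, sqrt

def chi_square_ase(ref, alt):
    """Chi-square goodness-of-fit test of (ref, alt) read counts against 50:50.

    Returns (statistic, p_value). With two categories there is one degree
    of freedom, for which the survival function is erfc(sqrt(x / 2)).
    """
    n = ref + alt
    if n == 0:
        raise ValueError("no reads cover this SNP")
    expected = n / 2.0
    stat = (ref - expected) ** 2 / expected + (alt - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical read counts: 60 reference reads and 40 alternative reads.
stat, p = chi_square_ase(60, 40)
```

The chi-square approximation is unreliable at low read depth, where an exact binomial test is the usual alternative.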
- Research Article
26
- 10.1093/gbe/evx080
- May 1, 2017
- Genome Biology and Evolution
Polymorphism in cis-regulatory sequences can lead to different levels of expression for the two alleles of a gene, providing a starting point for the evolution of gene expression. Little is known about the genome-wide abundance of genetic variation in gene regulation in natural populations but analysis of allele-specific expression (ASE) provides a means for investigating such variation. We performed RNA-seq of multiple tissues from population samples of two closely related flycatcher species and developed a Bayesian algorithm that maximizes data usage by borrowing information from the whole data set and combines several SNPs per transcript to detect ASE. Of 2,576 transcripts analyzed in collared flycatcher, ASE was detected in 185 (7.2%) and a similar frequency was seen in the pied flycatcher. Transcripts with statistically significant ASE commonly showed the major allele in >90% of the reads, reflecting that power was highest when expression was heavily biased toward one of the alleles. This would suggest that the observed frequencies of ASE likely are underestimates. The proportion of ASE transcripts varied among tissues, being lowest in testis and highest in muscle. Individuals often showed ASE of particular transcripts in more than one tissue (73.4%), consistent with a genetic basis for regulation of gene expression. The results suggest that genetic variation in regulatory sequences commonly affects gene expression in natural populations and that it provides a seedbed for phenotypic evolution via divergence in gene expression.
- Research Article
66
- 10.1073/pnas.1612561114
- Jan 17, 2017
- Proceedings of the National Academy of Sciences
Understanding the causes of cis-regulatory variation is a long-standing aim in evolutionary biology. Although cis-regulatory variation has long been considered important for adaptation, we still have a limited understanding of the selective importance and genomic determinants of standing cis-regulatory variation. To address these questions, we studied the prevalence, genomic determinants, and selective forces shaping cis-regulatory variation in the outcrossing plant Capsella grandiflora. We first identified a set of 1,010 genes with common cis-regulatory variation using analyses of allele-specific expression (ASE). Population genomic analyses of whole-genome sequences from 32 individuals showed that genes with common cis-regulatory variation (i) are under weaker purifying selection and (ii) undergo less frequent positive selection than other genes. We further identified genomic determinants of cis-regulatory variation. Gene body methylation (gbM) was a major factor constraining cis-regulatory variation, whereas presence of nearby transposable elements (TEs) and tissue specificity of expression increased the odds of ASE. Our results suggest that most common cis-regulatory variation in C. grandiflora is under weak purifying selection, and that gene-specific functional constraints are more important for the maintenance of cis-regulatory variation than genome-scale variation in the intensity of selection. Our results agree with previous findings that suggest TE silencing affects nearby gene expression, and provide evidence for a link between gbM and cis-regulatory constraint, possibly reflecting greater dosage sensitivity of body-methylated genes. Given the extensive conservation of gbM in flowering plants, this suggests that gbM could be an important predictor of cis-regulatory variation in a wide range of plant species.
- Research Article
9
- 10.1038/s41598-021-83459-8
- Feb 17, 2021
- Scientific Reports
Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assist in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large sub-set of the samples (n = 68). ASE analysis was performed using a custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~ 174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~ 24,000 (~ 14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~ 83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
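The tissue-sharing percentages reported in this abstract (the share of ASE SNPs seen in one, two, or all three tissues) can be computed from per-tissue sets of ASE SNPs. The sketch below is illustrative only: the function name, tissue labels, and SNP identifiers are hypothetical and do not come from VADT or the study's data.

```python
from collections import Counter

def tissue_sharing(ase_by_tissue):
    """Fraction of ASE SNPs that show ASE in exactly k tissues, for each k.

    ase_by_tissue maps a tissue name to the set of SNP IDs showing ASE there.
    Fractions are taken over the union of ASE SNPs across all tissues.
    """
    tissues_per_snp = Counter()
    for snps in ase_by_tissue.values():
        tissues_per_snp.update(snps)  # each set adds 1 per SNP it contains
    total = len(tissues_per_snp)
    if total == 0:
        raise ValueError("no ASE SNPs supplied")
    snps_per_k = Counter(tissues_per_snp.values())
    return {k: snps_per_k.get(k, 0) / total
            for k in range(1, len(ase_by_tissue) + 1)}

# Toy input with made-up SNP identifiers.
toy = {
    "breast_muscle": {"snp_a", "snp_b", "snp_c"},
    "abdominal_fat": {"snp_a", "snp_d"},
    "liver": {"snp_a", "snp_b", "snp_e"},
}
sharing = tissue_sharing(toy)
```

Running the same function on sets of gene IDs instead of SNP IDs would give the gene-level overlap the abstract compares against.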
- Research Article
39
- 10.1186/s12864-017-4354-6
- Dec 1, 2017
- BMC Genomics
Background: Efforts to improve sustainability in livestock production systems have focused on two objectives: investigating the genetic control of immune function as it pertains to robustness and disease resistance, and finding predictive markers for use in breeding programs. In this context, the peripheral blood transcriptome represents an important source of biological information about an individual’s health and immunological status, and has been proposed for use as an intermediate phenotype to measure immune capacity. The objective of this work was to study the genetic architecture of variation in gene expression in the blood of healthy young pigs using two approaches: an expression genome-wide association study (eGWAS) and allele-specific expression (ASE) analysis. Results: The blood transcriptomes of 60-day-old Large White pigs were analyzed by expression microarrays for eGWAS (242 animals) and by RNA-Seq for ASE analysis (38 animals). Using eGWAS, the expression levels of 1901 genes were found to be associated with expression quantitative trait loci (eQTLs). We recovered 2839 local and 1752 distant associations (Single Nucleotide Polymorphism or SNP located less or more than 1 Mb from expression probe, respectively). ASE analyses confirmed the extensive cis-regulation of gene transcription in blood, and revealed allelic imbalance in 2286 SNPs, which affected 763 genes. eQTLs and ASE-genes were widely distributed on all chromosomes. By analyzing mutually overlapping eGWAS results, we were able to describe putative regulatory networks, which were further refined using ASE data. At the functional level, genes with genetically controlled expression that were detected by eGWAS and/or ASE analyses were significantly enriched in biological processes related to RNA processing and immune function.
Indeed, numerous distant and local regulatory relationships were detected within the major histocompatibility complex region on chromosome 7, revealing ASE for most class I and II genes. Conclusions: This study represents, to the best of our knowledge, the first genome-wide map of the genetic control of gene expression in porcine peripheral blood. These results represent an interesting resource for the identification of genetic markers and blood biomarkers associated with variations in immunity traits in pigs, as well as any other complex traits for which blood is an appropriate surrogate tissue.
- Research Article
15
- 10.1007/s11103-021-01138-8
- Mar 18, 2021
- Plant Molecular Biology
Genome-wide allele-specific expression in F1 hybrids from a cross of tropical and temperate lotus unveils how cis-regulatory divergence affects genes in key pathways related to ecotypic divergence. Genetic variation, particularly cis-regulatory variation, plays a crucial role in phenotypic variation and adaptive evolution in plants. Temperate and tropical lotus, the two ecotypes of Nelumbo nucifera, differ in the degree of rhizome enlargement, which is associated with winter dormancy. To understand the role of genome-wide cis-regulatory divergence in the adaptive evolution of temperate and tropical lotus, we performed allele-specific expression (ASE) analyses on tissues including flowers, leaves and rhizomes from F1 hybrids of tropical and temperate lotus. Across all investigated tissues in the F1s, about 36% of genes showed ASE and about 3% of genes showed strong, consistent ASE. Most ASE was biased towards the tropical parent in all surveyed samples, indicating that the tropical genome might be dominant over the temperate genome in gene expression in tissues of the F1 hybrids. We found that promoter sequences of genes with similar allelic expression are more conserved than those of genes with significant or conditional ASE, suggesting that cis-regulatory sequence divergence underlies the allelic expression bias. We further found that biased genes are related to phenotypic differentiation between the two lotus ecotypes, especially in metabolic and phytohormone-related pathways in the rhizome. Overall, our study provides a global landscape of cis-regulatory variation between the two lotus ecotypes and highlights its role in rhizome growth variation for climatic adaptation.
- Research Article
57
- 10.1093/nar/gkw1076
- Nov 29, 2016
- Nucleic Acids Research
Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform-level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. Through innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated with gold standard GM12878 data and semi-simulated data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that imbalanced ASE and non-uniformity of gene isoform ASE are widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution for understanding ASE using RNA sequencing data only.