Abstract
Background
Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored because of the computational burden and limited data availability. Variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and cellular transcription rates produces large expression variability in RNA-seq measurements and collectively shapes the observed position-level read count curves. A single number summarizing gene expression, as is widely used in the analysis of microarray expression data, is highly unlikely to account for this variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole-genome sequencing (WGS) data poses enormous challenges.
Methods
We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data. The responses are functional: the position-level read counts within a gene are taken as a function of genomic position. The predictors are also functional: genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of every possible pair of SNPs, the FRGM takes the gene as the basic unit of epistasis analysis. It tests for interaction between all possible pairs of genes and uses all available information to collectively test the interaction between all possible pairs of SNPs within two genomic regions.
Results
Through large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples.
Conclusions
The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and should find broad application. Both the simulations and the real data analysis highlight the potential of the FRGM as a good choice for epistasis analysis with sequencing data.
Highlights
Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression
We developed a nonlinear functional regression model (FRGM) for epistasis analysis with next-generation mRNA sequencing (RNA-seq) data, with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position; the model simultaneously captures the positional information contained in the RNA-seq data and the genetic variation data, with substantially reduced dimensions
They were used to develop the models for generating RNA-seq data in the simulations (see the Methods section for a detailed description). Ten pairs of genes were formed from five genes (IRAK3, ACSS3, SUV420H1, ETV7, and HPS4) with genotype data from the 1000 Genomes Project dataset
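As a small illustration of the gene-level bookkeeping described above, the sketch below enumerates the unordered pairs that can be formed from the five named genes and computes a Bonferroni per-test threshold for that many tests. The function names, the significance level of 0.05, and the idea of applying the correction to these ten pairs are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations

# The five genes named in the highlights.
GENES = ["IRAK3", "ACSS3", "SUV420H1", "ETV7", "HPS4"]


def gene_pairs(genes):
    """Return every unordered pair of distinct genes."""
    return list(combinations(genes, 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


pairs = gene_pairs(GENES)
# alpha = 0.05 is an assumed family-wise level, not a value from the source.
threshold = bonferroni_threshold(0.05, len(pairs))

for first, second in pairs:
    print(f"{first} x {second}")
print(f"{len(pairs)} pairs, per-test threshold {threshold:.4g}")
```

Treating the gene rather than the SNP as the unit keeps the number of tests, and therefore the severity of the multiple-testing correction, far smaller than it would be for all SNP pairs.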
Summary
Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. Interaction analysis of gene expression remains fundamentally unexplored because of the computational burden and limited data availability. The epistatic effect in gene expression, defined as the departure from additive effects in a linear model of eQTL analysis [1], plays an essential role in understanding gene regulation and disease mechanisms [2,3,4]. eQTL epistasis analysis remains fundamentally unexplored because of large computational challenges and limited data availability [6]. Studying the effect of epistasis on gene expression could provide a better understanding of genetic architecture and gene regulation. The widely used statistical methods for identifying eQTL epistasis are designed for microarray expression data, where the overall expression of a gene is taken as a quantitative trait, so that any method for QTL epistasis analysis can be used for eQTL epistasis analysis [10, 12].
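The definition of epistasis as a departure from additivity can be written out for the scalar-trait setting used with microarray data. The display below is the standard two-locus linear model, not the paper's FRGM, and the symbols are assumptions introduced here: $y_i$ is the overall expression level of the gene in individual $i$, and $x_{i1}$ and $x_{i2}$ are the coded genotypes at the two loci.

```latex
% Two-locus linear model with an interaction term (illustrative notation)
y_i = \mu + \alpha\, x_{i1} + \beta\, x_{i2} + \gamma\, x_{i1} x_{i2} + \varepsilon_i ,
\qquad
H_0 : \gamma = 0 .
```

Under $H_0$ the two loci act additively; a nonzero $\gamma$ is the epistatic effect.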