Abstract

Statistical packages such as edgeR and DESeq are designed to detect genes that are relevant to phenotypic traits and diseases. A few studies have also modeled the relationship between gene expression and traits. In the presence of multicollinearity and outliers, which are unavoidable in genetic data, the robust ridge regression estimator can be applied with the trait value as the response variable and the gene expression levels as explanatory variables. In some simulation scenarios, the robust ridge estimator is resistant to outliers and less susceptible to multicollinearity than the ordinary least-squares (OLS) estimator. This study investigated the reliability of the robust ridge estimator when the explanatory variables are tail-dependent and follow negative binomial distributions, comparing its performance with that of OLS and using a vine copula to model the tail dependence among gene expression levels. The robust ridge estimator and OLS were then both applied to an ecological dataset. First, RNA sequencing data were compared statistically between two treatments, and 15 differentially expressed genes were selected. Next, the robust ridge and OLS estimates of the regression parameters for the effects of these 15 contigs (explanatory variables) on trait values (response variables) were compared. Robust ridge regression was found to detect fewer positive and negative slopes than OLS regression. These results indicate that robust ridge regression can be applied successfully to real RNA sequencing data to estimate the effects of trait-associated genes, and that it holds great promise as a tool for modeling the association between RNA expression and phenotypic traits.
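
To make the comparison described above concrete, the following is a minimal sketch of one possible robust ridge estimator set against OLS. It is not the estimator or the simulation design used in the study: the Huber weighting, the iteratively reweighted least-squares loop, the function names `ols` and `robust_ridge`, the penalty `lam`, the tuning constant `c`, and the toy data are all assumptions chosen for illustration. In particular, the toy counts share a common negative binomial component to induce collinearity, which is far cruder than modeling tail dependence with a vine copula.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return beta

def robust_ridge(X, y, lam=1.0, c=1.345, n_iter=50, tol=1e-8):
    """Illustrative robust ridge: Huber-weighted IRLS with an L2 penalty.

    Assumed formulation, not taken from the paper. The intercept is not
    penalized, and the residual scale is re-estimated by the MAD each pass.
    """
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0  # leave the intercept unshrunk
    # Start from the ordinary (non-robust) ridge solution.
    beta = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)
    for _ in range(n_iter):
        r = y - Xc @ beta
        # Robust scale estimate: median absolute deviation, floored to avoid 0.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = c / np.maximum(u, c)
        XtW = Xc.T * w
        new_beta = np.linalg.solve(XtW @ Xc + penalty, XtW @ y)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Toy data (hypothetical): collinear negative binomial counts plus outliers.
rng = np.random.default_rng(0)
n, p = 60, 5
shared = rng.negative_binomial(5, 0.2, size=n)           # common component
counts = shared[:, None] + rng.negative_binomial(5, 0.5, size=(n, p))
X = np.log1p(counts)                                      # log-scale expression
true_slopes = np.array([1.0, 0.0, -1.0, 0.0, 0.5])
y = 1.0 + X @ true_slopes + rng.normal(0.0, 0.5, size=n)
y[:3] += 20.0                                             # a few gross outliers

print("OLS estimates:         ", np.round(ols(X, y), 3))
print("Robust ridge estimates:", np.round(robust_ridge(X, y, lam=1.0), 3))
```

In practice the penalty `lam` would be chosen by a data-driven criterion such as cross-validation rather than fixed at 1, and the explanatory variables would usually be standardized before penalization so that the shrinkage acts comparably on each gene.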
