Abstract
For precision medicine, there is a need to identify genes that accurately distinguish the physiological state or response to a particular therapy, but this can be challenging. Many methods of analyzing differential expression have been established and applied to this problem, such as t-test, edgeR, and DEseq2. A common feature of these methods is their focus on a linear relationship (differential expression) between gene expression and phenotype. However, they may overlook nonlinear relationships due to various factors, such as the degree of disease progression, sex, age, ethnicity, and environmental factors. Maximal information coefficient (MIC) was proposed to capture a wide range of associations of two variables in both linear and nonlinear relationships. However, with MIC it is difficult to highlight genes with nonlinear expression patterns as the genes giving the most strongly supported hits are linearly expressed, especially for noisy data. It is thus important to also efficiently identify nonlinearly expressed genes in order to unravel the molecular basis of disease and to reveal new therapeutic targets. We propose a novel nonlinearity measure called normalized differential correlation (NDC) to efficiently highlight nonlinearly expressed genes in transcriptome datasets. Validation using six real-world cancer datasets revealed that the NDC method could highlight nonlinearly expressed genes that could not be highlighted by t-test, MIC, edgeR, and DEseq2, although MIC could capture nonlinear correlations. The classification accuracy indicated that analysis of these genes could adequately distinguish cancer and paracarcinoma tissue samples. Furthermore, the results of biological interpretation of the identified genes suggested that some of them were involved in key functional pathways associated with cancer progression and metastasis. All of this evidence suggests that these nonlinearly expressed genes may play a central role in regulating cancer progression.
Highlights
Identifying and characterizing biomarkers that accurately reflect a physiological state or response to a particular drug or therapy is challenging
There are some genes that can be detected by Maximal information coefficient (MIC) but not by t-test, edgeR, and DEseq2; these can be defined as nonlinearly expressed genes
To measure the impact of the nonlinearly expressed genes selected by normalized differential correlation (NDC) on the classification, we examined the classification accuracy on the six datasets with a supported vector classifier (SVC)
Summary
Identifying and characterizing biomarkers that accurately reflect a physiological state (normal or diseased) or response to a particular drug or therapy is challenging. High-quality biomarkers are especially important for cancer detection and the development of safe, effective treatments. Such biomarkers are the ultimate goal of many next-generation sequencing studies (Xiao et al, 2017; American Association for the Advancement of Science, 2018). Many methods for analyzing differential expression in RNA sequencing (RNA-seq) data with the aim of finding genes that are differentially expressed across groups of samples have been established, such as t-test, limma (Gentleman et al, 2004), edger (Robinson et al, 2010), and DEseq (Love et al, 2014) These methods tend to consider only the linear relationship (differential expression) between gene expression and phenotype, but may overlook nonlinear relationships (as shown in Figure 1) because gene expression may differ between population groups or between individuals, and can vary as a patient’s status changes. The expression of this gene is very useful to discriminate between control subjects and patients affected by prostate cancer
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.