Abstract
Background: Identifying cancer biomarkers from transcriptomics data is important to cancer research. However, transcriptomics data are often complex and heterogeneous, which complicates the identification of cancer biomarkers in practice. This heterogeneity remains a challenge for detecting subtle but consistent changes of gene expression in cancer cells.
Results: In this paper, we propose to adaptively capture the heterogeneity of expression across samples in a gene regulation space instead of in a gene expression space. Specifically, we transform gene expression profiles into gene regulation profiles and mathematically formulate gene regulation probabilities (GRPs)-based statistics for characterizing differential expression of genes between tumor and normal tissues. Finally, we devise an unbiased estimator (aGRP) of GRPs that can interrogate and adaptively capture the heterogeneity of gene expression. We also derive an asymptotic significance analysis procedure for the new statistic. Since no parameter needs to be preset, aGRP is easy to use for researchers without a programming background. We evaluated the proposed method on both simulated and real-world data and compared it with previous methods. Experimental results demonstrated the superior performance of the proposed method in exploring the heterogeneity of expression to capture subtle but consistent alterations of gene expression in cancer.
Conclusions: Expression heterogeneity largely influences the performance of cancer biomarker identification from transcriptomics data. Models are needed that deal efficiently with this heterogeneity. The proposed method can serve as a standalone tool owing to its capacity to adaptively capture sample heterogeneity and its simplicity of use.
Software availability: The source code of aGRP can be downloaded from https://github.com/hqwang126/aGRP.
Highlights
Identifying cancer biomarkers from transcriptomics data is of importance to cancer research
Software availability: The source code of the adaptive gene regulation probabilities (GRPs) model (aGRP) can be downloaded from https://github.com/hqwang126/aGRP
Based on an unbiased estimator of the likelihoods of the regulation events, we developed a new differential expression statistic, which can adaptively capture the heterogeneity of expression and makes it possible to flexibly detect cancer biomarkers with subtle but consistent changes
Summary
Identifying cancer biomarkers from transcriptomics data is of importance to cancer research. Existing methods for detecting differential expression fall into two groups: parametric and non-parametric. The former often use a variant of the t-statistic, e.g., SAM [14] and Limma [8], or a negative binomial distribution, e.g., cuffdiff and DESeq, to model the differential expression of a gene. These methods make distributional assumptions that are often violated in practice because of the complexity and heterogeneity of the data, and when applied to real data they tend to produce similar overall results. Most existing methods seldom consider, or ignore altogether, the heterogeneity inherent in transcriptomic data and therefore miss subtle but consistent expression changes [17, 18].
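To make the parametric, t-statistic family concrete, the sketch below computes a plain Welch t-statistic for one gene at a time. This is only an illustration: it is not the moderated statistic used by SAM or Limma, and it is not the aGRP statistic. The function name `welch_t` and the toy expression values are hypothetical and invented for this example. Both toy genes have the same mean shift between tumor and normal samples, but in `gene_b` the shift comes from only two tumor samples, which inflates the variance and shrinks the statistic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(tumor, normal):
    """Welch's t-statistic: mean difference divided by its standard error."""
    se = sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    return (mean(tumor) - mean(normal)) / se


# Hypothetical toy expression values, made up for illustration only.
normal = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0]

# gene_a: every tumor sample is shifted up by the same amount (homogeneous).
gene_a_tumor = [5.5, 5.7, 5.3, 5.6, 5.4, 5.5]

# gene_b: only two tumor samples are shifted (heterogeneous),
# yet the mean shift equals that of gene_a.
gene_b_tumor = [5.0, 5.2, 4.8, 6.6, 6.4, 5.0]

t_a = welch_t(gene_a_tumor, normal)
t_b = welch_t(gene_b_tumor, normal)

print(f"gene_a t = {t_a:.2f}")
print(f"gene_b t = {t_b:.2f}")
```

On this toy data the heterogeneous gene gets a much smaller statistic than the homogeneous one despite the identical mean shift, which shows how a mean-and-variance statistic responds to sample heterogeneity.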