Abstract

Background: Detecting co-expressed gene modules poses two typical challenges: overlap between identified modules, and local co-expression that occurs only in a subset of biological samples. Module detection relies on unsupervised clustering approaches and algorithms. Those methods are undoubtedly advanced, but a clustering method must be selected separately for the sample-clustering and gene-clustering tasks, and the latter is often more complicated.

Results: This study presented an R package, Overlapping CoExpressed gene Module (oCEM), which uses decomposition methods to address the challenges above. We also developed a novel auxiliary statistical approach that selects the optimal number of principal components using a permutation procedure. In addition, we showed that oCEM outperformed state-of-the-art techniques in its ability to detect biologically relevant modules.

Conclusions: oCEM helped non-technical users easily perform complicated statistical analyses and obtain robust results. oCEM and its applications, along with example data, were freely provided at https://github.com/huynguyen250896/oCEM.
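The abstract does not spell out how the permutation procedure works, so the following is only a minimal sketch of one common way to choose the number of principal components by permutation. It is not oCEM's implementation. It assumes a samples-by-genes matrix, shuffles each gene independently to break co-expression, and keeps the leading components whose share of variance exceeds what the shuffled data produce. The function names `explained_variance_ratio` and `select_n_components`, and the parameters `n_perm` and `alpha`, are hypothetical.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance carried by each principal component of X."""
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    var = s ** 2
    return var / var.sum()


def select_n_components(X, n_perm=200, alpha=0.05, seed=0):
    """Count leading components that beat a permutation null (illustrative).

    X is assumed to be a (samples x genes) array. Each permutation shuffles
    every column independently, which keeps per-gene distributions intact
    but destroys the correlation between genes.
    """
    rng = np.random.default_rng(seed)
    observed = explained_variance_ratio(X)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        exceed += explained_variance_ratio(Xp) >= observed
    # Empirical p-value per component, with the usual +1 correction.
    p_values = (exceed + 1) / (n_perm + 1)
    # Keep components up to (not including) the first non-significant one.
    n_keep = 0
    for p in p_values:
        if p > alpha:
            break
        n_keep += 1
    return n_keep, p_values
```

Calling `select_n_components(X)` on an expression matrix returns the number of retained components together with the per-component empirical p-values. Stopping at the first non-significant component is a design choice of this sketch; other stopping rules are equally plausible.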

Highlights

  • Typical challenges in detecting co-expressed gene modules are overlap between identified modules and local co-expression in a subset of biological samples

  • Algorithm) and independent principal component analysis (IPCA). Overlapping CoExpressed gene Module (oCEM) did not include principal component analysis (PCA) for the following reasons: (i) PCA assumes that gene expression follows a Gaussian distribution, whereas many recent studies have demonstrated that microarray gene expression measurements follow a non-Gaussian distribution [24,25,26,27]; (ii) the idea behind PCA is to decompose a large matrix into the product of several sub-matrices and retain the first few components, which carry the maximum amount of variance

  • Due to the small number of genes, weighted gene co-expression network analysis (WGCNA) failed to identify any co-expression across the 1904 breast cancer patients, while improved WGCNA (iWGCNA) and oCEM identified two and three modules, respectively


Summary

Results

Human breast cancer

In our previous study [37], the breast cancer data were used to detect 31 validated breast-cancer-associated genes, and we clustered those genes into functional modules using iWGCNA. We performed functional enrichment analysis on the two modules and found that they shared an overlapping set of genes significantly associated with regulation of gene expression, developmental processes, and biological pathways related to cancer in general and breast cancer in particular (Additional file 2: Table S2). This suggests that oCEM was likely to identify biologically relevant modules that were not represented by WGCNA or iWGCNA modules. Across the three benchmark datasets (human breast cancer, mouse metabolic syndrome, and E. coli), most modules indicated by optimizeCOM were highly similar to those produced by WGCNA and iWGCNA, whereas the remaining modules were new ones significantly associated with clinical features as well as biological processes and pathways. Another way is for users to choose co-expressed modules that are significantly associated with the clinical features of interest to them.
