Abstract
BackgroundIt becomes increasingly clear that our current taxonomy of clinical phenotypes is mixed with molecular heterogeneity. Of vital importance for refined clinical practice and improved intervention strategies is to define the hidden molecular distinct diseases using modern large-scale genomic approaches. Microarray omics technology has provided a powerful way to dissect hidden genetic heterogeneity of complex diseases. The aim of this study was thus to develop a bioinformatics approach to seek the transcriptional features leading to the hidden subtyping of a complex clinical phenotype. The basic strategy of the proposed method was to iteratively partition in two ways sample and feature space with super-paramagnetic clustering technique and to seek for hard and robust gene clusters that lead to a natural partition of disease samples and that have the highest functionally conceptual consensus evaluated with Gene Ontology.ResultsWe applied the proposed method to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype. A feature subset of 30 genes (38 probes) derived from analysis of the first dataset consisting of 4026 genes and 42 DLBCL samples identified three categories of patients with very different five-year overall survival rates (70.59%, 44.44% and 14.29% respectively; p = 0.0017). Analysis of the second dataset consisting of 7129 genes and 58 DLBCL samples revealed a feature subset of 13 genes (16 probes) that not only replicated the findings of the important DLBCL genes (e.g. JAW1 and BCL7A), but also identified three clinically similar subtypes (with 5-year overall survival rates of 63.13%, 34.92% and 15.38% respectively; p = 0.0009) to those identified in the first dataset. Finally, we built a multivariate Cox proportional-hazards prediction model for each feature subset and defined JAW1 as one of the most significant predictor (p = 0.005 and 0.014; hazard ratios = 0.02 and 0.03, respectively for two datasets) for both DLBCL cohorts under study.ConclusionOur results showed that the proposed algorithm is a promising computational strategy for peeling off the hidden genetic heterogeneity based on transcriptionally profiling disease samples, which may lead to an improved diagnosis and treatment of cancers.
Highlights
It becomes increasingly clear that our current taxonomy of clinical phenotypes is mixed with molecular heterogeneity
We demonstrated the behaviours and properties of the proposed method by applying it to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype
Coupled two-way clustering We searched significant gene subsets using the well established coupled two-way clustering (CTWC) algorithm as implemented in a public sever [26], which used super-paramagnetic clustering (SPC) as the underlying clustering tool to break down the total dataset into subsets of genes and samples iteratively until significant partitions were revealed
Summary
It becomes increasingly clear that our current taxonomy of clinical phenotypes is mixed with molecular heterogeneity. The patients with a similar prognosis frequently respond very differently to the same treatment This may occur because two apparently similar tumours are completely different diseases at the molecular level, often called genetic heterogeneity. It describes the biological complexity whereby apparently similar inheritable characters result from different genes or different genetic mechanisms The presence of such heterogeneity has a significant impact on both the efficiency of modern clinical practice and biomedical research of common human diseases. Diffuse large B-cell lymphoma (DLBCL) analyzed in this study is the most common type of lymphoma in adults and demonstrates very apparently clinical heterogeneity. It can be treated by chemotherapy in only approximately 40% of patients. Several recent studies used DNA microarrays to study DLBCL, suggesting that it is possible to identify subgroups of patients in terms of different survival courses via gene expression data [9,10], which are unlikely to be discovered by traditional clinical approaches
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.