Abstract

Background

A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is particularly well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients. Different definitions of biclusters will offer different opportunities to discover information from datasets, making it pertinent to tailor the desired patterns to the intended application. This paper introduces ‘GABi’, a customizable framework for subspace pattern mining suited to large heterogeneous datasets. Most existing biclustering algorithms discover biclusters of only a few distinct structures. However, by enabling definition of arbitrary bicluster models, the GABi framework enables the application of biclustering to tasks for which no existing algorithm could be used.

Results

First, a series of artificial datasets were constructed to represent three clearly distinct scenarios for applying biclustering. With a bicluster model created for each distinct scenario, GABi is shown to recover the correct solutions more effectively than a panel of alternative approaches, where the bicluster model may not reflect the structure of the desired solution. Secondly, the GABi framework is used to integrate clinical outcome data with an ovarian cancer DNA methylation dataset, leading to the discovery that widespread dysregulation of DNA methylation associates with poor patient prognosis, a result that has not previously been reported. This illustrates a further benefit of the flexible bicluster definition of GABi, which is that it enables incorporation of multiple sources of data, with each data source treated in a specific manner, leading to a means of intelligent integrated subspace pattern mining across multiple datasets.

Conclusions

The GABi framework enables discovery of biologically relevant patterns of any specified structure from large collections of genomic data. An R implementation of the GABi framework is available through CRAN (http://cran.r-project.org/web/packages/GABi/index.html).

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0355-5) contains supplementary material, which is available to authorized users.

Highlights

  • A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix

  • The strategy taken here to demonstrate the success of the GABi framework in achieving its goal is to apply GABi with a number of different bicluster models to datasets that reflect the scenarios for which they were designed

  • The framework is clearly successful if the bicluster model matched to the appropriate task yields significantly better results than using a bicluster model designed to discover a different type of pattern


Summary

Introduction

A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients. Because biclustering can discover relationships that are not evident across the full set of samples in a dataset, it is well-suited to the analysis of large heterogeneous biological datasets. This is particularly relevant in cancer research, where datasets typically involve high levels of molecular and genetic heterogeneity, as is demonstrated in a recent application of biclustering [7]. It would help greatly to be able to discover any desired pattern across subsets of large data collections.
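To make the idea of a user-defined bicluster model concrete, the following is a minimal sketch in Python. It is not the GABi implementation or its API (GABi itself is an R package). The function names `ones_model`, `constant_column_model`, and `score`, the toy matrix, and both scoring rules are assumptions chosen purely for illustration. A bicluster is represented as a set of rows and a set of columns, and a "model" is any function that scores the resulting submatrix.

```python
import numpy as np


def ones_model(sub):
    """Illustrative model: reward 1s and penalise 0s in the submatrix."""
    return int(sub.sum()) - int((sub == 0).sum())


def constant_column_model(sub):
    """Illustrative model: count columns that hold a single value
    across all selected rows."""
    if sub.shape[0] == 0:
        return 0
    return int(np.sum(sub.max(axis=0) == sub.min(axis=0)))


def score(data, rows, cols, model):
    """Score the bicluster (rows, cols) of `data` under a given model."""
    return model(data[np.ix_(rows, cols)])


# Toy binary matrix with one planted block of 1s (hypothetical data).
data = np.zeros((6, 5), dtype=int)
planted_rows, planted_cols = [1, 3, 4], [0, 2]
data[np.ix_(planted_rows, planted_cols)] = 1

all_rows, all_cols = list(range(6)), list(range(5))

# The all-ones model prefers the planted block over the full matrix.
planted_ones = score(data, planted_rows, planted_cols, ones_model)
full_ones = score(data, all_rows, all_cols, ones_model)

# The constant-column model instead favours keeping every column
# for the planted rows, because each column is constant there.
planted_rows_const = score(data, planted_rows, all_cols, constant_column_model)
full_const = score(data, all_rows, all_cols, constant_column_model)
```

The two scoring functions rank the same candidate subspaces differently, which is the sense in which the choice of bicluster definition determines what can be discovered. A search procedure such as the one GABi provides would then explore row and column subsets to maximise the chosen score; that search is not shown here.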
