Abstract
The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group. In this work, we propose using the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12:1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular, by taking combinations of attributes into account, the mutual information opens the way to uncovering specific functional descriptions of a group of genes or a clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files.
Highlights
The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics.
Following Gibbons and Roth [13], the mutual information I(C, A) provides a figure of merit between cluster membership C and known gene attributes A. Prior to such a step, it is necessary to obtain a better understanding of the specific relationship between the data-generated clustering and the information contained in the functional annotation of genes.
To what extent does a grouping of genes reflect their functional annotation, e.g. as given in terms of the structured vocabulary provided by the Gene Ontology (GO) consortium? In this work, we investigate the relationship between groupings of genes and their respective functional categories.
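To make the figure of merit I(C, A) concrete, the following is a minimal sketch that estimates it from the standard definition, I(C, A) = sum over (c, a) of p(c, a) log2[p(c, a) / (p(c) p(a))], using plain frequency counts. The function name `mutual_information` and the toy gene labels are assumptions made for illustration; they are not taken from the paper or its accompanying scripts, and no correction for small sample sizes is applied.

```python
from collections import Counter
from math import log2


def mutual_information(clusters, attributes):
    """Estimate I(C, A) in bits from paired labels, one pair per gene.

    Probabilities are plug-in estimates (relative frequencies); this is an
    illustrative sketch, not the estimator used in the paper.
    """
    if len(clusters) != len(attributes):
        raise ValueError("need one cluster label and one attribute value per gene")
    n = len(clusters)
    joint = Counter(zip(clusters, attributes))  # counts for each (c, a) pair
    count_c = Counter(clusters)                 # marginal counts for C
    count_a = Counter(attributes)               # marginal counts for A
    mi = 0.0
    for (c, a), k in joint.items():
        # p(c, a) * log2( p(c, a) / (p(c) * p(a)) ), written in terms of counts
        mi += (k / n) * log2(k * n / (count_c[c] * count_a[a]))
    return mi


# Hypothetical toy data: 8 genes, 2 clusters, binary annotations.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
annotated = [1, 1, 1, 1, 0, 0, 0, 0]  # annotation tracks cluster membership exactly
unrelated = [1, 0, 1, 0, 1, 0, 1, 0]  # annotation independent of cluster membership

print(mutual_information(clusters, annotated))
print(mutual_information(clusters, unrelated))
```

In this toy case the annotation that tracks the clustering exactly carries one bit of information about cluster membership, while the independent annotation carries none.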
Summary
The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. One of the common assertions in expression analysis is that genes sharing a similar pattern of expression are more likely to be involved in the same regulatory processes [1]. This proposition, commonly referred to as 'guilt-by-association', has been exploited by a large number of clustering algorithms, which group genes into a (small) number of classes based on the similarity of their expression profiles. The paramount task is to enhance the biological interpretation of the data, e.g. by identifying physiologically relevant categories, based on existing bio-ontologies, associated with a particular grouping of genes. Two problems are addressed: the interdependence of functional annotations (redundancy) and the failure of individual attributes to adequately describe a given clustering or grouping of genes. To overcome these problems, a heuristic strategy is devised that allows the detection of combinations of attributes, providing a more specific functional description of clustering results. The results are summarized and discussed in the last section.
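The failure of individual attributes, and the gain from combining them, can be illustrated with a small constructed example. The sketch below is not the heuristic strategy of the paper, whose details are not given in this summary; it only assumes that a combination of attributes can be treated as a single joint variable, and it uses hypothetical toy data in which cluster membership is the exclusive-or of two binary attributes.

```python
from collections import Counter
from math import log2


def mutual_information(labels_x, labels_y):
    """Plug-in estimate of the mutual information (in bits) of paired labels."""
    n = len(labels_x)
    joint = Counter(zip(labels_x, labels_y))
    count_x = Counter(labels_x)
    count_y = Counter(labels_y)
    return sum(
        (k / n) * log2(k * n / (count_x[x] * count_y[y]))
        for (x, y), k in joint.items()
    )


# Hypothetical toy data: 8 genes with two binary attributes each.
attr_1 = [0, 0, 1, 1, 0, 0, 1, 1]
attr_2 = [0, 1, 0, 1, 0, 1, 0, 1]

# Assumed clustering: a gene is in cluster 1 if exactly one attribute is set.
clusters = [a ^ b for a, b in zip(attr_1, attr_2)]

# Treat the pair of attributes as one combined attribute per gene.
combined = list(zip(attr_1, attr_2))

mi_1 = mutual_information(clusters, attr_1)
mi_2 = mutual_information(clusters, attr_2)
mi_combined = mutual_information(clusters, combined)

print(mi_1, mi_2, mi_combined)
```

Here neither attribute alone shares any information with the clustering, whereas the combined attribute determines cluster membership completely. Real annotation data will rarely be this clean, and an exhaustive search over combinations grows quickly with the number of attributes, which is presumably why a heuristic is needed.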