Abstract

Background: Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity.

Results: We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content, based on the term's frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure.

Conclusion: Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.

Highlights

  • With the continued advance of high-throughput technologies, genetic and genomic data analyses are outputting large sets of genes

  • Definition of semantic particularity: The semantic particularity of a set compared to another is the value that reflects the importance of the features that belong to the first set but not the second.

  • Measure of semantic particularity: In order to compute the particularity of Sg1 compared to Sg2, we focus on the terms of Sg1 that are not members of Sg2.
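
The idea behind the last two highlights can be illustrated with a small sketch. It is a deliberately simplified toy, not the measure defined in the paper: the function names, the toy term identifiers and counts, and the scoring rule (the highest information content among the terms found only in Sg1, normalized by the highest information content in the corpus) are all assumptions made for illustration. The counts are assumed to already include annotations inherited from descendant terms, so that the root term has frequency 1.

```python
import math

def information_content(term_counts, total):
    """Corpus-based informativeness: IC(t) = log(total / count(t)).

    Assumption: each count already includes annotations to descendant
    terms, so the root has count == total and IC == 0.
    """
    return {term: math.log(total / count) for term, count in term_counts.items()}

def particular_terms(sg1, sg2):
    """Terms of Sg1 that are not members of Sg2."""
    return set(sg1) - set(sg2)

def naive_particularity(sg1, sg2, ic):
    """Toy score in [0, 1]; NOT the paper's formula.

    Takes the most informative term found only in Sg1 and normalizes
    it by the most informative term in the corpus.
    """
    only_in_sg1 = particular_terms(sg1, sg2)
    max_ic = max(ic.values())
    if not only_in_sg1 or max_ic == 0:
        return 0.0
    return max(ic[term] for term in only_in_sg1) / max_ic

# Hypothetical toy corpus: term identifiers and counts are made up.
toy_counts = {"T:root": 100, "T:a": 50, "T:b": 10, "T:c": 1}
toy_ic = information_content(toy_counts, total=100)

sg1 = {"T:root", "T:a", "T:b"}
sg2 = {"T:root", "T:a", "T:c"}

print(particular_terms(sg1, sg2))             # terms specific to Sg1
print(naive_particularity(sg1, sg2, toy_ic))  # Sg1 compared to Sg2
print(naive_particularity(sg2, sg1, toy_ic))  # Sg2 compared to Sg1
```

Note that the score is asymmetric: the particularity of Sg1 compared to Sg2 generally differs from that of Sg2 compared to Sg1, which is why both directions are reported alongside a similarity value when comparing two sets.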



Introduction

With the continued advance of high-throughput technologies, genetic and genomic data analyses are outputting large sets of genes. The amount of data involved requires automated comparison methods [1]. The characterization of these sets typically consists of a combination of the following three operations [2,3]: first, synthesizing the over- and under-represented functions of these genes [4,5]; second, identifying how these genes interact with each other [6]; third, identifying and quantifying their shared features and their differentiating features [7,8]. Gene set enrichment analysis (GSEA) is useful for clustering a set of genes into subsets sharing over-represented features. Among these features, the biological processes (BP), molecular functions (MF) and cellular components (CC) annotating each gene are represented using the Gene Ontology (GO) [17]. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions and the functions that distinguish each set. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity.

