Optimal Threshold Determination for Interpreting Semantic Similarity and Particularity: Application to the Comparison of Gene Sets and Metabolic Pathways Using GO and ChEBI.

Charles Bettembourg, Christian Diot, Olivier Dameron

doi:10.1371/journal.pone.0133579

Abstract

Background

The analysis of gene annotations referencing back to Gene Ontology plays an important role in the interpretation of high-throughput experiment results. This analysis typically involves semantic similarity and particularity measures that quantify the importance of the Gene Ontology annotations. However, there is currently no sound method supporting the interpretation of the similarity and particularity values in order to determine whether two genes are similar or whether one gene has some significant particular function. Interpretation is frequently based either on an implicit threshold or on an arbitrary one (typically 0.5). Here we investigate a method for determining thresholds supporting the interpretation of the results of a semantic comparison.

Results

We propose a method for determining the optimal similarity threshold by minimizing the proportions of false-positive and false-negative similarity matches. We compared the distributions of the similarity values of pairs of similar genes and pairs of non-similar genes. These comparisons were performed separately for all three branches of the Gene Ontology. In all situations, we found overlap between the similar and the non-similar distributions, indicating that some similar genes had a similarity value lower than the similarity value of some non-similar genes. We then extended this method to the semantic particularity measure and to a similarity measure applied to the ChEBI ontology. Thresholds were evaluated over the whole HomoloGene database. For each group of homologous genes, we computed all the similarity and particularity values between pairs of genes. Finally, we focused on the PPAR multigene family to show that the similarity and particularity patterns obtained with our thresholds were better at discriminating orthologs and paralogs than those obtained using default thresholds.

Conclusion

We developed a method for determining optimal semantic similarity and particularity thresholds. We applied this method to the GO and ChEBI ontologies. Qualitative analysis using the thresholds on the PPAR multigene family yielded biologically relevant patterns.
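The threshold selection described in the abstract can be illustrated with a minimal sketch. It assumes two lists of precomputed similarity values (one for pairs known to be similar, one for pairs known to be non-similar) and assumes that the criterion is the sum of the false-negative and false-positive proportions; the abstract does not state the exact cost function, so that choice, the function name `optimal_threshold`, and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def optimal_threshold(
    similar: Sequence[float], non_similar: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, cost) minimizing FN proportion + FP proportion.

    A pair is called "similar" when its value is >= threshold.
    Assumption: the cost is the plain sum of the two proportions.
    """
    if not similar or not non_similar:
        raise ValueError("both groups must be non-empty")

    # Only observed values can change the error counts, so they are
    # the only candidate thresholds worth testing.
    candidates = sorted(set(similar) | set(non_similar))

    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        # Similar pairs falling below the threshold are false negatives.
        fn = sum(v < t for v in similar) / len(similar)
        # Non-similar pairs at or above the threshold are false positives.
        fp = sum(v >= t for v in non_similar) / len(non_similar)
        cost = fn + fp
        if cost < best_cost:  # strict: ties keep the lowest threshold
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical values with overlapping distributions.
similar_pairs = [0.9, 0.8, 0.7, 0.4]
non_similar_pairs = [0.1, 0.2, 0.3, 0.5]
threshold, cost = optimal_threshold(similar_pairs, non_similar_pairs)
```

Because the two distributions overlap (0.4 in the similar group is below 0.5 in the non-similar group), no threshold reaches a cost of zero, which mirrors the overlap reported in the Results.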

Highlights

Need for thresholds: Comparing several gene sets to identify and quantify the features they share and the features that differentiate them is central to the functional analysis of gene sets [1,2,3].
We extend this method to the semantic particularity measure and to a similarity measure applied to the Chemical Entities of Biological Interest (ChEBI) ontology.
We developed a method for determining optimal semantic similarity and particularity thresholds.

Summary

Background

The analysis of gene annotations referencing back to Gene Ontology plays an important role in the interpretation of high-throughput experiment results. This analysis typically involves semantic similarity and particularity measures that quantify the importance of the Gene Ontology annotations. There is currently no sound method supporting the interpretation of the similarity and particularity values in order to determine whether two genes are similar or whether one gene has some significant particular function. Interpretation is frequently based either on an implicit threshold or on an arbitrary one (typically 0.5). We investigate a method for determining thresholds supporting the interpretation of the results of a semantic comparison.

Introduction

$$\frac{2 \log P(t_0)}{\log P(t_1) + \log P(t_2)}$$
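The expression above has the shape of an information-content similarity between two terms t1 and t2 with a common ancestor t0, each described by its probability of occurrence P. The sketch below assumes it is Lin's measure, 2 log P(t0) / (log P(t1) + log P(t2)); the page does not name the measure, so this reading, the function name `lin_similarity`, and the sample probabilities are assumptions.

```python
import math


def lin_similarity(p_t0: float, p_t1: float, p_t2: float) -> float:
    """Lin-style similarity from term probabilities (assumed reading).

    p_t0 is the probability of the common ancestor t0; p_t1 and p_t2
    are the probabilities of the two compared terms.
    """
    for p in (p_t0, p_t1, p_t2):
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
    denominator = math.log(p_t1) + math.log(p_t2)
    if denominator == 0.0:
        # Both terms have probability 1 (for example the root term),
        # so the ratio is undefined.
        raise ValueError("similarity undefined when both terms have P = 1")
    return 2.0 * math.log(p_t0) / denominator


# Hypothetical probabilities: the ancestor is ten times more frequent
# than either of the two compared terms.
value = lin_similarity(0.1, 0.01, 0.01)
```

With these inputs the ancestor carries half the information content of each term, so the value comes out at about 0.5; comparing a term with itself (all three probabilities equal) gives 1.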

Journal: PloS one	Publication Date: Jul 31, 2015
Citations: 7	License type: CC BY 4.0

Similar Papers

A Novel Measure for Semantic Similarity Computation of Gene Ontology Terms Using Weighted Aggregation of Information Contents
Amir Lakizadeh ... Saeed Jalili
Hepatitis Monthly | VOL. 19
31 Aug 2017

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures.
Ameera M Almasoud ... Abdulmalik S Al-Salman
BioMed Research International | VOL. 2019
27 Jan 2019

Gene-pair representation and incorporation of GO-based semantic similarity into classification of gene expression data
Torsten Schön ... Alexey Tsymbal
Intelligent Data Analysis | VOL. 16
08 Oct 2012

Semantic Particularity Measure for Functional Characterization of Gene Sets Using Gene Ontology
Charles Bettembourg ... Olivier Dameron
PLoS ONE | VOL. 9
28 Jan 2014
