A new method for evaluating the impacts of semantic similarity measures on the annotation of gene sets.

Aarón Ayllón-Benítez, Fleur Mougin, Rodolphe Thiébaut, Julien Allali, Patricia Thébault

doi:10.1371/journal.pone.0208037

Abstract

Motivation: The recent revolution in new sequencing technologies, as part of the continuous process of adopting innovative protocols, has strongly impacted the interpretation of relations between phenotype and genotype. Understanding the resulting gene sets has therefore become a bottleneck that needs to be addressed. Automatic methods have been proposed to facilitate the interpretation of gene sets. While statistical functional enrichment analyses are now widely used, they tend to focus on well-known genes and to ignore new information from less-studied genes. To address such issues, applying semantic similarity measures is logical if the knowledge source used to annotate the gene sets is hierarchically structured. In this work, we propose a new method for analyzing the impact of different semantic similarity measures on gene set annotations.

Results: We evaluated the impact of each measure by considering two features that correspond to relevant criteria for a “good” synthetic gene set annotation: (i) the number of annotation terms has to be drastically reduced while the representative terms are retained, and (ii) the number of genes described by the selected terms should be as large as possible. We analyzed nine semantic similarity measures to identify the best possible compromise between both features while maintaining a sufficient level of detail. Using Gene Ontology to annotate the gene sets, we obtained better results with node-based measures, which use the terms’ characteristics, than with edge-based measures, which rely on the edges that link the terms. The annotation of the gene sets achieved with the node-based measures did not exhibit major differences regardless of the characteristics of terms used.
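The two criteria above can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function name `annotation_scores`, the toy gene and term identifiers, and the exact definitions (term reduction as the fraction of annotation terms discarded, gene coverage as the fraction of genes annotated by at least one retained term) are hypothetical choices made for this example.

```python
def annotation_scores(gene_annotations, selected_terms):
    """Return (term_reduction, gene_coverage) for a candidate synthetic annotation.

    gene_annotations: dict mapping each gene to the set of terms annotating it.
    selected_terms:   iterable of representative terms retained for the gene set.
    Both score definitions are assumptions made for illustration.
    """
    if not gene_annotations:
        raise ValueError("gene_annotations must contain at least one gene")
    all_terms = set().union(*gene_annotations.values())
    if not all_terms:
        raise ValueError("no annotation terms found")
    # Ignore selected terms that never annotate a gene of the set.
    selected = set(selected_terms) & all_terms

    # Criterion (i): how strongly the term list was shrunk (1.0 = everything dropped).
    term_reduction = 1 - len(selected) / len(all_terms)

    # Criterion (ii): share of genes still described by at least one retained term.
    covered = [g for g, terms in gene_annotations.items() if terms & selected]
    gene_coverage = len(covered) / len(gene_annotations)
    return term_reduction, gene_coverage


# Toy gene set with hypothetical identifiers.
toy_annotations = {
    "geneA": {"T1", "T2"},
    "geneB": {"T2", "T3"},
    "geneC": {"T4"},
    "geneD": {"T5"},
}
reduction, coverage = annotation_scores(toy_annotations, {"T2", "T4"})
print(f"term reduction: {reduction:.2f}, gene coverage: {coverage:.2f}")
```

In this toy case, keeping two of the five terms still describes three of the four genes, which shows the kind of compromise between the two features that the evaluation looks for.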

Highlights

The recent revolution in new sequencing technologies, as part of the continuous process of adopting innovative protocols [1], has strongly impacted our understanding of the relations between phenotype and genotype.
We evaluated the impact of each measure by considering two features that correspond to relevant criteria for a “good” synthetic gene set annotation: (i) the number of annotation terms has to be drastically reduced while the representative terms are retained, and (ii) the number of genes described by the selected terms should be as large as possible.
Using Gene Ontology to annotate the gene sets, we obtained better results with node-based measures, which use the terms’ characteristics, than with edge-based measures, which rely on the edges that link the terms.
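To illustrate the distinction between node-based and edge-based measures, here is a minimal sketch on a toy hierarchy. It uses two generic textbook forms chosen for this example: a Resnik-style measure (information content of the most informative common ancestor) as the node-based case, and an inverse shortest-path similarity as the edge-based case. These are not necessarily among the nine measures evaluated in the paper, and the ontology, annotation counts, and function names are all hypothetical. The sketch also assumes every term has at least one annotation at or below it.

```python
import math
from collections import deque

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
# Hypothetical number of genes directly annotated with each term.
DIRECT_COUNTS = {"A1": 2, "A2": 2, "B": 4}


def up_distances(term, parents):
    """Map each ancestor of `term` (itself included) to its minimum edge distance."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def information_content(parents, direct_counts):
    """IC(t) = -log p(t), where p(t) counts annotations of t and its descendants."""
    cumulative = dict.fromkeys(parents, 0)
    for term, count in direct_counts.items():
        for ancestor in up_distances(term, parents):
            cumulative[ancestor] += count
    total = sum(direct_counts.values())
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik_similarity(t1, t2, parents, ic):
    """Node-based: IC of the most informative common ancestor."""
    common = up_distances(t1, parents).keys() & up_distances(t2, parents).keys()
    return max(ic[c] for c in common)


def edge_similarity(t1, t2, parents):
    """Edge-based: 1 / (1 + shortest path length through a common ancestor)."""
    d1, d2 = up_distances(t1, parents), up_distances(t2, parents)
    shortest = min(d1[c] + d2[c] for c in d1.keys() & d2.keys())
    return 1 / (1 + shortest)


ic = information_content(PARENTS, DIRECT_COUNTS)
for pair in [("A1", "A2"), ("A1", "B")]:
    print(pair, resnik_similarity(*pair, PARENTS, ic), edge_similarity(*pair, PARENTS))
```

The node-based score depends on how annotations are distributed over the terms, whereas the edge-based score depends only on the shape of the hierarchy, so the two families can rank the same pairs of terms differently on a real ontology.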

Summary

Introduction

The recent revolution in new sequencing technologies, as part of the continuous process of adopting innovative protocols [1], has strongly impacted our understanding of the relations between phenotype and genotype. Several genes in a given pathway that are slightly differentially expressed are more relevant than only one gene with a strongly different expression pattern. This new field of research has become unavoidable over the last two decades and is based on the inference of gene sets according to a comparison of experimental results under diverse conditions. An additional issue involves the interpretation of these gene sets using the information available for each gene, which is based on annotation terms derived from a wide range of sources. These terms provide functional information for each gene with varying levels of detail according to the current state of knowledge in the research area. The computational processing of these terms has become crucial, as reported in the growing number of scientific studies using and developing methods dedicated to gene set annotations over the last two decades [4].

Journal: PLOS ONE
Publication Date: Nov 27, 2018
Citations: 3
License type: CC BY 4.0
