Abstract

Gene Ontology (GO) is an evolving controlled vocabulary that describes the roles of genes and proteins at the molecular, cellular, and biological levels. As gene annotations have come into widespread use, semantic similarity between annotated terms has gained relevance. The literature offers a number of semantic similarity metrics built on different strategies: distance-based techniques at the term level or gene product level, external documents, and topology-based approaches that focus on boundaries and on ancestor or child nodes. We hypothesize that combining these elements yields a systematic way to gauge the degree of similarity between GO annotation terms. After a detailed analysis of biological pathways and GO terms, we developed a semantic similarity measure called SimGOT. SimGOT takes into account topology-based similarity measures, the membership of terms in fuzzy clusters, and the semantics implicit in the ontology or the information content of a term. Positive and negative datasets are built from UniProt. We compared SimGOT against four existing GO-based semantic similarity metrics in terms of semantic similarity, Pearson's correlation coefficient, and Protein family (Pfam) subdomain group similarity. The experimental findings indicate that SimGOT outperforms the alternative semantic similarity metrics.

Keywords: Gene Ontology, Pearson's correlation coefficient, Protein family, SimGOT, UniProt
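
To make the notion of the information content of a term concrete, the sketch below uses a common corpus-based definition, IC(t) = -log p(t), where p(t) is the fraction of gene products annotated to t or to any of its descendants, together with a Resnik-style similarity (the IC of the most informative common ancestor). This is an illustrative assumption, not the SimGOT formula, which the abstract does not specify. The toy ontology, the term names, and the protein identifiers are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy ontology: term -> list of parent terms (is_a edges).
# These are NOT real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical annotation corpus: gene product -> directly annotated terms.
ANNOTATIONS = {
    "P1": ["dna_binding"],
    "P2": ["rna_binding"],
    "P3": ["catalysis"],
    "P4": ["dna_binding"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Compute IC(t) = -log p(t) for every term that has annotations.

    p(t) is the share of gene products annotated to t or to a descendant,
    so an annotation is propagated to every ancestor of the annotated term.
    """
    counts = Counter()
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        counts.update(covered)  # each gene product counts once per term
    total = len(ANNOTATIONS)
    return {term: -math.log(n / total) for term, n in counts.items()}


def resnik(term_a, term_b, ic):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in common)


if __name__ == "__main__":
    ic = information_content()
    print(resnik("dna_binding", "rna_binding", ic))
    print(resnik("dna_binding", "catalysis", ic))
```

In this toy corpus the two binding terms share the ancestor `binding`, which covers three of the four gene products, so their similarity is small but positive. `dna_binding` and `catalysis` share only the root, which carries no information, so their similarity is zero.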
