Abstract

As the volume of genomic data in public repositories grows, efficient search methods have become essential for locating relevant datasets. Current repositories do not meet this need because they offer only a limited search option: lexical matching against the textual descriptions or metadata of experiments. This technique is insufficient for detecting similarities between experiments within a large data collection. To address this limitation, we developed a text-based experiment retrieval framework that uses both lexical and semantic similarity approaches to find similar experiments, and we compared their retrieval performance. This study is the first attempt to use text-driven semantic analysis approaches to build a retrieval framework for experiments. An empirical study was conducted on a large collection of textual descriptions of Arabidopsis microarray experiments from the Gene Expression Omnibus database. In the proposed model, Jaccard similarity served as the lexical similarity approach, while Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, and Latent Dirichlet Allocation served as the semantic similarity approaches for detecting similarities between the textual descriptions of the experiments. According to the experimental results, the text-driven semantic similarity approaches retrieve relevant experiments more successfully than the lexical similarity approach.
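
To make the lexical baseline concrete, the following is a minimal sketch of ranking experiment descriptions by Jaccard similarity to a query. It is an illustration under stated assumptions, not the study's implementation: the tokenizer, the function names (`tokenize`, `jaccard`, `rank`), and the three toy descriptions are all hypothetical, and the study's actual preprocessing is not specified here.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def rank(query, descriptions):
    """Return (id, score) pairs sorted by descending similarity to the query."""
    q = tokenize(query)
    scores = [
        (doc_id, jaccard(q, tokenize(text)))
        for doc_id, text in descriptions.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy descriptions, not taken from the Gene Expression Omnibus.
descriptions = {
    "exp_a": "drought stress response in Arabidopsis leaves",
    "exp_b": "cold stress response in Arabidopsis roots",
    "exp_c": "flowering time mutants under long day conditions",
}

ranking = rank("Arabidopsis drought stress", descriptions)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because the score depends only on shared tokens, a description that expresses the same concept with different words (for example "water deficit" instead of "drought") would receive no credit. That gap is what the semantic approaches named above are intended to close.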
