Abstract

Background: With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models, which originated in the field of natural language processing, have received much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics.

Description: This paper starts with a description of a topic model, with a focus on understanding topic modeling. A general outline is provided of how to build an application with a topic model and how to develop a topic model. The literature on the application of topic models to biological data was also searched and analyzed in depth. We categorized the related studies according to the types of models, the analogy between the document-topic-word concept and a biological object, and the tasks of a topic model, and we provide an outlook on the use of topic models for the development of bioinformatics applications.

Conclusion: Topic modeling is a useful method (in contrast to the traditional means of data reduction in bioinformatics) and enhances researchers’ ability to interpret biological information. Nevertheless, because topic models optimized for specific biological data are lacking, studies on topic modeling of biological data still have a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.

Highlights

  • With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed

  • By exploring the relevant studies, we found that the tasks of a topic model for biological data are mainly focused on three concepts: biological data clustering analysis, biological data classification, and biological data feature extraction

  • Unlike in traditional clustering, a topic model allows data to come from a mixture of clusters rather than from a single cluster
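The mixed-membership idea in the last highlight can be illustrated with a small sketch. It assumes an LDA-style generative process, which is one common kind of topic model and not necessarily the one used in any study reviewed here. The gene names, the two topics, and the probability values are hypothetical and chosen only for illustration; the helper function names are likewise invented for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document under an LDA-style mixed-membership model."""
    # Each document gets its own topic proportions, not a single cluster label.
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)          # topic for this word
        w = sample_categorical(topic_word[z], rng)  # word drawn from that topic
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical toy setup: 2 topics over a 4-word "vocabulary" of gene names.
VOCAB = ["geneA", "geneB", "geneC", "geneD"]
TOPIC_WORD = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0 favors geneA and geneB
    [0.05, 0.05, 0.45, 0.45],  # topic 1 favors geneC and geneD
]

rng = random.Random(0)
theta, topics, words = generate_document(TOPIC_WORD, [1.0, 1.0], 20, rng)
print("topic proportions:", [round(t, 2) for t in theta])
print("words:", [VOCAB[w] for w in words])
```

In a hard clustering, the whole document would carry one cluster label. Here `theta` is a vector of proportions, so the words of a single document can be drawn from several topics.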

Summary

Introduction

A topic model is a probabilistic generative model that has been widely used in computer science in recent years, particularly in text mining and information retrieval. Since it was first proposed, it has attracted widespread interest among researchers in many fields. Based on latent semantic indexing (LSI), probabilistic latent semantic analysis (PLSA) (Hofmann 2001) was proposed by Hofmann and
