Abstract

Biology and biomedicine are flourishing disciplines, with massive amounts of biological data produced in experiments and a huge number of research papers published in journals. In such a big data context, unsupervised data mining methods such as topic models are used to extract topics from large-scale document collections. In this paper, we present BioTopic, a biological literature mining system based on topic modelling. Experiments show that the perplexity reduction percentage of our pre-processing method is 5% larger than that of a traditional pre-processing method. The precision of our search performance reaches 86%, which is better than that of a unigram language model. Our method employs linguistic information from shallow parsing to better pre-process biological literature for topic models. BioTopic, with fine-grained pre-processing and topic modelling, works better than traditional literature mining systems.
