Abstract

Unregistered biological word recognition is the process of identifying terms that are out of vocabulary. Although many approaches have been developed, their performance is not satisfactory. Because the identification process can be viewed as a Markov process, we put forward a Q-learning algorithm with transfer learning to detect unregistered biological words in texts. With Q-learning, the recognizer can attain the optimal identification solution through interaction with the texts and contexts. During processing, a transfer learning approach is used to take full advantage of the knowledge gained in a source task to speed up learning in a different but related target task. A mapping that relates features of the source task to the target task, which many transfer learning methods require, is carried out automatically under the reinforcement learning framework. We examined the performance of three approaches on the GENIA corpus and JNLPBA04 data. The proposed approach improved performance in both experiments. Its precision, recall, and F-score surpassed those of a conventional unregistered word recognizer as well as those of the Q-learning approach without transfer learning.
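To make the two ingredients named above concrete, the following is a minimal sketch of a tabular Q-learning update and of warm-starting a target task from a source task's Q-values. It is not the paper's implementation. The state names, the action labels `TERM` and `OTHER`, the learning rate, the discount factor, and the hand-written `state_map` are all assumptions for illustration. In particular, the abstract says the mapping is obtained automatically, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical action set: label a token as part of a biological term or not.
ACTIONS = ("TERM", "OTHER")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    alpha and gamma are illustrative defaults, not values from the paper.
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def transfer_q(source_q, state_map):
    """Warm-start a target Q-table from a source Q-table.

    state_map maps each target state to the source state it resembles.
    Here it is supplied by hand (an assumption of this sketch).
    """
    target_q = defaultdict(float)
    for target_state, source_state in state_map.items():
        for action in ACTIONS:
            if (source_state, action) in source_q:
                target_q[(target_state, action)] = source_q[(source_state, action)]
    return target_q


# Source task: one rewarded update on a made-up state.
source_q = defaultdict(float)
q_update(source_q, "src_state", "TERM", reward=1.0, next_state="src_next")

# Target task starts from the transferred values instead of zeros.
target_q = transfer_q(source_q, {"tgt_state": "src_state"})
```

Starting the target table from transferred values rather than zeros is one simple way to reuse source knowledge. Whether it speeds up learning depends on how well the mapped states actually correspond.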

Highlights

  • From the perspective of computational linguistics, unregistered words are the ones that are out of vocabulary

  • Transfer learning, which aims to help a learning task in a new setting by using knowledge learned in another setting, can transfer knowledge from existing data to aid future learning

  • Recognizing unregistered biological words from texts is essential to biological text mining

Summary

Introduction

From the perspective of computational linguistics, unregistered words are those that are out of vocabulary. They may be terms that are not documented in the vocabulary or newly coined ones. Few unregistered word recognition systems exist for dedicated domains, such as recognizers for biological terms. Machine learning algorithms require a large amount of training data, which is costly in manual labor and material resources. Moreover, traditional machine learning assumes that training and test data follow the same distribution, an assumption that does not hold in many circumstances. Transfer learning does not require this identical-distribution assumption.

