Abstract

Named entity recognition (NER) is a core technology for knowledge acquisition from text and has been applied to knowledge extraction in the chemical and medical domains. Multitask learning, which trains a model on multiple training datasets, is one approach to improving NER. Within multitask learning, auxiliary learning, which uses the training data of an auxiliary task to improve a target task, has shown higher NER performance than conventional multitask learning, which improves all tasks simultaneously. The conventional auxiliary learning method, however, uses only one auxiliary training dataset. We propose Multiple Utilization of NER Corpora Helpful for Auxiliary BLESsing (MUNCHABLES). MUNCHABLES utilizes multiple training datasets as auxiliary training data in two ways: the first fine-tunes the NER model of the target task by sequentially performing auxiliary learning for each auxiliary training dataset, and the second uses all the training datasets in a single round of auxiliary learning. We evaluate MUNCHABLES on eight chemical/biomedical/scientific-domain NER tasks, where seven training datasets are used as auxiliary training data in each case. The experimental results show that our proposed methods achieve higher NER performance than conventional multitask learning methods on average, and that NER performance can be improved by using multiple auxiliary training datasets. Furthermore, the proposed models outperform state-of-the-art models on the datasets.
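The two dataset-scheduling strategies described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: actual model training is stubbed out as simple bookkeeping, and names such as `auxiliary_round`, `sequential_munchables`, and `combined_munchables` are hypothetical.

```python
# Sketch of the two MUNCHABLES-style auxiliary-learning schedules.
# Training is simulated by logging (phase, dataset) steps; a real
# implementation would update NER model parameters at each step.

def auxiliary_round(log, target_data, aux_data):
    """One round of auxiliary learning: train on the auxiliary data,
    then on the target-task data (the target task stays primary)."""
    log.append(("aux", aux_data))
    log.append(("target", target_data))
    return log

def sequential_munchables(target_data, aux_datasets):
    """Strategy 1: fine-tune the target-task NER model by running one
    auxiliary-learning round per auxiliary dataset, in sequence."""
    log = []
    for aux in aux_datasets:
        auxiliary_round(log, target_data, aux)
    return log

def combined_munchables(target_data, aux_datasets):
    """Strategy 2: use all auxiliary datasets together in a single
    round of auxiliary learning."""
    log = []
    auxiliary_round(log, target_data, list(aux_datasets))
    return log

# With seven auxiliary corpora, the sequential schedule performs seven
# rounds, while the combined schedule performs one round over all of them.
seq = sequential_munchables("target_corpus", ["aux1", "aux2"])
comb = combined_munchables("target_corpus", ["aux1", "aux2"])
```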
