Abstract

The performance of natural language processing has been greatly improved by pre-trained language models, which are trained on large corpora. However, performance can still be degraded by the OOV (Out of Vocabulary) problem. Recent language representation models such as BERT use sub-word tokenization, which splits words into pieces, to deal with the OOV problem. However, since OOV words are also divided into pieces and thus represented as a combination of unusual sub-word tokens, this can lead to misrepresentation of the OOV words. To alleviate this misrepresentation problem, we propose a character-level pre-trained language model called CCTE (Context Char Transformer Encoder). Unlike BERT, CCTE takes the entire word as input, and the word is represented by considering both morphological and contextual information. Experiments on multiple datasets showed that, on NER and POS tagging tasks, the proposed model, which is smaller than existing pre-trained models, generally outperformed them. In particular, when more OOV words were present, the proposed method showed superior performance by a large margin. In addition, cosine similarity comparisons of word pairs showed that the proposed method properly captures the morphological and contextual information of words.
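To make the character-level idea concrete, below is a minimal, hypothetical sketch of encoding a whole word from its characters with a small Transformer encoder and comparing word pairs by cosine similarity, as mentioned in the abstract. The class name `CharWordEncoder`, the hyper-parameters, and the mean-pooling step are illustrative assumptions, not CCTE's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharWordEncoder(nn.Module):
    """Hypothetical sketch: encode a whole word from its characters.

    This is NOT the CCTE architecture from the paper; it only illustrates
    the character-level, whole-word encoding idea described in the abstract.
    """
    def __init__(self, num_chars=256, dim=128, heads=4, layers=2, max_len=32):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, dim)   # one vector per character
        self.pos_emb = nn.Embedding(max_len, dim)      # character position within the word
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, words):
        # words: list of strings; each word is encoded whole, so no sub-word splitting occurs
        ids = [torch.tensor([min(ord(c), 255) for c in w]) for w in words]
        batch = nn.utils.rnn.pad_sequence(ids, batch_first=True)         # (B, L)
        pos = torch.arange(batch.size(1)).unsqueeze(0).expand_as(batch)  # (B, L)
        h = self.encoder(self.char_emb(batch) + self.pos_emb(pos))       # (B, L, dim)
        return h.mean(dim=1)                                             # pool characters into one word vector


# Word-pair comparison by cosine similarity, in the spirit of the abstract's analysis
enc = CharWordEncoder()
vecs = enc(["misrepresent", "misrepresentation"])
print(F.cosine_similarity(vecs[0], vecs[1], dim=0).item())
```

Because the encoder sees every character of the word, morphologically related words (as in the pair above) share most of their input, which is what makes a character-level model attractive for OOV words.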
