Abstract

In recent years, an approach that treats Chinese word segmentation (CWS) as a machine translation (MT) problem has emerged in CWS research. However, directly applying an MT model to the CWS task introduces translation errors and results in poor word segmentation. In this paper, we propose a novel method named Translation Correcting to solve this problem. Exploiting the differences between CWS and MT, Translation Correcting eliminates translation errors by using information from the sentence to be segmented during the translation process. Consequently, word segmentation performance is considerably improved. We also obtain a new model, CWSTransformer, by applying Translation Correcting to the MT model Transformer. We compare the performance of CWSTransformer, Transformer, and the previous translation-based CWS model on the benchmark datasets PKU and MSR. The experimental results show that CWSTransformer outperforms both Transformer and the previous translation-based CWS model.
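The abstract does not spell out how Translation Correcting works, so the following is only an illustrative sketch of the general idea of using the source sentence during decoding, not the paper's algorithm. It assumes a greedy decoder that, at each step, may emit only the next source character or a word-boundary token. The names `SEP`, `constrained_segment`, `toy_scorer`, and the tiny lexicon are all hypothetical stand-ins introduced here for illustration.

```python
from typing import Callable, Dict, List

SEP = "<sep>"  # hypothetical word-boundary token, not taken from the paper

Scorer = Callable[[str, List[str]], Dict[str, float]]


def constrained_segment(sentence: str, score_next: Scorer) -> List[str]:
    """Greedy decoding restricted to tokens consistent with the source sentence.

    At every step the only candidates are the next source character and SEP,
    so the output can never contain a character the source does not have.
    """
    output: List[str] = []
    pos = 0  # index of the next source character to copy
    neg_inf = float("-inf")
    while pos < len(sentence):
        scores = score_next(sentence, output)
        next_char = sentence[pos]
        # A boundary is allowed only after a character, never first or doubled.
        can_separate = bool(output) and output[-1] != SEP
        if can_separate and scores.get(SEP, neg_inf) > scores.get(next_char, neg_inf):
            output.append(SEP)
        else:
            output.append(next_char)
            pos += 1

    # Group the emitted characters into words at each boundary token.
    words: List[str] = []
    current: List[str] = []
    for tok in output:
        if tok == SEP:
            words.append("".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        words.append("".join(current))
    return words


def toy_scorer(sentence: str, prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a trained model: prefers a boundary after a known word."""
    lexicon = {"我", "喜欢", "北京"}  # made-up lexicon for the demo
    tail: List[str] = []
    for tok in reversed(prefix):
        if tok == SEP:
            break
        tail.append(tok)
    word = "".join(reversed(tail))
    scores = {ch: 0.0 for ch in sentence}
    scores[SEP] = 1.0 if word in lexicon else -1.0
    return scores


if __name__ == "__main__":
    print(constrained_segment("我喜欢北京", toy_scorer))
```

Because the candidate set is tied to the source sentence, even a scorer that strongly prefers an unrelated character cannot change the characters in the output; it can only affect where boundaries fall. An unconstrained MT decoder offers no such guarantee, which is the kind of translation error the abstract describes.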
