Abstract

Increasing communication among Chinese-speaking regions that use the traditional and simplified Chinese character systems, respectively, has highlighted the subtle yet extensive differences between the two systems, which can cause unexpected difficulties when converting characters from one to the other. This article proposes a new priority-based model for managing multiple data resources, together with a new algorithm, the Fused Conversion algorithm from Multi-Data resources (FCMD), to ensure more context-sensitive, human-controllable, and thus more reliable conversions by drawing on reverse maximum matching, an n-gram-based statistical model, and pattern-based learning and matching. After parameter training on the Tagged Chinese Gigaword corpus, its conversion precision reaches 91.5% in context-sensitive cases, the most difficult part of the conversion, with an overall precision of 99.8%, a significant improvement over state-of-the-art models. The conversion platform based on the model offers extra features such as data resource selection and self-learning of n-grams, providing a more sophisticated tool that is especially suited to high-end professional use.
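To make the reverse maximum matching component concrete, the following is a minimal sketch of dictionary-based simplified-to-traditional conversion. It is not the FCMD algorithm itself: the function name, the two toy tables, and the `max_len` parameter are assumptions introduced only for illustration, and the n-gram and pattern-based components are omitted.

```python
def reverse_max_match_convert(text, phrase_table, char_table, max_len=4):
    """Convert text by scanning right to left and preferring the longest phrase.

    phrase_table and char_table are hypothetical toy resources, not the
    data resources used in the article.
    """
    pieces = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shorter ones.
        for size in range(min(max_len, end), 0, -1):
            chunk = text[end - size:end]
            if size > 1 and chunk in phrase_table:
                pieces.append(phrase_table[chunk])
                break
            if size == 1:
                # No phrase matched: fall back to a context-free character map.
                pieces.append(char_table.get(chunk, chunk))
        end -= size
    # Pieces were collected from right to left, so restore the original order.
    return "".join(reversed(pieces))


# Toy data: the simplified character 发 maps to 發 or 髮 depending on context.
PHRASES = {"头发": "頭髮", "发展": "發展"}
CHARS = {"头": "頭", "发": "發", "经": "經", "济": "濟"}

hair = reverse_max_match_convert("我的头发", PHRASES, CHARS)
growth = reverse_max_match_convert("经济发展", PHRASES, CHARS)
```

In this sketch the phrase table resolves the ambiguous character from its neighbors, while the character table supplies a default for everything else. Cases that no phrase entry covers are where a statistical model would be needed.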
