Abstract

To cope with the challenges posed by the complex linguistic structure and lexical polysemy of ancient texts, this study proposes a two-stage translation model. First, we combine GujiBERT, a GCN, and an LSTM to categorize ancient texts into historical and non-historical categories, laying the foundation for the subsequent translation task. To improve the efficiency of word vector generation and overcome the limitations of the traditional Word2Vec model, we integrate the entropy weight method into the Skip-gram training process and concatenate the resulting word vectors with GujiBERT embeddings. Through dependency weighting, this Entropy-SkipBERT method also strengthens the model's ability to accurately represent lexical polysemy and grammatical structure in ancient documents. When training the translation model, we use a separate dataset for each text category, which significantly improves translation accuracy. Experimental results show that our categorization model improves accuracy by 5% over GujiBERT, while Entropy-SkipBERT improves BLEU scores by 0.7 and 0.4 on the historical and non-historical datasets, respectively. Overall, the proposed two-stage model improves BLEU scores by 2.7 over the baseline model.
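The entropy-weight fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names, toy statistics, and vector dimensions are all assumptions. It shows the standard entropy weight method (features that vary more across samples receive larger weights) applied to Skip-gram-style vectors, which are then concatenated with a contextual (GujiBERT-style) embedding.

```python
import numpy as np

def entropy_weights(X, eps=1e-12):
    """Entropy weight method: for a samples-by-features matrix X,
    features with higher dispersion (lower entropy) get larger weights."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Column-normalise each feature into a probability distribution.
    P = X / (X.sum(axis=0, keepdims=True) + eps)
    # Shannon entropy per feature, scaled to [0, 1] by ln(n).
    E = -(P * np.log(P + eps)).sum(axis=0) / np.log(n)
    d = 1.0 - E                      # degree of divergence
    return d / d.sum()               # weights sum to 1

def fuse(skipgram_vec, bert_vec, weights):
    """Scale the Skip-gram vector by the entropy weights, then
    concatenate with the contextual embedding (illustrative fusion)."""
    return np.concatenate([skipgram_vec * weights, bert_vec])

# Toy co-occurrence statistics: 4 "words" x 3 features (made-up numbers).
X = np.array([[3., 1., 4.],
              [1., 5., 9.],
              [2., 6., 5.],
              [3., 5., 8.]])
w = entropy_weights(X)
fused = fuse(np.ones(3), np.ones(5), w)   # 3-dim static + 5-dim contextual
```

In this sketch the fused vector simply has the two embeddings' dimensions added together; how the paper weights dependencies and sizes the concatenation is not specified in the abstract.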
